QUICK REVIEW

[Paper Review] Innovative Tooth Segmentation Using Hierarchical Features and Bidirectional Sequence Modeling

Xinxin Zhao, Jian Jiang|arXiv (Cornell University)|Feb 25, 2026

Dental Radiography and Imaging | Cited by 0

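One-sentence summary

This paper proposes a SAM-based framework for dental image segmentation that combines a three-stage hierarchical encoder with a bidirectional sequence block to improve segmentation quality and efficiency, and it reports a 1.1% mIoU gain on the OralVision dataset.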

ABSTRACT

Tooth image segmentation is a cornerstone of dental digitization. However, traditional image encoders relying on fixed-resolution feature maps often lead to discontinuous segmentation and poor discrimination between target regions and background, due to insufficient modeling of environmental and global context. Moreover, transformer-based self-attention introduces substantial computational overhead because of its quadratic complexity (O(n^2)), making it inefficient for high-resolution dental images. To address these challenges, we introduce a three-stage encoder with hierarchical feature representation to capture scale-adaptive information in dental images. By jointly leveraging low-level details and high-level semantics through cross-scale feature fusion, the model effectively preserves fine structural information while maintaining strong contextual awareness. Furthermore, a bidirectional sequence modeling strategy is incorporated to enhance global spatial context understanding without incurring high computational cost. We validate our method on two dental datasets, with experimental results demonstrating its superiority over existing approaches. On the OralVision dataset, our model achieves a 1.1% improvement in mean intersection over union (mIoU).

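Motivation and goals

Advance high-quality dental image segmentation with an efficient, scalable model that handles multi-scale structures and global context.
Develop a task-specific image encoder with hierarchical features that preserves fine structures while remaining context-aware.
Introduce a bidirectional sequence block that captures global spatial context at linear complexity.
Integrate a multi-scale feature pyramid with prompt-driven decoding to produce accurate dental segmentation masks.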

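Proposed method

Introduce a three-stage downsampling encoder that builds multi-scale features from dental images.
Use a bidirectional sequence block (BSB) that replaces quadratic self-attention with a state space model, aggregating context in both the forward and backward directions.
Fuse the hierarchical features top-down in the decoder to form a three-level feature pyramid that guides mask generation.
Use a prompt encoder and a SAM-based decoder so that output masks are conditioned on point/box prompts.
Train with a combination of cross-entropy and Dice losses, together with class weighting and test-time augmentation for robustness.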
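
To make the forward/backward aggregation idea concrete, here is a minimal sketch of a bidirectional linear scan. It is not the paper's BSB: it assumes a scalar state with fixed, hypothetical `decay` and `gain` parameters, and it merges the two directions by simple addition, whereas the actual block uses learned state space parameters and gating. The function names `linear_scan` and `bidirectional_scan` are illustrative only. The point is that each direction is a single pass over the sequence, so the cost grows linearly with the number of tokens.

```python
from typing import List


def linear_scan(xs: List[float], decay: float, gain: float) -> List[float]:
    """Run the recurrence h_t = decay * h_{t-1} + gain * x_t in one O(n) pass."""
    hidden = 0.0
    outputs = []
    for x in xs:
        hidden = decay * hidden + gain * x
        outputs.append(hidden)
    return outputs


def bidirectional_scan(
    xs: List[float], decay: float = 0.5, gain: float = 1.0
) -> List[float]:
    """Sum a left-to-right and a right-to-left scan (assumed merge rule)."""
    forward = linear_scan(xs, decay, gain)
    # Scan the reversed sequence, then flip back so positions line up.
    backward = linear_scan(xs[::-1], decay, gain)[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    tokens = [1.0, 0.0, 0.0, 2.0]
    print(bidirectional_scan(tokens))
```

In this toy version every output position mixes information from both ends of the sequence, which is the property the summary attributes to the BSB, without building an n-by-n attention matrix.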

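Experimental results

Research questions

RQ1: Compared with existing SAM-based methods, do hierarchical multi-scale features and the bidirectional sequence block improve dental image segmentation quality and boundary accuracy?
RQ2: Do the proposed encoder and BSB improve efficiency (lower latency) on high-resolution dental images while maintaining or improving mIoU and boundary IoU?
RQ3: How does multi-scale feature fusion affect segmentation performance in noisy, complex oral environments?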

Key findings

Variant	mIoU	mBIoU
Bidirectional SSM (DSD ablation)	90.7	87.2
Bidirectional SSM + Conv1d (DSD ablation)	90.9	87.9
No Gate (DSD ablation)	90.8	87.2
Shared Gate (DSD ablation)	91.4	87.9
Dual Gate (Ours)	91.9	88.7

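On OralVision, the proposed method improves mean IoU (mIoU) by 1.1 percentage points over the baseline.
In the ablation on DSD, the bidirectional sequence block raises mIoU: None 89.1% → Bidirectional SSM 90.7% → Bidirectional SSM + Conv1d 90.9%.
A further ablation shows the effect of the gating design: No Gate 90.8 mIoU / 87.2 mBIoU; Shared Gate 91.4 mIoU / 87.9 mBIoU; Dual Gate (ours) 91.9 mIoU / 88.7 mBIoU.
At higher image resolutions, the method's latency stays below that of comparable methods, indicating that it is efficient while maintaining segmentation quality.
Experiments on the DSD and OralVision datasets show better segmentation masks and more robust boundary delineation in noisy dental images.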
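
For readers unfamiliar with the headline metric, the following is a minimal sketch of how mIoU can be computed from flattened label maps. It assumes one common convention (per-class IoU averaged over classes that appear in either the prediction or the ground truth); the summary does not state which averaging convention the paper uses, and the function name `mean_iou` is illustrative. mBIoU applies the same ratio restricted to a band around object boundaries, which is not shown here.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for cls in range(num_classes):
        # Pixels where both maps agree on this class.
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where at least one map assigns this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both maps (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]   # flattened predicted labels
    reference = [0, 1, 1, 1]   # flattened ground-truth labels
    print(mean_iou(predicted, reference, num_classes=2))
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. On this scale, the reported differences between gate variants (for example 91.4 versus 91.9 mIoU) correspond to half a point of average overlap.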


This review was generated by AI and reviewed by a human editor.