QUICK REVIEW

[Paper Review] AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation

Chengyin Li, Prashant Khanduri|arXiv (Cornell University)|Aug 28, 2023
Advanced Neural Network Applications | Cited by 11
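One-Sentence Summary

This paper adapts the 2D Segment Anything Model (SAM) to multi-organ segmentation of 3D CT images, using parameter-efficient adapters, an automatic prompt generator, and knowledge distillation into lightweight models, and reports state-of-the-art results on multiple datasets.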

ABSTRACT

Segment Anything Model (SAM) is one of the pioneering prompt-based foundation models for image segmentation and has been rapidly adopted for various medical imaging applications. However, in clinical settings, creating effective prompts is notably challenging and time-consuming, requiring the expertise of domain specialists such as physicians. This requirement significantly diminishes SAM's primary advantage, its interactive capability with end users, in medical applications. Moreover, recent studies have indicated that SAM, originally designed for 2D natural images, performs suboptimally on 3D medical image segmentation tasks. This subpar performance is attributed to the domain gaps between natural and medical images and the disparities in spatial arrangements between 2D and 3D images, particularly in multi-organ segmentation applications. To overcome these challenges, we present a novel technique termed AutoProSAM. This method automates 3D multi-organ CT-based segmentation by leveraging SAM's foundational model capabilities without relying on domain experts for prompts. The approach utilizes parameter-efficient adaptation techniques to adapt SAM for 3D medical imagery and incorporates an effective automatic prompt learning paradigm specific to this domain. By eliminating the need for manual prompts, it enhances SAM's capabilities for 3D medical image segmentation and achieves state-of-the-art (SOTA) performance in CT-based multi-organ segmentation tasks. The code is available at https://github.com/ChengyinLee/AutoProSAM_2024.

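Research Motivation and Goals

  • Bridge the gap between the 2D SAM and 3D medical imaging for multi-organ CT segmentation.
  • Eliminate manual prompts by introducing an automatic prompt generation module.
  • Achieve high segmentation accuracy through parameter-efficient adaptation and a lightweight decoder.
  • Show that the learned knowledge can be transferred to smaller models suited to on-device and point-of-care testing (POCT) scenarios.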

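Proposed Method

  • Modify the 2D SAM image encoder with lightweight adapters and 3D-aware positional encoding, enabling 3D processing while reusing the pretrained weights.
  • Introduce an Auto Prompt Generator (APG), a lightweight 3D UNet-style encoder-decoder that learns prompts from feature maps and removes the need for manual prompts.
  • Replace the 2D mask decoder with a lightweight 3D decoder that uses multi-layer aggregation (MLAM) and skip connections to better preserve detail.
  • Apply a knowledge distillation (KD) framework to transfer knowledge from the AutoSAM Adapter to smaller models (such as SwinUNETR tiny/small) for resource-constrained deployment.
  • Train with a combination of Dice loss and cross-entropy loss; add a KD loss (mean squared error between teacher and student) with a tunable λ to balance learning from the ground-truth labels against learning from the teacher.
  • Adopt a two-stage training strategy: freeze most pretrained components while training the adapter, prompt, and decoder parameters.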
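The supervised part of the training objective listed above (Dice loss plus cross-entropy loss) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the flat list-of-voxels data layout, the smoothing constants, and the unweighted sum of the two terms are all assumptions made for illustration.

```python
import math


def dice_loss(probs, labels, num_classes, eps=1e-6):
    """Soft Dice loss averaged over classes (hypothetical helper).

    probs:  list of per-voxel probability vectors, each summing to 1.
    labels: list of integer class ids, one per voxel.
    """
    total = 0.0
    for c in range(num_classes):
        # Overlap between the predicted probability mass and the true mask of class c.
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_sum = sum(p[c] for p in probs)
        true_sum = sum(1 for y in labels if y == c)
        dice = (2.0 * inter + eps) / (pred_sum + true_sum + eps)
        total += 1.0 - dice
    return total / num_classes


def cross_entropy_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class of each voxel."""
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)


def segmentation_loss(probs, labels, num_classes):
    """Dice + cross-entropy; the equal weighting is an assumption."""
    return dice_loss(probs, labels, num_classes) + cross_entropy_loss(probs, labels)


# Two voxels, two classes: a confident correct prediction vs. an uninformative one.
perfect = segmentation_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
uniform = segmentation_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], num_classes=2)
```

Dice rewards region overlap and is less sensitive to class imbalance (small organs), while cross-entropy gives a per-voxel gradient signal, which is a common reason for pairing the two in medical segmentation.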
Figure 1: Challenges associated with using SAM for medical image segmentation include: (A) a t-SNE plot of embeddings encoded by SAM's image encoder, showcasing differences between medical image datasets such as AMOS [15] and BTCV [19], and natural image datasets like ADE20K [48] and COCO [

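Experimental Results

Research Questions

  • RQ1: Can a 2D foundation model (SAM) be adapted effectively to 3D medical image segmentation with minimal fine-tuning?
  • RQ2: Does the automatic prompt generation module improve multi-organ 3D segmentation performance compared with manual prompts?
  • RQ3: Can the combination of lightweight 3D adapters, a 3D decoder, and MLAM outperform state-of-the-art 3D medical segmentation models on CT data?
  • RQ4: Can knowledge distillation effectively transfer the AutoSAM Adapter's knowledge to smaller, on-device segmentation models for POCT scenarios?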

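Key Findings

  • The AutoSAM Adapter generally outperforms state-of-the-art 3D medical segmentation methods on multiple CT datasets in both Dice and NSD.
  • On BTCV, Dice improves by up to 3% and NSD by 3–7% over the baselines; on AMOS, the gains grow as the amount of data increases.
  • On CT-ORG, the AutoSAM Adapter achieves the highest NSD and a Dice score competitive with the baselines.
  • Compared with SAM-based methods, the proposed P&M (prompt and mask) fine-tuning with AutoPROMOTER outperforms MedSAM and other SAM-based adapters, and on some metrics even exceeds the full MedSAM.
  • Knowledge distillation makes it possible to transfer the learned 3D capability to lightweight SwinUNETR variants, bringing notable Dice gains (for example, about 4% on BTCV).
  • Ablation studies show that APG and MLAM are critical to achieving high Dice and NSD; setting the KD parameter λ to about 0.5 balances learning between the teacher and the ground-truth labels.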
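To make the role of λ concrete, here is a minimal sketch of a distillation objective that mixes a supervised loss with a mean-squared-error term between student and teacher outputs. The summary only states that the KD loss is an MSE with a tunable λ; the convex-combination form `(1 - λ) * supervised + λ * MSE`, the function names, and the flat-list inputs below are assumptions, not the paper's confirmed formulation.

```python
def mse(student_out, teacher_out):
    """Mean squared error between student and teacher outputs (flat lists)."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


def kd_objective(supervised_loss, student_out, teacher_out, lam=0.5):
    """Hypothetical KD objective: (1 - lam) * supervised + lam * MSE.

    lam = 0 trains on ground-truth labels only; lam = 1 imitates the teacher only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * supervised_loss + lam * mse(student_out, teacher_out)


# Toy values: supervised loss of 1.0, student and teacher disagree on one of two outputs.
balanced = kd_objective(1.0, [0.0, 1.0], [0.0, 0.0], lam=0.5)
labels_only = kd_objective(1.0, [0.0, 1.0], [0.0, 0.0], lam=0.0)
teacher_only = kd_objective(1.0, [0.0, 1.0], [0.0, 0.0], lam=1.0)
```

Under this assumed form, a value near 0.5 gives the ground-truth term and the teacher term equal weight, which matches the "balanced learning" reading of the ablation result.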
Figure 2: (A) The overall architecture of the AutoSAM Adapter, (B) the design of the Spatial Adapter module, which utilizes parameter-efficient model fine-tuning, (C) the architecture of the Auto Prompt Generator, featuring a U-Net-like encoder-decoder design, and (D) the pipeline for deriving a


This review was generated by AI and reviewed by human editors.