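[Paper Review] MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation
MA-SAM adapts the Segment Anything Model (SAM) to 3D medical data through parameter-efficient fine-tuning and 3D adapters, achieving strong automatic segmentation on CT, MRI, and surgical video without any prompt.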
The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks. However, SAM's performance significantly declines when applied to medical images, primarily due to the substantial disparity between natural and medical image domains. To effectively adapt SAM to medical images, it is important to incorporate critical third-dimensional information, i.e., volumetric or temporal knowledge, during fine-tuning. Simultaneously, we aim to harness SAM's pre-trained weights within its original 2D backbone to the fullest extent. In this paper, we introduce a modality-agnostic SAM adaptation framework, named as MA-SAM, that is applicable to various volumetric and video medical data. Our method roots in the parameter-efficient fine-tuning strategy to update only a small portion of weight increments while preserving the majority of SAM's pre-trained weights. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data. The effectiveness of our method has been comprehensively evaluated on four medical image segmentation tasks, by using 10 public datasets across CT, MRI, and surgical video data. Remarkably, without using any prompt, our method consistently outperforms various state-of-the-art 3D approaches, surpassing nnU-Net by 0.9%, 2.6%, and 9.9% in Dice for CT multi-organ segmentation, MRI prostate segmentation, and surgical scene segmentation respectively. Our model also demonstrates strong generalization, and excels in challenging tumor segmentation when prompts are used. Our code is available at: https://github.com/cchen-cc/MA-SAM.
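Research motivation and objectives
- Adapt SAM into a general-purpose segmentation foundation model for medical imaging by addressing the domain gap between natural and medical images.
- Build a modality-agnostic, parameter-efficient fine-tuning framework that incorporates 3D volumetric or temporal information.
- Achieve effective and robust prompt-free automatic segmentation across CT, MRI, and surgical video.
- Increase the output resolution of the mask decoder to better handle small medical structures.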
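Proposed method
- Apply FacT, a parameter-efficient fine-tuning method, to update low-rank weight increments in the image encoder (U and V factors shared across layers, with a layer-specific Sigma).
- Inject a 3D adapter into every transformer block. Each adapter extracts volumetric or temporal information with a Conv3D layer (3x1x1 kernel), applied after the features are rearranged so that the adjacent-slice inputs expose a depth axis.
- Fully fine-tune the mask decoder with a light modification: progressive upsampling restores the original resolution and improves segmentation of small structures.
- Reshape inputs so the 3D context lines up with the 2D SAM backbone: adjacent slices are stacked along the batch dimension, and feature maps are rearranged before each 3D convolution.
- Train with a hybrid segmentation loss (cross-entropy + Dice) and data augmentation; fine-tune the ViT-H backbone with a large batch size for 400 epochs.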
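The slice-to-batch reshaping and the 3x1x1 convolution described in the list above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the bottleneck layout (down-projection, depth convolution, activation, up-projection, residual), the ReLU activation, the tensor sizes, and the names `depth_conv_3x1x1` and `adapter_3d` are all assumptions made for the example.

```python
import numpy as np

def depth_conv_3x1x1(x, kernel):
    """Convolve along the depth axis only (a 3x1x1 kernel).

    x:      (B, D, H, W, C) features
    kernel: (3, C, C), one C x C matrix per depth offset (-1, 0, +1)
    Zero padding along depth keeps the output the same size as the input.
    """
    padded = np.pad(x, ((0, 0), (1, 1), (0, 0), (0, 0), (0, 0)))
    depth = x.shape[1]
    out = np.zeros_like(x)
    for k in range(3):
        # Slice d of the output mixes slices d-1, d, d+1 of the input.
        out += padded[:, k:k + depth] @ kernel[k]
    return out

def adapter_3d(tokens, batch, depth, w_down, kernel, w_up):
    """Hypothetical 3D adapter acting on features laid out as (B*D, H, W, C)."""
    bd, h, w, _ = tokens.shape
    assert bd == batch * depth, "slices of one volume must be contiguous in the batch"
    z = tokens @ w_down                     # project to a small bottleneck width
    z = z.reshape(batch, depth, h, w, -1)   # expose the depth axis for the 3D conv
    z = depth_conv_3x1x1(z, kernel)
    z = np.maximum(z, 0.0)                  # ReLU as a stand-in activation (assumption)
    z = z.reshape(bd, h, w, -1)             # back to the layout the 2D backbone expects
    return tokens + z @ w_up                # residual connection

rng = np.random.default_rng(0)
B, D, H, W, C, c = 2, 3, 4, 4, 8, 2         # illustrative sizes only
volume = rng.standard_normal((B, D, H, W, C))
tokens = volume.reshape(B * D, H, W, C)     # adjacent slices stacked on the batch axis
w_down = rng.standard_normal((C, c)) * 0.1
kernel = rng.standard_normal((3, c, c)) * 0.1
w_up = rng.standard_normal((c, C)) * 0.1

out = adapter_3d(tokens, B, D, w_down, kernel, w_up)
print(out.shape)  # (6, 4, 4, 8)
```

Because the kernel spans three positions along depth and one position in each spatial direction, every output slice sees only its immediate neighbours within the same volume, while the 2D backbone keeps operating on ordinary `(batch, H, W, C)` tensors.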

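The FacT-style update in the method list (factors shared across layers plus a per-layer Sigma) amounts to adding a low-rank increment to each frozen weight matrix. The sketch below illustrates that idea in NumPy under stated assumptions: the matrix sizes, the rank, the zero initialisation of Sigma, the `scale` factor, and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, num_layers = 16, 4, 3   # hypothetical sizes; the real encoder is far larger
scale = 1.0                   # hypothetical scaling factor

# Frozen pre-trained weights, one matrix per layer (stand-ins for encoder weights).
frozen_W = [rng.standard_normal((d, d)) for _ in range(num_layers)]

# Trainable factors: U and V are shared by all layers, Sigma is layer-specific.
U = rng.standard_normal((d, r)) * 0.1
V = rng.standard_normal((d, r)) * 0.1
Sigma = [np.zeros((r, r)) for _ in range(num_layers)]  # zero init (assumption)

def delta_weight(layer):
    """Low-rank increment scale * U @ Sigma[layer] @ V^T; its rank is at most r."""
    return scale * U @ Sigma[layer] @ V.T

def effective_weight(layer):
    """Frozen weight plus the low-rank increment for the given layer."""
    return frozen_W[layer] + delta_weight(layer)

trainable = U.size + V.size + sum(s.size for s in Sigma)
full = sum(w.size for w in frozen_W)
print(f"trainable parameters: {trainable} of {full}")
# trainable parameters: 176 of 768
```

Sharing U and V means each additional layer costs only an r x r matrix, which is why the number of updated parameters stays small relative to the frozen backbone.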
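Experimental results
Research questions
- RQ1: Can SAM be effectively adapted to 3D medical data through modality-agnostic, lightweight fine-tuning?
- RQ2: Does injecting 3D adapters into the 2D SAM backbone improve the use of volumetric or temporal information and thereby improve medical segmentation?
- RQ3: Does fully fine-tuning the mask decoder benefit medical segmentation, and does progressive upsampling improve output resolution?
- RQ4: How well does MA-SAM generalize across CT, MRI, and surgical video datasets without prompts?
- RQ5: Do prompts (such as 3D bounding boxes) further improve MA-SAM on challenging tumor segmentation tasks?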
Key findings
Table 1: BTCV abdominal multi-organ segmentation results across methods. Per-organ scores are followed by the average. Dice is reported in %; HD is the Hausdorff distance.

Dice [%]
| Method | Spleen | Right Kidney | Left Kidney | Gall bladder | Esophagus | Liver | Stomach | Aorta | IVC | Veins | Pancreas | Adrenal Gland | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nnU-Net | 97.0 | 95.3 | 95.3 | 63.5 | 77.5 | 97.4 | 89.1 | 90.1 | 88.5 | 79.0 | 87.1 | 75.2 | 86.3 |
| 3D UX-Net | 94.6 | 94.2 | 94.3 | 59.3 | 72.2 | 96.4 | 73.4 | 87.2 | 84.9 | 72.2 | 80.9 | 67.1 | 81.4 |
| SwinUNETR | 95.6 | 94.2 | 94.3 | 63.6 | 75.5 | 96.6 | 79.2 | 89.9 | 83.7 | 75.0 | 82.2 | 67.3 | 83.1 |
| nnFormer | 93.5 | 94.9 | 95.0 | 64.1 | 79.5 | 96.8 | 90.1 | 89.7 | 85.9 | 77.8 | 85.6 | 73.9 | 85.6 |
| SAMed_h | 95.3 | 92.1 | 92.9 | 62.1 | 75.3 | 96.4 | 90.2 | 87.6 | 79.8 | 74.2 | 77.9 | 61.0 | 82.1 |
| MA-SAM (Ours) | 96.7 | 95.1 | 95.4 | 68.2 | 82.1 | 96.9 | 92.8 | 91.1 | 87.5 | 79.8 | 86.6 | 73.9 | 87.2 |

HD
| Method | Spleen | Right Kidney | Left Kidney | Gall bladder | Esophagus | Liver | Stomach | Aorta | IVC | Veins | Pancreas | Adrenal Gland | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nnU-Net | 1.07 | 1.19 | 1.19 | 7.49 | 8.56 | 1.14 | 4.84 | 14.11 | 2.87 | 5.67 | 2.31 | 2.23 | 4.39 |
| 3D UX-Net | 3.17 | 1.59 | 1.26 | 4.53 | 13.92 | 1.75 | 19.72 | 12.53 | 3.47 | 9.99 | 3.70 | 4.11 | 6.68 |
| SwinUNETR | 1.21 | 1.41 | 1.37 | 2.25 | 5.82 | 1.70 | 13.75 | 5.92 | 4.46 | 7.58 | 3.53 | 3.40 | 4.37 |
| nnFormer | 78.03 | 1.41 | 1.43 | 3.00 | 4.92 | 1.38 | 4.24 | 7.53 | 4.02 | 6.53 | 2.96 | 2.76 | 9.95 |
| SAMed_h | 1.37 | 33.53 | 1.84 | 6.27 | 4.84 | 1.77 | 7.49 | 4.97 | 7.28 | 6.87 | 10.00 | 6.49 | 7.73 |
| MA-SAM (Ours) | 1.00 | 1.19 | 1.07 | 1.59 | 3.77 | 1.36 | 3.87 | 5.29 | 3.12 | 3.25 | 3.93 | 2.57 | 2.67 |

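- Without any prompt, MA-SAM consistently outperforms state-of-the-art 3D medical segmentation methods on CT multi-organ, MRI prostate, and surgical scene segmentation.
- On BTCV abdominal multi-organ segmentation, MA-SAM reaches an average Dice of 87.2% and an average HD of 2.67, ahead of nnU-Net and the other baselines.
- On the six-site prostate MRI dataset, MA-SAM reaches an average Dice of 92.6% and an HD of 1.94, surpassing the nnU-Net and SAMed_h baselines.
- On EndoVis18 surgical scene segmentation, MA-SAM reaches an mIoU of 69.2 and a Dice of 77.0, surpassing both task-specific and SAM-based methods.
- On pancreas tumor segmentation (MSD-Pancreas), fully automatic MA-SAM reaches a Dice of 40.2 and an NSD of 59.1; with a tight 3D bounding-box prompt the score rises sharply, to a Dice of up to 80.35%.
- With prompts, MA-SAM exceeds nnU-Net by up to 38.7% in Dice on challenging tumor segmentation.
- MA-SAM shows strong zero-shot and few-shot generalization on AMOS22 CT/MRI data, outperforming nnU-Net and state-of-the-art cross-domain generalization methods.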
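Most of the numbers above are overlap scores. As a reminder of what Dice and IoU measure, here is a minimal sketch for a single pair of binary masks; the function name `dice_and_iou`, the `eps` smoothing term, and the toy masks are assumptions for illustration, and published benchmarks typically average these scores per class and per case.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-8):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)  # 2|A∩B| / (|A| + |B|)
    iou = inter / (union + eps)                             # |A∩B| / |A∪B|
    return float(dice), float(iou)

# Toy masks: 3 predicted pixels, 3 ground-truth pixels, 2 of them shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")
# Dice=0.667, IoU=0.500
```

For the same pair of masks Dice is never lower than IoU, so the two should not be compared directly across rows that report different metrics.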

This review was generated by AI and checked by a human editor.