QUICK REVIEW

[Paper Review] DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation

Yifan Gao, Wei Xia|arXiv (Cornell University)|Jun 1, 2023

Advanced Neural Network Applications | Cited by 19

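One-Sentence Summary

DeSAM decouples SAM's mask decoder into a prompt-relevant IoU module and a prompt-invariant mask module, enabling fully automatic, single-source domain-generalized medical image segmentation, with strong results on cross-site prostate segmentation.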

ABSTRACT

Deep learning-based medical image segmentation models often suffer from domain shift, where the models trained on a source domain do not generalize well to other unseen domains. As a prompt-driven foundation model with powerful generalization capabilities, the Segment Anything Model (SAM) shows potential for improving the cross-domain robustness of medical image segmentation. However, SAM performs significantly worse in automatic segmentation scenarios than when manually prompted, hindering its direct application to domain generalization. Upon further investigation, we discovered that the degradation in performance was related to the coupling effect of inevitable poor prompts and mask generation. To address the coupling effect, we propose the Decoupled SAM (DeSAM). DeSAM modifies SAM's mask decoder by introducing two new modules: a prompt-relevant IoU module (PRIM) and a prompt-decoupled mask module (PDMM). PRIM predicts the IoU score and generates mask embeddings, while PDMM extracts multi-scale features from the intermediate layers of the image encoder and fuses them with the mask embeddings from PRIM to generate the final segmentation mask. This decoupled design allows DeSAM to leverage the pre-trained weights while minimizing the performance degradation caused by poor prompts. We conducted experiments on publicly available cross-site prostate and cross-modality abdominal image segmentation datasets. The results show that our DeSAM leads to a substantial performance improvement over previous state-of-the-art domain generalization methods. The code is publicly available at https://github.com/yifangao112/DeSAM.

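Motivation and Goals

Use a foundation model (SAM) to address domain shift in medical image segmentation without requiring multi-source data or target-domain data.
Remove the prompt-driven coupling between image embeddings and prompt embeddings to improve fully automatic segmentation.
Enable memory-efficient training by freezing the encoder and precomputing image embeddings.
Demonstrate improved cross-site generalization on prostate MRI datasets from multiple clinical sites.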

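Proposed Method

Decouple SAM's mask decoder into two modules: a prompt-relevant IoU module (PRIM) and a prompt-invariant mask module (PIMM; the abstract above calls this the prompt-decoupled mask module, PDMM).
Freeze the image and prompt encoders; precompute image embeddings with the SAM image encoder to reduce GPU memory usage.
PRIM uses a cross-attention Transformer to produce mask embeddings and an IoU score (it has no direct mask head).
PIMM fuses multi-scale image embeddings with PRIM's output through a U-Net/UNETR-like encoder-decoder structure to generate the final mask.
Train with either grid point prompts (a 9x9 grid, with grid points inside and outside the ground-truth mask) or whole-box prompts; the losses are L_dice and L_ce for the mask and L_mse for the IoU.
In grid mode, ground-truth supervision uses L_points = L_dice + L_ce + L_mse with weights (1, 1, 10); in box mode, it uses L_box = L_dice + L_ce.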
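The grid prompts and the two loss combinations described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the cell-centred grid layout, the binary form of the cross-entropy term, the function names, and the choice of `true_iou` as the IoU regression target are all hypothetical.

```python
import numpy as np

def grid_point_prompts(gt_mask, n=9):
    """Place an n x n grid of points over the image and label each point
    1 if it lies inside the ground-truth mask, 0 otherwise.
    The cell-centred layout is an assumption, not taken from the paper."""
    h, w = gt_mask.shape
    ys = ((np.arange(n) + 0.5) * h / n).astype(int)
    xs = ((np.arange(n) + 0.5) * w / n).astype(int)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    points = np.stack([xx.ravel(), yy.ravel()], axis=1)  # (n*n, 2) as (x, y)
    labels = gt_mask[yy.ravel(), xx.ravel()].astype(int)
    return points, labels

def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss on foreground probabilities."""
    inter = (prob * target).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps))

def ce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy (assumes a single foreground class)."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def mse_loss(pred_iou, true_iou):
    """Squared error between the predicted IoU score and its target."""
    return float((pred_iou - true_iou) ** 2)

def points_loss(prob, target, pred_iou, true_iou, weights=(1.0, 1.0, 10.0)):
    """L_points = 1 * L_dice + 1 * L_ce + 10 * L_mse (grid-point mode)."""
    w_dice, w_ce, w_mse = weights
    return (w_dice * dice_loss(prob, target)
            + w_ce * ce_loss(prob, target)
            + w_mse * mse_loss(pred_iou, true_iou))

def box_loss(prob, target):
    """L_box = L_dice + L_ce (whole-box mode, no IoU term)."""
    return dice_loss(prob, target) + ce_loss(prob, target)
```

With the weights (1, 1, 10), an IoU error of 0.5 adds 10 * 0.25 = 2.5 to the grid-mode loss relative to the box-mode loss on the same mask prediction.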

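Experimental Results

Research Questions

RQ1: Can decoupling the mask decoder suppress the negative effect of poor prompts in fully automatic SAM-based medical segmentation?
RQ2: Does freezing the image encoder and precomputing image embeddings allow training on an entry-level GPU while still achieving strong cross-domain generalization?
RQ3: How does DeSAM compare with existing single-source domain generalization methods on cross-site prostate segmentation?
RQ4: How much does each component (PRIM, PIMM, the IoU head, the fusion strategy) contribute to overall performance?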

Main Findings

Method	A to Rest	B to Rest	C to Rest	D to Rest	E to Rest	F to Rest	Overall
Upper bound [53]	85.38	83.68	82.15	85.21	87.04	84.29	84.63
Baseline [53]	63.73	61.21	27.41	34.36	44.10	61.70	48.75
AdvNoise [51]	72.15	63.26	30.81	40.12	48.07	60.12	52.42
AdvBias [16]	77.45	62.12	51.09	70.20	51.12	50.69	60.45
RandConv [17]	75.52	57.23	44.21	61.27	49.98	54.21	57.07
MixStyle [52]	73.04	59.29	43.00	62.17	53.12	50.03	56.78
MaxStyle [7]	81.25	70.27	62.09	58.18	70.04	67.77	68.27
CSDG [18]	80.72	68.00	59.78	72.40	68.67	70.78	70.06
MedSAM [44]	72.32	73.31	61.53	64.46	68.89	61.39	66.98
DeSAM (whole box)	82.30	78.06	66.65	82.87	77.58	79.05	77.75
DeSAM (grid points)	82.80	80.61	64.77	83.41	80.36	82.17	79.02
Improvement over baseline	+19.07	+19.40	+37.36	+49.05	+36.26	+20.47	+30.27
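
The Overall column and the improvement row can be re-derived from the per-site Dice scores. The short check below assumes that Overall is the unweighted mean of the six site columns and that the improvement row compares DeSAM (grid points) with the baseline; both assumptions are consistent with the reported numbers. The helper names are illustrative only.

```python
# Per-site Dice scores (A..F to Rest) copied from the table above.
DICE = {
    "Baseline": [63.73, 61.21, 27.41, 34.36, 44.10, 61.70],
    "CSDG": [80.72, 68.00, 59.78, 72.40, 68.67, 70.78],
    "MedSAM": [72.32, 73.31, 61.53, 64.46, 68.89, 61.39],
    "DeSAM (whole box)": [82.30, 78.06, 66.65, 82.87, 77.58, 79.05],
    "DeSAM (grid points)": [82.80, 80.61, 64.77, 83.41, 80.36, 82.17],
}

def overall(method):
    """Unweighted mean over the six source sites, rounded to 2 decimals."""
    scores = DICE[method]
    return round(sum(scores) / len(scores), 2)

def per_site_gain(method, reference):
    """Per-site Dice difference between two methods."""
    return [round(m - r, 2) for m, r in zip(DICE[method], DICE[reference])]

def overall_gain(method, reference):
    """Difference between the two rounded Overall values."""
    return round(overall(method) - overall(reference), 2)

for name in DICE:
    print(f"{name}: overall = {overall(name):.2f}")
print("gain over baseline:", per_site_gain("DeSAM (grid points)", "Baseline"))
print("overall gain over baseline:", overall_gain("DeSAM (grid points)", "Baseline"))
print("overall gain over CSDG:", overall_gain("DeSAM (grid points)", "CSDG"))
```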

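Compared with the previous state-of-the-art domain generalization method, DeSAM improves the average cross-site Dice score for prostate segmentation by 8.96 percentage points (from 70.06% to 79.02%).
DeSAM (grid points) reaches an overall Dice of 79.02%, ahead of both DeSAM (whole box), at 77.75%, and all prior methods.
Compared with MedSAM, DeSAM is less sensitive to poor prompts and achieves a higher average Dice on the prostate dataset (77.75% versus 66.98% for MedSAM).
Ablation analysis: PIMM alone yields 73.85%; adding PRIM with the IoU head raises this to 75.12%; adding mask-embedding fusion raises it to 75.81%; the full DeSAM reaches 79.02% overall.
Increasing the number of grid point prompts from 1 to 9 maintains or improves performance, reaching 79.02% at 9 points and remaining stable with more points (e.g., 79.03% at 25 points).
Training DeSAM with precomputed image embeddings on an entry-level GPU (RTX 3060 12GB) significantly reduces memory requirements compared with methods that fine-tune the encoder.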
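
The memory saving from precomputed embeddings comes from running the frozen encoder once per image and training only the decoder from cached outputs. A minimal sketch of that caching pattern follows; `frozen_encoder` is a hypothetical stand-in (simple patch averaging), where a real setup would call the SAM image encoder, and the file naming is an assumption.

```python
import os
import tempfile
import numpy as np

def frozen_encoder(image):
    """Hypothetical stand-in for a frozen image encoder: averages each
    16x16 patch. It has no trainable state, so its output can be cached."""
    h, w = image.shape
    return image.reshape(h // 16, 16, w // 16, 16).mean(axis=(1, 3))

def precompute_embeddings(images, cache_dir):
    """Run the encoder once per image and store the result on disk."""
    paths = []
    for i, image in enumerate(images):
        path = os.path.join(cache_dir, f"emb_{i:04d}.npy")
        if not os.path.exists(path):  # skip images already cached
            np.save(path, frozen_encoder(image))
        paths.append(path)
    return paths

def load_embedding(path):
    """Load a cached embedding; the encoder is never needed during training."""
    return np.load(path)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = [rng.random((64, 64)) for _ in range(3)]
    with tempfile.TemporaryDirectory() as cache_dir:
        paths = precompute_embeddings(images, cache_dir)
        print(load_embedding(paths[0]).shape)
```

Because the decoder reads embeddings from disk, neither the encoder weights nor its activations need to occupy GPU memory during training.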

This review was generated by AI and reviewed by human editors.