QUICK REVIEW

[Paper Review] DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation

Yifan Gao, Wei Xia|arXiv (Cornell University)|Jun 1, 2023

Advanced Neural Network Applications | Cited by 19

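One-Sentence Summary

DeSAM splits SAM's mask decoder into a prompt-relevant IoU module and a prompt-invariant mask module, enabling fully automatic single-source domain generalization for medical image segmentation and achieving strong results on multi-site prostate segmentation.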

ABSTRACT

Deep learning-based medical image segmentation models often suffer from domain shift, where the models trained on a source domain do not generalize well to other unseen domains. As a prompt-driven foundation model with powerful generalization capabilities, the Segment Anything Model (SAM) shows potential for improving the cross-domain robustness of medical image segmentation. However, SAM performs significantly worse in automatic segmentation scenarios than when manually prompted, hindering its direct application to domain generalization. Upon further investigation, we discovered that the degradation in performance was related to the coupling effect of inevitable poor prompts and mask generation. To address the coupling effect, we propose the Decoupled SAM (DeSAM). DeSAM modifies SAM's mask decoder by introducing two new modules: a prompt-relevant IoU module (PRIM) and a prompt-decoupled mask module (PDMM). PRIM predicts the IoU score and generates mask embeddings, while PDMM extracts multi-scale features from the intermediate layers of the image encoder and fuses them with the mask embeddings from PRIM to generate the final segmentation mask. This decoupled design allows DeSAM to leverage the pre-trained weights while minimizing the performance degradation caused by poor prompts. We conducted experiments on publicly available cross-site prostate and cross-modality abdominal image segmentation datasets. The results show that our DeSAM leads to a substantial performance improvement over previous state-of-the-art domain generalization methods. The code is publicly available at https://github.com/yifangao112/DeSAM.

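Motivation and Objectives

Address domain shift in medical image segmentation by leveraging a foundation model (SAM), without requiring multi-source data or target-domain data.
Remove the prompt-driven coupling between image and prompt embeddings to improve fully automatic segmentation.
Enable memory-efficient training by freezing the encoder and precomputing image embeddings.
Demonstrate improved cross-site generalization on prostate MRI datasets spanning multiple clinical sites.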

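Proposed Method

Split SAM's mask decoder into two modules: the Prompt-Relevant IoU Module (PRIM) and the Prompt-Invariant Mask Module (PIMM; the abstract above calls this the prompt-decoupled mask module, PDMM).
Freeze the image encoder and prompt encoder, and precompute image embeddings with SAM's image encoder to reduce GPU memory usage.
PRIM uses a cross-attention transformer to produce mask embeddings and an IoU score (it has no direct mask head).
PIMM fuses multi-scale image embeddings with the PRIM output through a U-Net/UNETR-like encoder-decoder structure to generate the final mask.
Training uses either grid-point prompts (a 9x9 grid, including points inside and outside the ground truth) or a whole-box prompt. The loss comprises L_dice and L_ce on the mask and L_mse on the IoU.
Supervision in grid-points mode uses L_points = L_dice + L_ce + L_mse with weights (1, 1, 10); whole-box mode uses L_box = L_dice + L_ce.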
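
The two training objectives described above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the cross-entropy term is written as binary cross-entropy, and the IoU regression target is assumed to be the IoU between the thresholded prediction and the ground truth (the review does not state how the target is computed). Only the loss terms and the (1, 1, 10) weighting come from the text.

```python
import math

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss over flattened foreground probabilities and a binary target."""
    inter = sum(p * t for p, t in zip(probs, target))
    denom = sum(probs) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def hard_iou(probs, target, threshold=0.5):
    """IoU between the thresholded prediction and the target (assumed IoU label)."""
    pred = [1 if p > threshold else 0 for p in probs]
    inter = sum(a * b for a, b in zip(pred, target))
    union = sum(1 for a, b in zip(pred, target) if a or b)
    return inter / union if union else 1.0

def desam_style_loss(probs, target, predicted_iou=None, weights=(1.0, 1.0, 10.0)):
    """Grid-points objective when predicted_iou is given, whole-box objective otherwise."""
    w_dice, w_ce, w_mse = weights
    loss = w_dice * dice_loss(probs, target) + w_ce * bce_loss(probs, target)
    if predicted_iou is not None:
        # Squared error between the IoU head's output and the assumed IoU label.
        loss += w_mse * (predicted_iou - hard_iou(probs, target)) ** 2
    return loss
```

Calling `desam_style_loss(probs, target)` gives the whole-box objective, and passing `predicted_iou` adds the weighted IoU regression term used in grid-points mode. The weight of 10 makes an IoU error of 0.5 cost 2.5, which is large relative to typical Dice and cross-entropy values.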

Experimental Results

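Research Questions

RQ1: Can decoupling the mask decoder suppress the adverse effect of poor prompts in fully automatic SAM-based medical segmentation?
RQ2: Can freezing the image encoder and precomputing image embeddings deliver strong cross-domain generalization while training on an entry-level GPU?
RQ3: How does DeSAM compare with existing single-source domain generalization methods on cross-site prostate segmentation?
RQ4: How much does each component (PRIM, PIMM, IoU head, fusion strategy) contribute to overall performance?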

Key Findings

Method	A to Rest	B to Rest	C to Rest	D to Rest	E to Rest	F to Rest	Overall
Upper bound [53]	85.38	83.68	82.15	85.21	87.04	84.29	84.63
Baseline [53]	63.73	61.21	27.41	34.36	44.10	61.70	48.75
AdvNoise [51]	72.15	63.26	30.81	40.12	48.07	60.12	52.42
AdvBias [16]	77.45	62.12	51.09	70.20	51.12	50.69	60.45
RandConv [17]	75.52	57.23	44.21	61.27	49.98	54.21	57.07
MixStyle [52]	73.04	59.29	43.00	62.17	53.12	50.03	56.78
MaxStyle [7]	81.25	70.27	62.09	58.18	70.04	67.77	68.27
CSDG [18]	80.72	68.00	59.78	72.40	68.67	70.78	70.06
MedSAM [44]	72.32	73.31	61.53	64.46	68.89	61.39	66.98
DeSAM (whole box)	82.30	78.06	66.65	82.87	77.58	79.05	77.75
DeSAM (grid points)	82.80	80.61	64.77	83.41	80.36	82.17	79.02
Improvement over baseline	+19.07	+19.40	+37.36	+49.05	+36.26	+20.47	+30.27
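
The overall column and the improvement row can be re-derived from the per-site scores. The short check below copies four rows from the table above and assumes, as the numbers suggest, that "overall" is the unweighted mean of the six source-site columns and that the improvement row compares the grid-points variant with the baseline. The dictionary keys and function names are only illustrative.

```python
# Per-site Dice scores (%) copied from the table above, in order A..F to Rest.
scores = {
    "Baseline": [63.73, 61.21, 27.41, 34.36, 44.10, 61.70],
    "CSDG": [80.72, 68.00, 59.78, 72.40, 68.67, 70.78],
    "DeSAM (whole box)": [82.30, 78.06, 66.65, 82.87, 77.58, 79.05],
    "DeSAM (grid points)": [82.80, 80.61, 64.77, 83.41, 80.36, 82.17],
}

def overall(method):
    """Unweighted mean over the six source sites, rounded to two decimals."""
    values = scores[method]
    return round(sum(values) / len(values), 2)

def improvement(method, reference="Baseline"):
    """Per-site gain of `method` over `reference`, in Dice points."""
    return [round(m - r, 2) for m, r in zip(scores[method], scores[reference])]

if __name__ == "__main__":
    for name in scores:
        print(f"{name}: overall {overall(name):.2f}")
    print("Gain per site:", improvement("DeSAM (grid points)"))
```

Under these assumptions the recomputed means match the Overall column, and the per-site gains match the improvement row.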

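DeSAM raises the cross-site Dice score for prostate segmentation by an average of 8.96 points over the previous state-of-the-art domain generalization method (from 70.06% to 79.02%).
DeSAM (grid points) reaches an overall Dice of 79.02%, outperforming both DeSAM (whole box), at 77.75%, and existing methods.
Compared with MedSAM, DeSAM is less sensitive to poor prompts and improves the average Dice on the prostate dataset (77.75% for the whole-box variant vs. 66.98% for MedSAM).
The ablation shows an overall Dice of 73.85% with PIMM alone, 75.12% after adding PRIM with the IoU head, 75.81% after adding mask-embedding fusion, and 79.02% for the full DeSAM.
Increasing the number of grid-point prompts from 1 to 9 maintains or improves performance, reaching 79.02% at 9 points, and results stay stable with more points (e.g., 79.03% at 25 points).
With precomputed image embeddings, DeSAM can be trained on an entry-level GPU (RTX 3060, 12 GB), substantially reducing memory requirements compared with tuning that involves the encoder.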
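
To make the grid-point prompting concrete, here is a minimal sketch of how such prompts might be generated during training. It assumes points are placed at the centres of an n x n grid of cells and labeled as foreground or background by looking them up in the ground-truth mask. The placement rule, the function name, and the `(x, y, label)` tuple layout are assumptions made for illustration and are not taken from the paper or its repository.

```python
def grid_point_prompts(mask, n=9):
    """Return (x, y, label) prompts on an n x n grid over a binary mask.

    `mask` is a list of rows of 0/1 values. The label is 1 when the point
    lands on foreground and 0 otherwise, so the prompt set mixes points
    inside and outside the ground truth.
    """
    height, width = len(mask), len(mask[0])
    prompts = []
    for row in range(n):
        # Use cell centres so that no point sits exactly on the image border.
        y = int((row + 0.5) * height / n)
        for col in range(n):
            x = int((col + 0.5) * width / n)
            prompts.append((x, y, 1 if mask[y][x] else 0))
    return prompts

if __name__ == "__main__":
    # Toy 18 x 18 mask with a 6 x 6 foreground square in the middle.
    toy = [[1 if 6 <= r < 12 and 6 <= c < 12 else 0 for c in range(18)]
           for r in range(18)]
    points = grid_point_prompts(toy, n=9)
    print(len(points), "prompts,", sum(label for _, _, label in points), "inside the mask")
```

Because most grid points fall outside a small organ, many prompts in such a scheme are background points, which is the kind of uninformative prompt the decoupled design is meant to tolerate.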

This review was generated by AI and checked by a human editor.