QUICK REVIEW

[Paper Review] MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation

Cheng Chen, Juzheng Miao|arXiv (Cornell University)|Sep 16, 2023

Advanced Neural Network Applications | Cited by 8

One-line summary

MA-SAM adapts SAM to 3D medical data using parameter-efficient fine-tuning and 3D adapters, achieving strong automatic segmentation on CT, MRI, and surgical video without prompts.

ABSTRACT

The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks. However, SAM's performance significantly declines when applied to medical images, primarily due to the substantial disparity between natural and medical image domains. To effectively adapt SAM to medical images, it is important to incorporate critical third-dimensional information, i.e., volumetric or temporal knowledge, during fine-tuning. Simultaneously, we aim to harness SAM's pre-trained weights within its original 2D backbone to the fullest extent. In this paper, we introduce a modality-agnostic SAM adaptation framework, named as MA-SAM, that is applicable to various volumetric and video medical data. Our method roots in the parameter-efficient fine-tuning strategy to update only a small portion of weight increments while preserving the majority of SAM's pre-trained weights. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data. The effectiveness of our method has been comprehensively evaluated on four medical image segmentation tasks, by using 10 public datasets across CT, MRI, and surgical video data. Remarkably, without using any prompt, our method consistently outperforms various state-of-the-art 3D approaches, surpassing nnU-Net by 0.9%, 2.6%, and 9.9% in Dice for CT multi-organ segmentation, MRI prostate segmentation, and surgical scene segmentation respectively. Our model also demonstrates strong generalization, and excels in challenging tumor segmentation when prompts are used. Our code is available at: https://github.com/cchen-cc/MA-SAM.

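Motivation and objectives

Leverage SAM as a general-purpose segmentation foundation model for medical images despite the domain gap between natural and medical images.
Develop a modality-agnostic, parameter-efficient fine-tuning framework that incorporates 3D volumetric or temporal information.
Achieve effective prompt-free automatic segmentation across CT, MRI, and surgical video, with good generalization.
Increase the output resolution of the mask decoder to better handle small medical structures.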

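Proposed method

Use FacT-based parameter-efficient fine-tuning to update low-rank weight increments in the image encoder (U/V factors shared across layers, a separate Sigma per layer).
Insert a 3D adapter into each transformer block that extracts volumetric or temporal information with a Conv3D layer (kernel 3x1x1), with adjacent-slice inputs rearranged to stay compatible with the 2D backbone.
Fully fine-tune the mask decoder and add progressive up-sampling to recover the original resolution, improving segmentation of small structures.
Rearrange the input to align 3D context with the 2D SAM backbone (adjacent slices are merged into the batch dimension; feature maps are reshaped for the 3D convolution).
Train with a hybrid segmentation loss (cross-entropy + Dice) and data augmentation; fine-tune the ViT-H backbone for 400 epochs with a large batch size.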
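
The FacT-style update in the first item of the method list can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes each layer's weight increment has the form U @ Sigma_l @ V^T, with U and V shared by all layers and Sigma_l specific to layer l, as the review describes. The sizes and the names `frozen_weights`, `sigmas`, `weight_increment`, and `adapted_weight` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; SAM's ViT-H encoder is far larger.
DIM, RANK, NUM_LAYERS = 64, 4, 3

# Frozen stand-ins for pre-trained weight matrices, one per layer.
frozen_weights = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_LAYERS)]

# Trainable factors: U and V are shared by all layers, Sigma is per layer.
U = rng.standard_normal((DIM, RANK)) * 0.01
V = rng.standard_normal((DIM, RANK)) * 0.01
# Zero init makes every increment start at zero, so training begins
# from the unmodified pre-trained model.
sigmas = [np.zeros((RANK, RANK)) for _ in range(NUM_LAYERS)]


def weight_increment(layer: int) -> np.ndarray:
    """Low-rank update for one layer: U @ Sigma_layer @ V^T (rank <= RANK)."""
    return U @ sigmas[layer] @ V.T


def adapted_weight(layer: int) -> np.ndarray:
    """Frozen weight plus its low-rank increment."""
    return frozen_weights[layer] + weight_increment(layer)


trainable_params = U.size + V.size + sum(s.size for s in sigmas)
frozen_params = sum(w.size for w in frozen_weights)
print(f"trainable: {trainable_params}, frozen: {frozen_params}")
```

With these toy sizes, 560 trainable values sit beside 12,288 frozen ones. Adding a layer costs only one more RANK x RANK Sigma, which is why sharing U and V keeps the update small.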

Fig. 1: The overview of our proposed modality-agnostic SAM adaptation framework (MA-SAM) for medical image segmentation. The image encoder is updated through a parameter-efficient fine-tuning strategy with FacT. The volumetric or temporal information is effectively incorporated via a set of 3D adapt
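
The reshaping that lets a 2D backbone see the third dimension can be illustrated with a toy adapter. The sketch below is written in NumPy under several assumptions that do not come from the paper: the down-project, depth-convolve, activate, up-project ordering, the ReLU activation, the residual connection, and all names and sizes are hypothetical. It only demonstrates the two ideas the review states: adjacent slices are folded into the batch axis, and a 3x1x1 convolution mixes information along the slice axis alone.

```python
import numpy as np


def adapter_3d(x, w_down, w_depth, w_up, num_slices):
    """Toy 3D adapter on tokens of shape (B * N, H, W, C).

    N = num_slices adjacent slices of each volume are assumed to be stored
    contiguously along the batch axis.
    """
    bn, h, w, _ = x.shape
    b = bn // num_slices

    z = x @ w_down                                # (B*N, H, W, C_mid)
    z = z.reshape(b, num_slices, h, w, -1)        # expose the slice axis

    # 3x1x1 convolution: each position is mixed only with the same (h, w)
    # position in the previous and next slice. Zero padding keeps N fixed.
    padded = np.pad(z, ((0, 0), (1, 1), (0, 0), (0, 0), (0, 0)))
    mixed = sum(padded[:, k:k + num_slices] @ w_depth[k] for k in range(3))

    mixed = np.maximum(mixed, 0.0)                # ReLU (assumed activation)
    out = mixed.reshape(bn, h, w, -1) @ w_up      # fold slices back into batch
    return x + out                                # residual connection


rng = np.random.default_rng(0)
B, N, H, W, C, C_MID = 2, 5, 4, 4, 8, 3           # hypothetical toy sizes
tokens = rng.standard_normal((B * N, H, W, C))
w_down = rng.standard_normal((C, C_MID))
w_depth = rng.standard_normal((3, C_MID, C_MID))  # one matrix per depth offset
w_up = rng.standard_normal((C_MID, C))

out = adapter_3d(tokens, w_down, w_depth, w_up, N)
print(out.shape)
```

Because the kernel spans three slices, changing one slice affects only its immediate neighbours after a single adapter; stacking adapters across transformer blocks widens that receptive field.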

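Experimental results

Research questions

RQ1 Can SAM be effectively adapted to 3D medical data with modality-agnostic, lightweight fine-tuning?
RQ2 Does injecting 3D adapters into the 2D SAM backbone improve the use of volumetric/temporal information in medical segmentation?
RQ3 Does fully fine-tuning the mask decoder benefit medical segmentation, and does the progressive up-sampling strategy improve resolution?
RQ4 How well does MA-SAM generalize across CT, MRI, and surgical video datasets without prompts?
RQ5 Do prompts such as 3D bounding boxes further improve MA-SAM on challenging tumor segmentation?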

Key findings

Table 1: Abdominal multi-organ segmentation on BTCV. Per-organ and average Dice [%] (higher is better) and Hausdorff distance, HD (lower is better), for each method.

Dice [%]
Method	Spleen	Right Kidney	Left Kidney	Gall bladder	Esophagus	Liver	Stomach	Aorta	IVC	Veins	Pancreas	Adrenal Gland	Average
nnU-Net	97.0	95.3	95.3	63.5	77.5	97.4	89.1	90.1	88.5	79.0	87.1	75.2	86.3
3D UX-Net	94.6	94.2	94.3	59.3	72.2	96.4	73.4	87.2	84.9	72.2	80.9	67.1	81.4
SwinUNETR	95.6	94.2	94.3	63.6	75.5	96.6	79.2	89.9	83.7	75.0	82.2	67.3	83.1
nnFormer	93.5	94.9	95.0	64.1	79.5	96.8	90.1	89.7	85.9	77.8	85.6	73.9	85.6
SAMed_h	95.3	92.1	92.9	62.1	75.3	96.4	90.2	87.6	79.8	74.2	77.9	61.0	82.1
MA-SAM (Ours)	96.7	95.1	95.4	68.2	82.1	96.9	92.8	91.1	87.5	79.8	86.6	73.9	87.2

HD
Method	Spleen	Right Kidney	Left Kidney	Gall bladder	Esophagus	Liver	Stomach	Aorta	IVC	Veins	Pancreas	Adrenal Gland	Average
nnU-Net	1.07	1.19	1.19	7.49	8.56	1.14	4.84	14.11	2.87	5.67	2.31	2.23	4.39
3D UX-Net	3.17	1.59	1.26	4.53	13.92	1.75	19.72	12.53	3.47	9.99	3.70	4.11	6.68
SwinUNETR	1.21	1.41	1.37	2.25	5.82	1.70	13.75	5.92	4.46	7.58	3.53	3.40	4.37
nnFormer	78.03	1.41	1.43	3.00	4.92	1.38	4.24	7.53	4.02	6.53	2.96	2.76	9.95
SAMed_h	1.37	33.53	1.84	6.27	4.84	1.77	7.49	4.97	7.28	6.87	10.00	6.49	7.73
MA-SAM (Ours)	1.00	1.19	1.07	1.59	3.77	1.36	3.87	5.29	3.12	3.25	3.93	2.57	2.67
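
For readers unfamiliar with the Dice metric reported in Table 1, the sketch below computes it for a pair of binary masks. This is the standard definition, 2|A ∩ B| / (|A| + |B|), and not the paper's evaluation code; the function name `dice_score`, the toy masks, and the empty-mask convention are assumptions made for illustration.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks, in [0, 1]."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, target).sum() / denom


pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(f"Dice: {100 * dice_score(pred, target):.1f}%")

# Margin implied by the average Dice values in Table 1.
margin_over_nnunet = round(87.2 - 86.3, 1)
print(f"MA-SAM vs nnU-Net on BTCV: +{margin_over_nnunet} Dice points")
```

The 0.9-point margin computed from the table's averages matches the BTCV improvement over nnU-Net quoted in the abstract.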

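Without any prompt, MA-SAM consistently outperforms state-of-the-art 3D medical segmentation methods on CT multi-organ, MRI prostate, and surgical scene segmentation.
On BTCV abdominal multi-organ segmentation, MA-SAM achieves an average Dice of 87.2% and an average HD of 2.67, outperforming nnU-Net (86.3% Dice, 4.39 HD) and the other baselines.
On the 6-site prostate MRI dataset, MA-SAM achieves an average Dice of 92.6% and an HD of 1.94, outperforming the nnU-Net and SAMed_h baselines.
On EndoVis18 surgical scene segmentation, MA-SAM achieves an mIoU of 69.2 and a Dice of 77.0, outperforming both task-specific and SAM-based methods.
For pancreas tumor segmentation (MSD-Pancreas), fully automatic MA-SAM achieves a Dice of 40.2 and an NSD of 59.1; with 3D bounding-box prompts, Dice rises to as much as 80.35%.
With prompts, MA-SAM exceeds nnU-Net by up to 38.7% in Dice in challenging tumor segmentation scenarios.
MA-SAM shows strong zero-shot and few-shot generalization on the AMOS22 CT/MRI datasets, outperforming nnU-Net and state-of-the-art domain generalization methods.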
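
The tumor results above rely on 3D bounding-box prompts. The review does not say how those boxes are produced or fed to the model, so the following is only an illustration of one common evaluation practice: deriving a tight 3D box from a ground-truth mask, optionally padded by a margin. The function name `bbox_3d_from_mask`, the `margin` parameter, and the inclusive corner format are all hypothetical.

```python
import numpy as np


def bbox_3d_from_mask(mask, margin=0):
    """Return ((z0, y0, x0), (z1, y1, x1)), inclusive, or None if mask is empty."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return None
    coords = np.argwhere(mask)                    # one (z, y, x) row per voxel
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin, np.array(mask.shape) - 1)
    return tuple(int(v) for v in lo), tuple(int(v) for v in hi)


volume = np.zeros((4, 6, 6), dtype=bool)          # toy (depth, height, width) mask
volume[1:3, 2:5, 3:4] = True
print(bbox_3d_from_mask(volume))
print(bbox_3d_from_mask(volume, margin=1))
```

Clipping to the volume bounds keeps a padded box valid when the structure touches the edge of the scan.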

Fig. 2: Qualitative visualization of segmentation results generated from our MA-SAM method and other state-of-the-art methods on BTCV dataset. Abdominal organs are denoted in different colors as shown in the corresponding color bar.


This review was generated by AI and checked by a human editor.