QUICK REVIEW

[Paper Review] Customized Segment Anything Model for Medical Image Segmentation

Kaidong Zhang, Dong Liu|arXiv (Cornell University)|Apr 26, 2023

Advanced Neural Network Applications|Cited by 121

One-Line Summary

SAMed customizes the Segment Anything Model (SAM) for medical image segmentation with LoRA fine-tuning, achieving competitive results while keeping deployment and storage overhead minimal.

ABSTRACT

We propose SAMed, a general solution for medical image segmentation. Different from the previous methods, SAMed is built upon the large-scale image segmentation model, Segment Anything Model (SAM), to explore the new research paradigm of customizing large-scale models for medical image segmentation. SAMed applies the low-rank-based (LoRA) finetuning strategy to the SAM image encoder and finetunes it together with the prompt encoder and the mask decoder on labeled medical image segmentation datasets. We also observe the warmup finetuning strategy and the AdamW optimizer lead SAMed to successful convergence and lower loss. Different from SAM, SAMed could perform semantic segmentation on medical images. Our trained SAMed model achieves 81.88 DSC and 20.64 HD on the Synapse multi-organ segmentation dataset, which is on par with the state-of-the-art methods. We conduct extensive experiments to validate the effectiveness of our design. Since SAMed only updates a small fraction of the SAM parameters, its deployment cost and storage cost are quite marginal in practical usage. The code of SAMed is available at https://github.com/hitachinsk/SAMed.

Research Motivation and Objectives

Extend SAM to semantic medical image segmentation with meaningful tissue labels.
Enable efficient fine-tuning by updating a small fraction of SAM parameters.
Leverage warmup and AdamW to stabilize training and improve convergence.
Demonstrate competitive performance on the Synapse multi-organ dataset.
Show that SAMed incurs marginal deployment/storage overhead while remaining SAM-compatible.

Proposed Method

Freeze the SAM image encoder and apply LoRA to its transformer blocks to learn medical features.
Fine-tune the prompt encoder and the mask decoder (optionally with LoRA) for semantic segmentation.
Adapt SAM's output to predict k semantic masks, one per class (including background), and obtain the final segmentation map S by applying Softmax and then ArgMax over the class dimension.
Use cross-entropy and Dice losses with a downsampled ground truth for training supervision.
Adopt a warmup phase and the AdamW optimizer to stabilize training and improve convergence.
Demonstrate compatibility with SAM and a reduced count of updated parameters (e.g., 18.81M when LoRA is applied to the image encoder only).
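
The low-rank idea behind the first step can be sketched in a few lines. This is a minimal illustration in plain Python, not the SAMed implementation: the class name `LoRALinear`, the helper `matmul`, the rank, the `alpha` scaling, and the initialization are all assumptions chosen for the example. It shows a frozen weight matrix W combined with a trainable update B·A, where only A and B would receive gradients.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (out x in) plus a trainable low-rank update B @ A.

    Hypothetical sketch: rank, alpha and the init scheme are illustrative.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A starts small and random, B starts at zero, so B @ A is zero at step 0
        # and the layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the layer to one input vector x of length in_dim."""
        return [sum(w * v for w, v in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Count the entries of A and B, the only values that would be trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 64 x 64 identity layer with a rank-4 update.
dim = 64
frozen = [[float(i == j) for j in range(dim)] for i in range(dim)]
layer = LoRALinear(frozen, rank=4)
frozen_count = dim * dim                 # 4096 frozen entries
lora_count = layer.trainable_params()    # 4 * 64 + 64 * 4 = 512 trainable entries
```

In this toy setting the trainable update holds 512 values against 4096 frozen ones, which is the kind of ratio that keeps the stored delta small relative to the base model.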

Experimental Results

Research Questions

RQ1: Can SAM, when fine-tuned with LoRA on medical data, perform semantic segmentation for medical images?
RQ2: Does updating a small fraction of SAM parameters yield competitive segmentation accuracy with lower deployment/storage overhead?
RQ3: What training strategies (warmup, AdamW) improve convergence and performance for SAM adaptations on medical data?
RQ4: How does SAMed compare to state-of-the-art medical segmentation models on Synapse in DSC and HD?
RQ5: Can SAMed provide meaningful semantic labeling for different tissues while maintaining SAM compatibility?

Key Findings

Method	DSC ↑	HD ↓	Aorta	Gallbladder	Kidney(L)	Kidney(R)	Liver	Pancreas	Spleen	Stomach
U-Net	76.85	39.70	89.07	69.72	77.77	68.60	93.43	53.98	86.67	75.58
Att-UNet	77.77	36.02	89.55	68.88	77.98	71.11	93.57	58.04	87.30	75.75
TransUnet	77.48	31.69	87.23	63.13	81.87	77.02	94.08	55.86	85.08	75.62
SwinUnet	79.13	21.55	85.47	66.53	83.28	79.61	94.29	56.58	90.66	76.60
MissFormer	81.96	18.20	86.99	68.65	85.21	82.00	94.41	65.67	91.92	80.81
TransDeepLab	80.16	21.25	86.04	69.16	84.08	79.88	93.53	61.19	89.00	78.40
HiFormer	80.39	14.70	86.21	65.69	85.23	79.77	94.61	59.52	90.99	81.08
DAE-Former	82.43	17.46	88.96	72.30	86.08	80.88	94.98	65.12	91.94	79.19
SAMed	81.88	20.64	87.77	69.11	80.45	79.95	94.80	72.17	88.72	82.06
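
For readers unfamiliar with the DSC columns, the Dice similarity coefficient for one class is 2·|P ∩ G| / (|P| + |G|), where P and G are the predicted and ground-truth pixel sets. The sketch below computes it per class on tiny 2D label maps. The function names and the toy data are assumptions made for illustration; how the benchmark aggregates scores over 3D volumes and how HD is computed are not covered here.

```python
def dice_per_class(pred, target, num_classes):
    """Return {class_id: DSC} for foreground classes 1..num_classes-1.

    pred and target are equally sized 2D label maps (lists of rows of ints).
    Classes absent from both maps are skipped rather than scored.
    """
    flat_pred = [p for row in pred for p in row]
    flat_true = [t for row in target for t in row]
    scores = {}
    for c in range(1, num_classes):
        pred_c = sum(1 for p in flat_pred if p == c)
        true_c = sum(1 for t in flat_true if t == c)
        overlap = sum(1 for p, t in zip(flat_pred, flat_true) if p == c and t == c)
        if pred_c + true_c == 0:
            continue  # class not present anywhere: no score to report
        scores[c] = 2.0 * overlap / (pred_c + true_c)
    return scores


def mean_dice_percent(scores):
    """Average the per-class scores and express the result in percent."""
    return 100.0 * sum(scores.values()) / len(scores)


# Toy example with background (0) and two organs (1, 2).
pred = [[0, 1, 1],
        [0, 2, 2],
        [0, 0, 2]]
target = [[0, 1, 1],
          [0, 1, 2],
          [0, 0, 2]]
scores = dice_per_class(pred, target, num_classes=3)
mean_score = mean_dice_percent(scores)
```

Here each organ has an overlap of 2 pixels against 5 pixels in total across prediction and ground truth, so both per-class scores are 4/5.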

SAMed achieves 81.88 DSC and 20.64 HD on the Synapse multi-organ dataset. Its DSC is within 0.55 points of the best method in the table (DAE-Former, 82.43), while its HD is higher than that of several baselines (e.g., HiFormer at 14.70).
SAMed achieves the highest per-organ score among the compared methods on the pancreas (72.17) and the stomach (82.06).
Only a small fraction of SAM's parameters is updated (e.g., 18.81M vs. 358M for the original, i.e., 18.81 / 358 ≈ 5.25% of the original size), keeping deployment/storage overhead marginal.
LoRA applied to the image encoder (and optionally mask decoder) yields better performance than updating only the mask decoder.
Warmup and the AdamW optimizer stabilize training, leading to successful convergence and a lower final loss.
SAMed remains fully compatible with SAM and can be used as a plug-in to empower SAM for medical image segmentation.
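
The warmup finding above can be pictured with a small learning-rate schedule. This is a sketch under stated assumptions: the function name `warmup_poly_lr`, the linear warmup shape, the polynomial decay, and every numeric value are illustrative and are not taken from the review. In a PyTorch training loop, the returned value would typically be written into the parameter groups of an AdamW optimizer at each step.

```python
def warmup_poly_lr(step, base_lr, warmup_steps, max_steps, power=0.9):
    """Linear warmup up to base_lr, then polynomial decay towards zero.

    All arguments are hypothetical hyperparameters chosen by the caller.
    """
    if step < warmup_steps:
        # Ramp up: the last warmup step reaches base_lr exactly.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * (1.0 - progress) ** power


# Usage: inspect the schedule at a few steps (values are illustrative only).
base_lr, warmup_steps, max_steps = 0.005, 250, 1000
schedule = [warmup_poly_lr(s, base_lr, warmup_steps, max_steps)
            for s in range(max_steps + 1)]
```

Starting from a small rate avoids large early updates to the newly added layers while the rest of the model is still frozen at its pretrained values, which is one common motivation for warmup.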


This review was generated by AI and checked by a human editor.