QUICK REVIEW

[Paper Review] DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation

Yifan Gao, Wei Xia|arXiv (Cornell University)|2023. 06. 01.

Advanced Neural Network Applications | Citations: 19

One-line summary

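DeSAM decouples SAM's mask decoder into prompt-relevant IoU prediction and prompt-invariant mask generation, which makes fully automatic single-source domain generalization feasible for medical image segmentation and yields strong results on cross-site prostate segmentation.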

ABSTRACT

Deep learning-based medical image segmentation models often suffer from domain shift, where the models trained on a source domain do not generalize well to other unseen domains. As a prompt-driven foundation model with powerful generalization capabilities, the Segment Anything Model (SAM) shows potential for improving the cross-domain robustness of medical image segmentation. However, SAM performs significantly worse in automatic segmentation scenarios than when manually prompted, hindering its direct application to domain generalization. Upon further investigation, we discovered that the degradation in performance was related to the coupling effect of inevitable poor prompts and mask generation. To address the coupling effect, we propose the Decoupled SAM (DeSAM). DeSAM modifies SAM's mask decoder by introducing two new modules: a prompt-relevant IoU module (PRIM) and a prompt-decoupled mask module (PDMM). PRIM predicts the IoU score and generates mask embeddings, while PDMM extracts multi-scale features from the intermediate layers of the image encoder and fuses them with the mask embeddings from PRIM to generate the final segmentation mask. This decoupled design allows DeSAM to leverage the pre-trained weights while minimizing the performance degradation caused by poor prompts. We conducted experiments on publicly available cross-site prostate and cross-modality abdominal image segmentation datasets. The results show that our DeSAM leads to a substantial performance improvement over previous state-of-the-art domain generalization methods. The code is publicly available at https://github.com/yifangao112/DeSAM.

Motivation and objectives

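Address domain shift in medical image segmentation with a foundation model (SAM), without requiring multi-source data or target-domain data.
Improve fully automatic segmentation by removing the prompt-driven coupling between image embeddings and prompt embeddings.
Enable memory-efficient training by freezing the encoder and precomputing the image embeddings.
Demonstrate improved cross-site generalization on prostate MRI datasets collected at multiple clinical sites.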

Proposed method

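SAM's mask decoder is split into two modules: the Prompt-Relevant IoU Module (PRIM) and the Prompt-Invariant Mask Module (PIMM; the abstract refers to this module as the prompt-decoupled mask module, PDMM).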
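The image and prompt encoders are frozen, and image embeddings are precomputed with the SAM image encoder to reduce GPU memory usage.
PRIM uses a cross-attention transformer to produce mask embeddings and an IoU score; it has no direct mask head.
PIMM uses a U-Net/UNETR-like encoder-decoder that combines multi-scale image embeddings with the PRIM output to produce the final mask.
Training uses either grid point prompts (a 9x9 grid, with points both inside and outside the ground truth) or a whole-box prompt; the losses are L_dice and L_ce on the mask and L_mse on the IoU.
In grid-point mode the ground-truth supervision is L_points = L_dice + L_ce + L_mse with weights (1, 1, 10), respectively; in box mode it is L_box = L_dice + L_ce.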
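
The grid-point prompting and the two loss combinations described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses plain Python lists instead of tensors, assumes binary cross-entropy for L_ce, centres each grid point in its cell, and the function names (`grid_point_prompts`, `points_loss`, `box_loss`) are hypothetical. Only the 9x9 grid, the inside/outside labelling, and the (1, 1, 10) weights come from the review text.

```python
import math


def grid_point_prompts(mask, n=9):
    """Place an n x n grid of points over a binary mask (a list of rows).

    Returns (row, col, label) tuples; label is 1 when the point lies on the
    ground-truth foreground and 0 when it lies outside it.
    """
    height, width = len(mask), len(mask[0])
    points = []
    for i in range(n):
        for j in range(n):
            # Assumption: each point sits at the centre of its grid cell.
            r = int((i + 0.5) * height / n)
            c = int((j + 0.5) * width / n)
            points.append((r, c, int(mask[r][c] > 0)))
    return points


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flat lists of probabilities and 0/1 labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy (assumed form of L_ce), with clamping."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def points_loss(pred, target, iou_pred, iou_true, weights=(1.0, 1.0, 10.0)):
    """L_points = 1 * L_dice + 1 * L_ce + 10 * L_mse (grid-point mode)."""
    w_dice, w_ce, w_mse = weights
    mse = (iou_pred - iou_true) ** 2
    return (w_dice * dice_loss(pred, target)
            + w_ce * bce_loss(pred, target)
            + w_mse * mse)


def box_loss(pred, target):
    """L_box = L_dice + L_ce (whole-box mode, no IoU term)."""
    return dice_loss(pred, target) + bce_loss(pred, target)


# Toy 4x4 mask with a 2x2 foreground square in the middle.
toy_mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
prompts = grid_point_prompts(toy_mask, n=9)
print(len(prompts), sum(label for _, _, label in prompts))
```

The weight of 10 on the IoU term means that, in grid-point mode, an IoU error of 0.2 alone contributes 10 * 0.04 = 0.4 to the loss, which is comparable in size to the mask terms.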

Experimental results

Research questions

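RQ1: Can decoupling the mask decoder suppress the adverse effect of poor prompts in fully automatic SAM-based medical image segmentation?
RQ2: Does freezing the image encoder and precomputing image embeddings make training feasible on an entry-level GPU while still achieving strong cross-domain generalization?
RQ3: How does DeSAM compare with existing single-source domain generalization methods on cross-site prostate segmentation?
RQ4: How much does each component (PRIM, PIMM, IoU head, fusion strategy) contribute to overall performance?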

Key results

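Dice (%) for cross-site prostate segmentation; "X to rest" means site X is the single source domain and the remaining sites are the test domains.
Method	A to rest	B to rest	C to rest	D to rest	E to rest	F to rest	Overall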
Upper bound [53]	85.38	83.68	82.15	85.21	87.04	84.29	84.63
Baseline [53]	63.73	61.21	27.41	34.36	44.10	61.70	48.75
AdvNoise [51]	72.15	63.26	30.81	40.12	48.07	60.12	52.42
AdvBias [16]	77.45	62.12	51.09	70.20	51.12	50.69	60.45
RandConv [17]	75.52	57.23	44.21	61.27	49.98	54.21	57.07
MixStyle [52]	73.04	59.29	43.00	62.17	53.12	50.03	56.78
MaxStyle [7]	81.25	70.27	62.09	58.18	70.04	67.77	68.27
CSDG [18]	80.72	68.00	59.78	72.40	68.67	70.78	70.06
MedSAM [44]	72.32	73.31	61.53	64.46	68.89	61.39	66.98
DeSAM (whole box)	82.30	78.06	66.65	82.87	77.58	79.05	77.75
DeSAM (grid points)	82.80	80.61	64.77	83.41	80.36	82.17	79.02
Improvement over baseline	+19.07	+19.40	+37.36	+49.05	+36.26	+20.47	+30.27
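
As a quick sanity check on the table, the "Overall" column and the improvement row can be recomputed from the per-site scores. The sketch below copies three rows from the table; it assumes "Overall" is the unweighted mean of the six per-site Dice scores and that the improvement row is DeSAM (grid points) minus Baseline, both of which match the printed values. The helper names are illustrative only.

```python
# Per-site Dice (%) copied from the table, in column order A..F.
baseline = [63.73, 61.21, 27.41, 34.36, 44.10, 61.70]
csdg = [80.72, 68.00, 59.78, 72.40, 68.67, 70.78]
desam_grid = [82.80, 80.61, 64.77, 83.41, 80.36, 82.17]


def overall(scores):
    """Unweighted mean over the six source sites, rounded like the table."""
    return round(sum(scores) / len(scores), 2)


def improvement(method, reference):
    """Per-site difference between two rows, rounded to two decimals."""
    return [round(m - r, 2) for m, r in zip(method, reference)]


print("Baseline overall:", overall(baseline))
print("DeSAM (grid points) overall:", overall(desam_grid))
print("Per-site gain over baseline:", improvement(desam_grid, baseline))
print("Gain over CSDG:", round(overall(desam_grid) - overall(csdg), 2))
```

The largest per-site gains fall on the columns where the baseline is weakest (C and D), which is why the overall improvement of 30.27 points is well above the gains on sites A, B, and F.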

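DeSAM improves the overall cross-site Dice score for prostate segmentation by 8.96 points over the best previous domain generalization method (CSDG), from 70.06% to 79.02%.
DeSAM (grid points) reaches an overall Dice of 79.02%, ahead of both DeSAM (whole box) at 77.75% and all previous methods.
Compared with MedSAM, DeSAM is less sensitive to poor prompts and achieves a higher average Dice on the prostate dataset (DeSAM with whole-box prompts 77.75% vs. MedSAM 66.98%).
In the ablation study, PIMM alone reaches 73.85% overall; adding PRIM with the IoU head gives 75.12%; adding mask-embedding fusion gives 75.81%; the full DeSAM reaches 79.02%.
Increasing the number of grid point prompts from 1 to 9 maintains or improves performance, reaching 79.02% at 9 points, and the result stays stable with more points (e.g., 79.03% at 25 points).
DeSAM can be trained on an entry-level GPU (RTX 3060 12GB), because precomputed image embeddings greatly reduce the memory requirement.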
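
The memory saving from precomputed embeddings follows a simple pattern: run the frozen encoder once per image, store the result, and have the training loop read only the stored embeddings. The sketch below illustrates that pattern with the standard library. It is not the DeSAM code: `toy_encoder` is a hypothetical stand-in for the frozen SAM image encoder, and the pickle-per-image cache layout is an assumption made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def precompute_embeddings(images, encoder, cache_dir):
    """Run the frozen encoder once per image and store each result on disk."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    for name, image in images.items():
        path = cache_dir / f"{name}.pkl"
        if not path.exists():  # skip images embedded in an earlier run
            with path.open("wb") as f:
                pickle.dump(encoder(image), f)


def load_embedding(name, cache_dir):
    """Read one cached embedding; no encoder is needed at this point."""
    with (Path(cache_dir) / f"{name}.pkl").open("rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for the frozen image encoder; it counts its calls.
calls = {"n": 0}


def toy_encoder(image):
    calls["n"] += 1
    return [sum(row) / len(row) for row in image]


images = {"case_a": [[0, 2], [4, 6]], "case_b": [[1, 1], [3, 5]]}
cache = tempfile.mkdtemp()
precompute_embeddings(images, toy_encoder, cache)

# "Training" epochs read cached embeddings; the encoder is not called again.
for epoch in range(3):
    for name in images:
        embedding = load_embedding(name, cache)

print("encoder calls:", calls["n"])
```

Because the encoder is frozen, its output for a given image never changes, so caching loses nothing; the trade-off is that augmentations applied to the input image would require recomputing the embedding.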

This review was generated by AI and reviewed by a human editor.