QUICK REVIEW

[Paper Review] VM-UNet: Vision Mamba UNet for Medical Image Segmentation

Jiacheng Ruan, Suncheng Xiang | arXiv (Cornell University) | February 4, 2024

Medical Image Segmentation Techniques | Citations: 205

One-Line Summary

VM-UNet is a U-Net for medical image segmentation built purely on state space models, using Vision Mamba blocks; it achieves competitive performance on the ISIC17, ISIC18, and Synapse datasets.

ABSTRACT

In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling capabilities, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel in modeling long-range interactions but also maintain a linear computational complexity. In this paper, leveraging state space models, we propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed with fewer convolution layers to save calculation cost. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks. To our best knowledge, this is the first medical image segmentation model constructed based on the pure SSM-based model. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.
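The linear-complexity claim in the abstract can be illustrated with a toy discrete state space recurrence. This is a minimal sketch, assuming a scalar state and fixed parameters; the names `ssm_scan`, `a_bar`, `b_bar`, and `c` are illustrative and do not come from the paper or its code. Mamba's S6 layer makes these parameters input-dependent, which this sketch does not attempt to reproduce.

```python
def ssm_scan(xs, a_bar, b_bar, c):
    """Toy discrete SSM over a 1-D sequence.

    Recurrence:  h_t = a_bar * h_{t-1} + b_bar * x_t
    Output:      y_t = c * h_t
    One pass over the sequence, so the cost grows linearly with its length.
    """
    h = 0.0  # hidden state carried across time steps
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys


# Impulse input: the first element keeps influencing every later output,
# which is the long-range behaviour the abstract refers to.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0], a_bar=0.5, b_bar=1.0, c=2.0)
print(ys)  # [2.0, 1.0, 0.5, 0.25]
```

By contrast, self-attention compares every position with every other position, which is where the quadratic cost mentioned in the abstract comes from.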

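Motivation and Goals

Promote the exploration of pure SSM-based models for medical image segmentation.
Propose the VM-UNet architecture, an asymmetric U-Net built from Vision Mamba (VSS) blocks.
Establish a baseline for pure SSM-based medical image segmentation on public datasets.
Evaluate VM-UNet on skin lesion and multi-organ segmentation to assess its competitiveness.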

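Proposed Method

Uses a four-stage asymmetric encoder-decoder with patch embedding and patch expanding layers.
Uses Vision Mamba (VSS) blocks as the core feature extractor in both the encoder and the decoder.
Applies two branches inside each VSS block, one of which contains SS2D for long-range context modeling.
Implements SS2D with scan expanding, the S6 block derived from Mamba, and scan merging to capture directional dependencies.
Adopts simple skip connections with additive fusion and trains with the BceDice or CeDice loss.
Initializes VM-UNet with VMamba-S pretrained weights and trains on the ISIC17, ISIC18, and Synapse datasets.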
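The scan expand, sequence model, scan merge pipeline of SS2D can be sketched on a tiny grid. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes the four scan directions are row-major, column-major, and their reverses, and it uses a simple decayed running sum (`causal_scan`, a hypothetical stand-in) where the real model uses the S6 block. All function names here are illustrative.

```python
def causal_scan(seq, decay=0.5):
    """Stand-in for the S6 block (an assumption, NOT the real S6):
    h_t = decay * h_{t-1} + x_t, so each output only sees earlier inputs."""
    h, out = 0.0, []
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def scan_expand(grid):
    """Unfold an H x W grid into four 1-D sequences, one per scan direction."""
    h, w = len(grid), len(grid[0])
    rows = [grid[i][j] for i in range(h) for j in range(w)]  # row-major
    cols = [grid[i][j] for j in range(w) for i in range(h)]  # column-major
    return [rows, cols, rows[::-1], cols[::-1]]


def scan_merge(seqs, h, w):
    """Restore each sequence to grid order and sum the four directions."""
    rows, cols, rows_rev, cols_rev = seqs
    rows_back, cols_back = rows_rev[::-1], cols_rev[::-1]
    return [
        [
            rows[i * w + j] + rows_back[i * w + j]
            + cols[j * h + i] + cols_back[j * h + i]
            for j in range(w)
        ]
        for i in range(h)
    ]


def ss2d_sketch(grid):
    """Expand, process each direction with the stand-in scan, then merge."""
    h, w = len(grid), len(grid[0])
    processed = [causal_scan(s) for s in scan_expand(grid)]
    return scan_merge(processed, h, w)


# A single active pixel in the top-left corner reaches every position,
# because at least one scan direction visits that pixel first.
out = ss2d_sketch([[1.0, 0.0], [0.0, 0.0]])
print(out)
```

Scanning in several directions is what lets a causal 1-D sequence model cover a 2-D image: a pixel that comes late in one ordering comes early in another, so the merged output at each position can draw on the whole grid.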

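Experimental Results

Research Questions

RQ1: Can a pure SSM-based model achieve competitive performance in medical image segmentation?
RQ2: How does Vision Mamba UNet compare with CNN- and Transformer-based baselines on skin lesion and organ segmentation?
RQ3: How do pretrained VMamba weights affect VM-UNet performance?
RQ4: What baseline does VM-UNet set for future SSM-based segmentation methods?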

Key Results

Dataset	Model	mIoU (%) ↑	DSC (%) ↑	Acc (%) ↑	Spe (%) ↑	Sen (%) ↑
ISIC17	UNet	76.98	86.99	95.65	97.43	86.82
ISIC17	UTNetV2	77.35	87.23	95.84	98.05	84.85
ISIC17	TransFuse	79.21	88.40	96.17	97.98	87.14
ISIC17	MALUNet	78.78	88.13	96.18	98.47	84.78
ISIC17	VM-UNet	80.23	89.03	96.29	97.58	89.90
ISIC18	UNet	77.86	87.55	94.05	96.69	85.86
ISIC18	UNet++	78.31	87.83	94.02	95.75	88.65
ISIC18	Att-UNet	78.43	87.91	94.13	96.23	87.60
ISIC18	UTNetV2	78.97	88.25	94.32	96.48	87.60
ISIC18	SANet	79.52	88.59	94.39	95.97	89.46
ISIC18	TransFuse	80.63	89.27	94.66	95.74	91.28
ISIC18	MALUNet	80.25	89.04	94.62	96.19	89.74
ISIC18	VM-UNet	81.35	89.71	94.91	96.13	91.12

Dataset	Model	DSC (%) ↑	HD95 (mm) ↓	Aorta	Gallbladder	Kidney (L)	Kidney (R)	Liver	Pancreas	Spleen	Stomach
Synapse	VM-UNet	81.08	19.21	86.40	69.41	86.16	82.76	94.17	58.80	89.51	81.40
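The column names in the skin lesion table correspond to standard confusion-matrix metrics. The review does not spell out the formulas, so the sketch below assumes the usual definitions for binary segmentation; the function names and the example pixel counts are made up for illustration.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Binary segmentation metrics from pixel counts (standard definitions)."""
    return {
        "IoU": tp / (tp + fp + fn),            # intersection over union
        "DSC": 2 * tp / (2 * tp + fp + fn),    # Dice similarity coefficient
        "Acc": (tp + tn) / (tp + fp + fn + tn),
        "Spe": tn / (tn + fp),                 # specificity
        "Sen": tp / (tp + fn),                 # sensitivity (recall)
    }


def dsc_from_iou(iou):
    """DSC = 2*IoU / (1 + IoU) when both come from the same confusion matrix."""
    return 2 * iou / (1 + iou)


# Hypothetical counts, only to exercise the formulas.
m = segmentation_metrics(tp=8, fp=1, fn=1, tn=90)
print(m)

# Consistency check against the ISIC17 VM-UNet row (mIoU 80.23, DSC 89.03).
print(round(100 * dsc_from_iou(0.8023), 2))  # 89.03
```

The identity in `dsc_from_iou` explains why mIoU and DSC rank the models in the same order in the table: one is a monotonic function of the other when both are computed from the same counts.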

VM-UNet achieves the highest mIoU, DSC, and accuracy among the listed baselines on both ISIC17 and ISIC18.
On ISIC17, VM-UNet achieves mIoU 80.23%, DSC 89.03%, Acc 96.29%, Spe 97.58%, and Sen 89.90%.
On ISIC18, VM-UNet achieves mIoU 81.35%, DSC 89.71%, Acc 94.91%, Spe 96.13%, and Sen 91.12%.
On Synapse, VM-UNet reaches a DSC of 81.08% and an HD95 of 19.21 mm.
Compared with Swin-UNet (a pure Transformer), VM-UNet improves DSC by 1.95 percentage points and HD95 by 2.34 mm.
The ablation shows that initializing with VMamba-S pretrained weights improves performance substantially over random initialization.
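The Swin-UNet comparison above mixes a higher-is-better metric (DSC) with a lower-is-better one (HD95), so the two gains point in opposite directions numerically. The short calculation below makes that explicit. The Swin-UNet figures it prints are implied by the stated differences, not quoted from the review, and the variable names are illustrative.

```python
# VM-UNet results on Synapse, as reported in the review.
vm_unet = {"DSC": 81.08, "HD95": 19.21}
# Stated improvements over Swin-UNet.
gain = {"DSC": 1.95, "HD95": 2.34}

# DSC: higher is better, so the baseline sits below VM-UNet.
# HD95: lower is better, so the baseline sits above VM-UNet.
implied_swin = {
    "DSC": round(vm_unet["DSC"] - gain["DSC"], 2),
    "HD95": round(vm_unet["HD95"] + gain["HD95"], 2),
}
print(implied_swin)  # {'DSC': 79.13, 'HD95': 21.55}

# A gain in percentage points is not the same as a relative gain in percent.
relative_dsc_gain = 100 * gain["DSC"] / implied_swin["DSC"]
print(f"relative DSC gain: {relative_dsc_gain:.2f}%")
```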


This review was generated by AI and reviewed by a human editor.