QUICK REVIEW

[Paper Review] Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

Ziyang Wang, Jian-Qing Zheng|arXiv (Cornell University)|Feb 7, 2024

Medical Image Segmentation Techniques | Cited by 79

One-line summary

Mamba-UNet uses pure Visual Mamba blocks inside a UNet-like encoder-decoder structure to improve long-range feature modeling. On cardiac MRI data it reports higher Dice, IoU, and Accuracy than both UNet and Swin-UNet.

ABSTRACT

In recent advancements in medical image analysis, Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have set significant benchmarks. While the former excels in capturing local features through its convolution operations, the latter achieves remarkable global context understanding by leveraging self-attention mechanisms. However, both architectures exhibit limitations in efficiently modeling long-range dependencies within medical images, which is a critical aspect for precise segmentation. Inspired by the Mamba architecture, known for its proficiency in handling long sequences and global contextual information with enhanced computational efficiency as a State Space Model (SSM), we propose Mamba-UNet, a novel architecture that synergizes the U-Net in medical image segmentation with Mamba's capability. Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure, infused with skip connections to preserve spatial information across different scales of the network. This design facilitates a comprehensive feature learning process, capturing intricate details and broader semantic contexts within medical images. We introduce a novel integration mechanism within the VMamba blocks to ensure seamless connectivity and information flow between the encoder and decoder paths, enhancing the segmentation performance. We conducted experiments on publicly available ACDC MRI Cardiac segmentation dataset, and Synapse CT Abdomen segmentation dataset. The results show that Mamba-UNet outperforms several types of UNet in medical image segmentation under the same hyper-parameter setting. The source code and baseline implementations are available.

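Motivation and objectives

Improve long-range dependency modeling for medical image segmentation.
Propose a UNet-like architecture that uses Visual Mamba blocks (VSS) for the encoder, bottleneck, and decoder.
Preserve spatial detail within the VMamba-based framework through skip connections and patch merging/expanding.
Evaluate segmentation performance on a public cardiac MRI dataset and compare against baselines.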
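
As background for why a state space model can carry long-range context at linear cost, the following is a minimal sketch of a discretized linear SSM scan over a 1D token sequence. It is a scalar toy: the parameters `a`, `b`, and `c` are fixed, hypothetical values chosen for illustration, whereas Mamba makes its parameters input-dependent and the VSS block used in the paper is considerably more elaborate. None of that is modeled here.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence.

    One pass over the input, so the cost grows linearly with len(xs).
    The scalar parameters a, b, c are illustrative assumptions.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # the hidden state summarizes everything seen so far
        ys.append(c * h)
    return ys


# An impulse at position 0 still influences the output nine steps later,
# decayed by a factor of a at every step.
ys = ssm_scan([1.0] + [0.0] * 9)
```

The single running state is what lets distant tokens interact without the pairwise comparisons that self-attention performs.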

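Proposed method

Adopt a UNet-like encoder-decoder whose core building block is the pure Visual Mamba block.
Represent the input image as patch tokens and process them through hierarchical VSS blocks, with patch merging for downsampling and patch expanding for upsampling.
Fuse multi-scale features through skip connections between the encoder and decoder.
Initialize the encoder from pretrained VMamba-Tiny weights to improve initialization.
Train with SGD under fixed hyperparameters and evaluate with standard segmentation metrics.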
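
The hierarchy described above can be sketched as plain shape bookkeeping, without any learned layers. The input size (224), patch size (4), base width (96), and number of merging steps (3) are assumptions chosen for illustration; this review does not state the configuration the authors used. The function name `trace_shapes` is hypothetical.

```python
def trace_shapes(height=224, width=224, patch=4, dim=96, num_merges=3):
    """Return (stage, H, W, C) tuples for a UNet-like token hierarchy.

    All default sizes are illustrative assumptions, not values from the paper.
    """
    h, w, c = height // patch, width // patch, dim
    trace = [("patch_embed", h, w, c)]
    skips = []
    for i in range(num_merges):
        skips.append((h, w, c))          # saved for the matching decoder stage
        h, w, c = h // 2, w // 2, c * 2  # patch merging: half resolution, double channels
        trace.append((f"encoder_merge_{i + 1}", h, w, c))
    trace.append(("bottleneck", h, w, c))
    for i in range(num_merges):
        h, w, c = h * 2, w * 2, c // 2   # patch expanding: the inverse of merging
        # A skip connection can only be fused if the shapes line up.
        assert (h, w, c) == skips.pop()
        trace.append((f"decoder_expand_{i + 1}", h, w, c))
    return trace


for stage in trace_shapes():
    print(stage)
```

The internal assertion makes the role of the skip connections explicit: each decoder stage returns to exactly the resolution and width of the encoder stage it is paired with.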

Figure 1: A brief introduction of the evolution of recent developments of UNet with incorporation of Transformer and State Space Models (SSM) for medical image segmentation.

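Experimental results

Research questions

RQ1: Can VMamba-based blocks improve long-range dependency modeling in medical image segmentation compared with conventional UNet and ViT-based approaches?
RQ2: Does a pure VMamba UNet achieve superior segmentation accuracy on MRI data while maintaining computational efficiency?
RQ3: Under identical training settings, how does Mamba-UNet compare with UNet and Swin-UNet?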

Key findings

Framework	Dice ↑	IoU ↑	Acc ↑	Pre ↑	Sen ↑	Spe ↑	HD 95% ↓	ASD ↓
UNet [24]	0.9248	0.8645	0.9969	0.9157	0.9364	0.9883	2.7655	0.8180
Swin-UNet [3]	0.9188	0.8545	0.9968	0.9151	0.9231	0.9857	3.1817	0.9932
Mamba-UNet	0.9281	0.8698	0.9972	0.9275	0.9289	0.9859	2.4645	0.7677
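
For readers unfamiliar with the overlap columns in the table, here is a minimal sketch of how Dice, IoU, Accuracy, Precision, Sensitivity, and Specificity are commonly defined for a single binary mask. How the paper aggregates these over classes and cases is not stated in this review, so the code below is only the per-mask definition. The function name `overlap_metrics` is hypothetical, and the sketch assumes both foreground and background are present so that no denominator is zero.

```python
def overlap_metrics(pred, target):
    """Compute binary segmentation metrics from two equal-length 0/1 sequences."""
    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # false negatives
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # true negatives
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Flattened toy masks: 2 true positives, 1 false positive, 1 false negative.
scores = overlap_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                         [1, 1, 0, 1, 0, 0, 0, 0])
```

Accuracy and Specificity count true negatives, which dominate when the foreground is small. This is one reason all three models score above 0.98 on those columns while differing more on Dice and IoU.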

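Mamba-UNet achieves Dice 0.9281, IoU 0.8698, and Accuracy 0.9972 on the cardiac MRI test set.
Under the same hyperparameters, it exceeds UNet in Dice (0.9281 vs. 0.9248) and IoU (0.8698 vs. 0.8645), although UNet remains higher on Sensitivity and Specificity.
Mamba-UNet records HD 95% of 2.4645 and ASD of 0.7677, the lowest (best) values among the three models.
Swin-UNet scores Dice 0.9188 and IoU 0.8545, below both Mamba-UNet and UNet (Dice 0.9248).
The lower HD 95% and ASD indicate that Mamba-UNet delineates boundaries more accurately than either baseline.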
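
The two boundary columns can be illustrated with a small sketch over explicit boundary point sets. This is a simplified version under stated assumptions: distances are pooled from both directions, the 95th percentile uses the nearest-rank rule, and the function names are hypothetical. Libraries differ in these details, and this review does not say which implementation the authors used.

```python
import math


def directed_distances(a, b):
    """Distance from each point in a to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def boundary_metrics(a, b, percentile=95):
    """Return (HD at the given percentile, average surface distance).

    Pools the directed distances a->b and b->a (an assumed convention).
    """
    d = sorted(directed_distances(a, b) + directed_distances(b, a))
    # Nearest-rank percentile: smallest distance with at least
    # `percentile` percent of all distances at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(d)))
    hd = d[rank - 1]
    asd = sum(d) / len(d)
    return hd, asd


# Two toy boundaries that agree except for one outlying point.
pred_boundary = [(0, 0), (0, 1), (0, 2)]
true_boundary = [(0, 0), (0, 1), (0, 5)]
hd95, asd = boundary_metrics(pred_boundary, true_boundary)
```

In this toy case the single outlier drives the percentile distance to 3.0 while the average stays near 0.67, which shows why the two columns are reported together: one is sensitive to the worst boundary errors, the other to the typical error.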

Figure 2: The architecture of Mamba-UNet, which is composed of encoder, bottleneck, decoder and skip connections. The encoder, bottleneck and decoder are all constructed based on Visual Mamba block.

This review was generated by AI and checked by a human editor.