QUICK REVIEW

[Paper Review] VM-UNet: Vision Mamba UNet for Medical Image Segmentation

Jiacheng Ruan, Suncheng Xiang | arXiv (Cornell University) | Feb 4, 2024

Medical Image Segmentation Techniques | Cited by 205

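One-Sentence Summary

VM-UNet is a U-Net for medical image segmentation built purely on state space models. It uses Vision Mamba (VSS) blocks and achieves competitive results on the ISIC17, ISIC18, and Synapse datasets.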

ABSTRACT

In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling capabilities, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel in modeling long-range interactions but also maintain a linear computational complexity. In this paper, leveraging state space models, we propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed with fewer convolution layers to save calculation cost. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks. To our best knowledge, this is the first medical image segmentation model constructed based on the pure SSM-based model. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.
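The linear-complexity claim in the abstract can be illustrated with a toy discrete state space recurrence. The sketch below is a scalar, time-invariant SSM with hypothetical parameters `a`, `b`, and `c`; it is not Mamba's selective S6 layer and not code from the VM-UNet repository. It only shows that a single pass over the sequence carries information from earlier inputs forward, so the cost grows linearly with sequence length.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    a, b, c are illustrative constants (assumptions), not learned
    or input-dependent parameters as in Mamba.
    """
    h = 0.0          # hidden state summarizing everything seen so far
    ys = []
    for x in xs:     # one step per token: O(L) time, O(1) state
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # An impulse at position 0 still influences later outputs,
    # decaying by a factor of `a` at every step.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
```

By contrast, self-attention compares every position with every other position, which is where the quadratic cost mentioned in the abstract comes from.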

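Research Motivation and Objectives

Encourage exploration of medical image segmentation models built purely on SSMs.
Propose VM-UNet, an architecture that uses Vision Mamba (VSS) blocks within an asymmetric U-Net.
Establish a baseline on public datasets for pure SSM-based medical image segmentation.
Evaluate the competitiveness of VM-UNet on skin lesion and multi-organ segmentation.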

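Proposed Method

Adopt a four-stage asymmetric encoder-decoder with patch embedding and patch expanding.
Use Vision Mamba (VSS) blocks as the core feature extractor in both the encoder and the decoder.
Within each VSS block, apply a two-branch pathway with SS2D for long-range context modeling.
Implement SS2D with scan expanding and scan merging, together with the Mamba-derived S6 block, to capture directional dependencies.
Fuse skip connections by simple addition, and train with the BceDice or CeDice loss.
Initialize VM-UNet with VMamba-S pretrained weights and train on the ISIC17, ISIC18, and Synapse datasets.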
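The BceDice loss named above combines binary cross-entropy with a Dice term. The following is a minimal sketch on flat lists of per-pixel probabilities and binary labels. The weights `w_bce` and `w_dice`, the smoothing constant, and the clamping epsilon are assumptions chosen for illustration; this review does not state the values used in the paper.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over pixels; probs are in [0, 1]."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + s) / (|pred| + |target| + s)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def bce_dice_loss(probs, targets, w_bce=1.0, w_dice=1.0):
    """Weighted sum of the two terms (weights are assumed, not sourced)."""
    return w_bce * bce_loss(probs, targets) + w_dice * dice_loss(probs, targets)


if __name__ == "__main__":
    good = bce_dice_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
    bad = bce_dice_loss([0.1, 0.9, 0.2, 0.8], [1, 0, 1, 0])
    print(good < bad)  # a closer prediction gives a lower loss
```

The cross-entropy term scores each pixel independently, while the Dice term scores region overlap, which is why the two are often paired when the foreground is small. The CeDice variant would replace the binary term with multi-class cross-entropy, as needed for multi-organ labels.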

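Experimental Results

Research Questions

RQ1: Can a pure SSM-based model achieve competitive performance in medical image segmentation?
RQ2: How does Vision Mamba UNet compare with CNN-based and Transformer-based baselines on skin lesion and organ segmentation?
RQ3: How do pretrained VMamba weights affect the performance of VM-UNet?
RQ4: What baselines does VM-UNet set for future SSM-based segmentation methods?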

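Key Findings

VM-UNet achieves competitive mIoU, DSC, and accuracy on ISIC17 and ISIC18, outperforming several baselines.
On ISIC17, VM-UNet reaches mIoU 80.23%, DSC 89.03%, Acc 96.29%, Spe 97.58%, and Sen 89.90%.
On ISIC18, VM-UNet reaches mIoU 81.35%, DSC 89.71%, Acc 94.91%, Spe 96.13%, and Sen 91.12%.
On the Synapse dataset, VM-UNet reaches DSC 81.08% and HD95 19.21 mm.
Compared with Swin-UNet (a pure Transformer), VM-UNet improves DSC by 1.95% and reduces HD95 by 2.34 mm.
Ablation experiments show that initializing with VMamba-S pretrained weights substantially improves performance over random initialization.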
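For readers unfamiliar with the abbreviations above, the sketch below computes IoU, DSC, accuracy (Acc), sensitivity (Sen), and specificity (Spe) from a binary prediction and a binary ground-truth mask using their standard confusion-matrix definitions. The function name and the zero-denominator convention are assumptions made for illustration; how the paper averages these values over a dataset is not stated in this review.

```python
def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def segmentation_metrics(pred, target):
    """Compute binary segmentation metrics from flat 0/1 sequences."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1      # foreground predicted, foreground in truth
        elif p and not t:
            fp += 1      # foreground predicted, background in truth
        elif not p and t:
            fn += 1      # background predicted, foreground in truth
        else:
            tn += 1      # background in both
    return {
        "iou": _ratio(tp, tp + fp + fn),
        "dsc": _ratio(2 * tp, 2 * tp + fp + fn),
        "acc": _ratio(tp + tn, tp + tn + fp + fn),
        "sen": _ratio(tp, tp + fn),
        "spe": _ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    print(segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

DSC is never lower than IoU for the same mask pair, which is consistent with the ISIC figures above, where DSC exceeds mIoU on both datasets. HD95 is a boundary distance rather than an overlap ratio, so lower values are better.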


This review was generated by AI and reviewed by a human editor.