QUICK REVIEW

[Paper Review] Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

Jiarun Liu, Hao Yang|arXiv (Cornell University)|Feb 5, 2024

Vehicle License Plate Recognition | Cited by 6

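One-sentence summary

Swin-UMamba is a Mamba-based UNet for 2D medical image segmentation that uses ImageNet-pretrained Mamba blocks to improve accuracy and efficiency across multiple datasets; a lighter variant, Swin-UMamba dagger, uses a Mamba-based decoder.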

ABSTRACT

Accurate medical image segmentation demands the integration of multi-scale information, spanning from local features to global dependencies. However, it is challenging for existing methods to model long-range global information, where convolutional neural networks (CNNs) are constrained by their local receptive fields, and vision transformers (ViTs) suffer from high quadratic complexity of their attention mechanism. Recently, Mamba-based models have gained great attention for their impressive ability in long sequence modeling. Several studies have demonstrated that these models can outperform popular vision models in various tasks, offering higher accuracy, lower memory consumption, and less computational burden. However, existing Mamba-based models are mostly trained from scratch and do not explore the power of pretraining, which has been proven to be quite effective for data-efficient medical image analysis. This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks, leveraging the advantages of ImageNet-based pretraining. Our experimental results reveal the vital role of ImageNet-based training in enhancing the performance of Mamba-based models. Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models. Notably, on AbdomenMRI, Endoscopy, and Microscopy datasets, Swin-UMamba outperforms its closest counterpart U-Mamba_Enc by an average score of 2.72%.

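Research motivation and goals

Motivation: modeling long-range dependencies matters in medical image segmentation, beyond what the local receptive fields of CNNs and the costly attention of ViTs can offer.
Introduce a Mamba-based encoder that leverages ImageNet-based pretraining and suits 2D medical images.
Design a U-Net-style decoder with enhanced skip connections and deep supervision for accurate segmentation.
Propose a lighter variant, Swin-UMamba dagger, that uses a Mamba-based decoder for better efficiency.
Demonstrate the importance of pretraining for data-efficient segmentation with Mamba-based blocks.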

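Proposed method

Uses a Mamba-based encoder pretrained on ImageNet to extract multi-scale features from 2D medical images.
Uses visual state space (VSS) blocks with 2D selective scan (SS2D) to process 2D visual data with long-range dependencies.
Initializes the encoder with weights shared from VMamba-Tiny, thereby leveraging ImageNet pretraining.
Builds the Swin-UMamba decoder in a U-shaped structure with skip connections and deep supervision.
Provides Swin-UMamba dagger, a lighter decoder variant that uses patch expanding and has fewer parameters and FLOPs.
Trains with a Dice + cross-entropy loss, deep supervision, AdamW, cosine decay, and staged freezing of the pretrained weights.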
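
The training objective named above (Dice + cross-entropy with deep supervision) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes binary segmentation, flat lists of foreground probabilities, and per-scale weights that halve at each coarser decoder scale. The function names and the weighting scheme are hypothetical choices made for illustration only.

```python
import math

def soft_dice_loss(probs, targets, eps=1e-6):
    """1 - soft Dice coefficient for foreground probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def cross_entropy_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def dice_ce_loss(probs, targets):
    """Unweighted sum of the Dice and cross-entropy terms."""
    return soft_dice_loss(probs, targets) + cross_entropy_loss(probs, targets)

def deep_supervision_loss(outputs, weights=None):
    """Weighted sum of per-scale losses.

    outputs: list of (probs, targets) pairs, finest decoder scale first.
    weights: assumed scheme; by default each coarser scale gets half the
             weight of the previous one, normalized to sum to 1.
    """
    if weights is None:
        raw = [0.5 ** i for i in range(len(outputs))]
        weights = [w / sum(raw) for w in raw]
    return sum(w * dice_ce_loss(p, t) for w, (p, t) in zip(weights, outputs))

# Toy example: a full-resolution output and a half-resolution auxiliary output.
full = ([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
half = ([0.7, 0.3], [1, 0])
loss = deep_supervision_loss([full, half])
```

Deep supervision here simply means the auxiliary, lower-resolution predictions contribute to the loss alongside the final output, so intermediate decoder stages receive a direct training signal.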

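Experimental results

Research questions

RQ1: Does ImageNet-based pretraining improve the performance of Mamba-based models in medical image segmentation?
RQ2: How does Swin-UMamba compare with CNNs, ViTs, and other Mamba-based segmentation models across datasets?
RQ3: Can a Mamba-based decoder (Swin-UMamba dagger) achieve competitive results with fewer parameters and FLOPs?
RQ4: How does pretraining affect convergence stability and data efficiency in medical segmentation with Mamba blocks?
RQ5: How well does the long-range modeling ability of Mamba blocks transfer to 2D medical imaging tasks?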
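
As background for RQ5: Mamba operates on 1D sequences, so a 2D feature map has to be unfolded before it can be scanned. The sketch below shows the four-direction unfolding commonly associated with 2D selective scan (row-major, column-major, and their reverses) and how the four results can be merged back into a grid. It is a simplified illustration under stated assumptions: the function names `cross_scan` and `cross_merge` are hypothetical, and the sequence model that would process each ordering is replaced by the identity.

```python
def cross_scan(grid):
    """Unfold a 2D grid (list of rows) into four 1D scan orders."""
    height, width = len(grid), len(grid[0])
    row_major = [v for row in grid for v in row]
    col_major = [grid[r][c] for c in range(width) for r in range(height)]
    # Forward and reversed versions of both traversals.
    return [row_major, col_major, row_major[::-1], col_major[::-1]]

def cross_merge(sequences, height, width):
    """Map the four sequences back to grid positions and sum them."""
    row_major, col_major, row_rev, col_rev = sequences
    row_back = row_rev[::-1]  # undo the reversal
    col_back = col_rev[::-1]
    merged = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            i_row = r * width + c    # index in row-major order
            i_col = c * height + r   # index in column-major order
            merged[r][c] = (row_major[i_row] + row_back[i_row]
                            + col_major[i_col] + col_back[i_col])
    return merged

grid = [[1, 2, 3],
        [4, 5, 6]]
sequences = cross_scan(grid)
# With an identity "sequence model", every cell is summed four times.
merged = cross_merge(sequences, height=2, width=3)
```

In a real model each of the four sequences would pass through a selective state space layer before merging, so every pixel can aggregate context from several traversal directions.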

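Key findings

Swin-UMamba and Swin-UMamba dagger outperform CNNs, ViTs, and earlier Mamba-based models on the AbdomenMRI, Endoscopy, and Microscopy datasets.
ImageNet-based pretraining markedly improves DSC for Swin-UMamba (by about 3.04 percentage points on AbdomenMRI) and NSD by about 4.19 points.
Pretraining yields faster convergence and more stable training; on AbdomenMRI, Swin-UMamba needs markedly fewer iterations than the baseline.
Swin-UMamba dagger achieves competitive results with markedly fewer parameters and FLOPs (27M parameters, 15.0G FLOPs) than Swin-UMamba (40M, 58.4G) and the U-Mamba variants.
On Endoscopy, the pretrained Swin-UMamba dagger improves markedly (in both DSC and NSD) over its non-pretrained counterpart, highlighting the data-efficiency benefit of pretraining.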
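
To put the parameter and FLOPs figures quoted above in relative terms, the short calculation below derives the percentage reductions of Swin-UMamba dagger relative to Swin-UMamba. It uses only the numbers stated in this summary (27M vs. 40M parameters, 15.0G vs. 58.4G FLOPs); the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline, reduced):
    """Percentage by which `reduced` is smaller than `baseline`."""
    return 100.0 * (baseline - reduced) / baseline

# Figures as quoted in this summary.
params_pct = relative_reduction(40.0, 27.0)  # parameters, in millions
flops_pct = relative_reduction(58.4, 15.0)   # FLOPs, in G

print(f"{params_pct:.1f}% fewer parameters, {flops_pct:.1f}% fewer FLOPs")
# prints "32.5% fewer parameters, 74.3% fewer FLOPs"
```

By these figures the lighter variant drops roughly a third of the parameters and about three quarters of the FLOPs.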

This review was generated by AI and checked by a human editor.