QUICK REVIEW

[Paper Review] VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis

Zhihan Ju, Wanting Zhou|arXiv (Cornell University)|May 9, 2024

Image Retrieval and Classification Techniques | Cited by 6

One-Sentence Summary

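VM-DDPM introduces a Vision Mamba diffusion model that combines State Space Models (SSMs) with CNNs to achieve efficient, globally aware medical image synthesis, reaching state-of-the-art FID on multiple datasets.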
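Because VM-DDPM is a denoising diffusion probabilistic model, its training rests on the standard DDPM forward process q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I) and a noise-prediction loss. The sketch below illustrates only that generic step, not VM-DDPM's own code: the CNN-SSM denoiser is omitted, and the linear beta schedule (1e-4 to 0.02 over T = 1000 steps) is an assumption borrowed from the original DDPM defaults, since this review does not state the schedule VM-DDPM uses. The function names are hypothetical.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form for a flattened image x0 (list of floats)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def noise_prediction_loss(pred_noise, true_noise):
    """Simplified DDPM objective: mean squared error between predicted and true noise."""
    return sum((p - n) ** 2 for p, n in zip(pred_noise, true_noise)) / len(true_noise)


# Usage: a denoiser (omitted here) would receive (xt, t) and try to predict `noise`.
rng = random.Random(0)
alpha_bars = linear_alpha_bars()
xt, noise = q_sample([0.5, -0.25, 1.0], t=500, alpha_bars=alpha_bars, rng=rng)
```

Sampling x_t in one closed-form step, rather than applying t noising steps in sequence, is what makes it cheap to train on random timesteps.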

ABSTRACT

In the realm of smart healthcare, researchers enhance the scale and diversity of medical datasets through medical image synthesis. However, existing methods are limited by CNN local perception and Transformer quadratic complexity, making it difficult to balance structural texture consistency. To this end, we propose the Vision Mamba DDPM (VM-DDPM) based on State Space Model (SSM), fully combining CNN local perception and SSM global modeling capabilities, while maintaining linear computational complexity. Specifically, we designed a multi-level feature extraction module called Multi-level State Space Block (MSSBlock), and a basic unit of encoder-decoder structure called State Space Layer (SSLayer) for medical pathological images. Besides, we designed a simple, Plug-and-Play, zero-parameter Sequence Regeneration strategy for the Cross-Scan Module (CSM), which enabled the S6 module to fully perceive the spatial features of the 2D image and stimulate the generalization potential of the model. To the best of our knowledge, this is the first medical image synthesis model based on the SSM-CNN hybrid architecture. Our experimental evaluation on three datasets of different scales, i.e., ACDC, BraTS2018, and ChestXRay, as well as qualitative evaluation by radiologists, demonstrates that VM-DDPM achieves state-of-the-art performance.
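The abstract credits the SSM with global modeling at linear computational complexity. As a hedged illustration, here is a minimal single-channel sketch of an S6-style selective scan. It assumes a diagonal state matrix A, zero-order-hold discretization A_bar = exp(delta * A), the common simplification B_bar ≈ delta * B, and hypothetical weights (w_delta, b_delta, w_B, w_C) that make delta, B and C depend on each input. It is not VM-DDPM's implementation; real Mamba layers use many channels, learned projections and a hardware-aware parallel scan.

```python
import math


def softplus(z):
    """Numerically safe softplus; keeps the step size delta positive."""
    return math.log1p(math.exp(z)) if z < 30 else z


def selective_scan(xs, A, w_delta, b_delta, w_B, w_C):
    """Toy single-channel selective SSM scan (S6-style, simplified).

    xs: input sequence of floats; A: diagonal of the state matrix (negative floats, length N).
    w_delta, b_delta: hypothetical scalars producing the input-dependent step size.
    w_B, w_C: hypothetical length-N weights producing input-dependent B_t and C_t.
    One pass over the sequence with O(N) work per step, so O(L * N) overall.
    """
    n = len(A)
    h = [0.0] * n
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)      # step size depends on x_t
        B = [wb * x for wb in w_B]                    # input-dependent B_t
        C = [wc * x for wc in w_C]                    # input-dependent C_t
        for i in range(n):
            a_bar = math.exp(delta * A[i])            # zero-order hold for A
            h[i] = a_bar * h[i] + delta * B[i] * x    # B_bar approximated as delta * B
        ys.append(sum(c * hi for c, hi in zip(C, h)))  # y_t = C_t . h_t
    return ys
```

Each step updates only the N-dimensional hidden state, so cost grows linearly in sequence length L, whereas self-attention compares every pair of positions (O(L^2)). The scan is also causal: output t depends only on inputs up to t.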

Research Motivation and Goals

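Address data scarcity in medical imaging by generating high-quality synthetic images.
Combine CNNs with State Space Models (SSMs) to model global context at linear computational complexity.
Design a multi-level feature fusion module (MSSBlock) and an encoder-decoder unit (SSLayer) for medical images.
Improve spatial continuity and texture realism through an enhanced Cross-Scan Module (CSM) with a zero-parameter Sequence Regeneration strategy.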

Proposed Method

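Propose VM-DDPM, a denoising diffusion probabilistic model (DDPM) built on a hybrid CNN-SSM backbone.
Introduce MSSBlock, a multi-level feature extraction unit that combines a CSM path with a CNN path.
Implement SSLayer as the basic encoder/decoder unit, with residual connections and time-embedding processing.
Enhance the Cross-Scan Module (CSM) with a plug-and-play Sequence Regeneration strategy that shuffles the patch order before the S6 operation.
Adopt a U-Net-like encoder-bottleneck-decoder architecture that fuses features across scales through skip connections.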
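The CSM and Sequence Regeneration steps listed above can be sketched as follows. This assumes a VMamba-style cross-scan (row-major, column-major, and both reversed, merged back by summation) and reads Sequence Regeneration as this review describes it: a parameter-free reordering of patches before the S6 operation, undone afterwards. The paper's exact reordering rule is not given here, so the seeded random permutation and the names scan_orders, cross_scan_with_regeneration and running_sum are hypothetical; running_sum merely stands in for S6 as a causal, order-sensitive operation.

```python
import random


def scan_orders(h, w):
    """Four traversal orders of an h x w patch grid: row-major, column-major, and their reverses."""
    row = [(i, j) for i in range(h) for j in range(w)]
    col = [(i, j) for j in range(w) for i in range(h)]
    return [row, row[::-1], col, col[::-1]]


def running_sum(seq):
    """Hypothetical stand-in for S6: a causal, order-sensitive sequence operation."""
    total, out = 0.0, []
    for v in seq:
        total += v
        out.append(total)
    return out


def cross_scan_with_regeneration(grid, seq_op, regenerate=True, seed=0):
    """Cross-scan a 2-D grid, optionally reorder each sequence, apply seq_op, undo, and merge."""
    h, w = len(grid), len(grid[0])
    rng = random.Random(seed)
    out = [[0.0] * w for _ in range(h)]
    for order in scan_orders(h, w):
        seq = [grid[i][j] for i, j in order]
        perm = list(range(len(seq)))
        if regenerate:
            rng.shuffle(perm)                   # hypothetical reordering; no learnable parameters
        processed = seq_op([seq[p] for p in perm])
        restored = [0.0] * len(seq)
        for k, p in enumerate(perm):
            restored[p] = processed[k]          # invert the permutation
        for (i, j), v in zip(order, restored):
            out[i][j] += v                      # cross-merge: sum the four directions
    return out
```

Because the permutation is inverted before merging, every output lands back at its original (i, j) position, and the strategy adds no weights, which matches the "zero-parameter, plug-and-play" description. Calling it with regenerate=False gives the plain cross-scan baseline for comparison.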

Experimental Results

Research Questions

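RQ1: Can a hybrid CNN-SSM diffusion model achieve competitive or better quality and diversity in unconditional medical image synthesis?
RQ2: Does the Sequence Regeneration strategy improve the spatial continuity and generalization of SSM-based diffusion models?
RQ3: How does VM-DDPM perform against GAN and DDPM baselines on datasets of different scales and modalities (ACDC, BraTS2018, ChestXRay)?
RQ4: How does MSSBlock-based multi-level feature fusion affect the texture and structure of synthesized medical images?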

Key Findings

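VM-DDPM achieves better FID scores than GAN-based baselines and several DDPM baselines on all three datasets.
VM-DDPM reaches FIDs of 11.783 on ChestXRay, 12.513 on BraTS2018, and 34.525 on ACDC (lower is better).
Ablation experiments show that the Sequence Regeneration strategy improves performance over the original CSM on every dataset.
In a qualitative evaluation, radiologists found the synthetic images hard to distinguish from real ones, with similar pathology and texture.
The method shows strong cross-dataset generalization and scalability for medical image synthesis.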
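FID fits a Gaussian to feature embeddings (typically Inception-v3 pooling features) of real and generated images and reports FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). To show why lower is better without a matrix square root, here is a toy sketch of the scalar case, where the formula reduces to (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2. The function name is hypothetical, it uses population variance as a simplifying assumption, and it does not reproduce the paper's evaluation or the numbers above.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_1d(real, generated):
    """Frechet distance between 1-D Gaussians fitted to two feature samples.

    Computes (mu_r - mu_g)^2 + var_r + var_g - 2*sqrt(var_r * var_g),
    the scalar case of the FID formula; 0 means identical fitted Gaussians.
    """
    real = [float(v) for v in real]
    generated = [float(v) for v in generated]
    mu_r, mu_g = fmean(real), fmean(generated)
    var_r = pvariance(real, mu_r)        # population variance (simplifying assumption)
    var_g = pvariance(generated, mu_g)
    return (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * math.sqrt(var_r * var_g)
```

Both a shift in the mean and a mismatch in spread raise the score, which is why FID penalizes generators that are blurry or low in diversity as well as those that are biased.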

This review was generated by AI and reviewed by human editors.