QUICK REVIEW

[Paper Review] VmambaIR: Visual State Space Model for Image Restoration

Yuan Shi, Bin Xia|arXiv (Cornell University)|Mar 18, 2024

Image and Signal Denoising Methods | Cited by 8

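One-Sentence Summary

VmambaIR applies state space models to image restoration through a novel Omni Selective Scan inside a Unet framework, achieving state-of-the-art results on deraining, single-image super-resolution, and real-world super-resolution with less computation and fewer parameters.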

ABSTRACT

Image restoration is a critical task in low-level computer vision, aiming to restore high-quality images from degraded inputs. Various models, such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, and diffusion models (DMs), have been employed to address this problem with significant impact. However, CNNs have limitations in capturing long-range dependencies. DMs require large prior models and computationally intensive denoising steps. Transformers have powerful modeling capabilities but face challenges due to quadratic complexity with input image size. To address these challenges, we propose VmambaIR, which introduces State Space Models (SSMs) with linear complexity into comprehensive image restoration tasks. We utilize a Unet architecture to stack our proposed Omni Selective Scan (OSS) blocks, consisting of an OSS module and an Efficient Feed-Forward Network (EFFN). Our proposed omni selective scan mechanism overcomes the unidirectional modeling limitation of SSMs by efficiently modeling image information flows in all six directions. Furthermore, we conducted a comprehensive evaluation of our VmambaIR across multiple image restoration tasks, including image deraining, single image super-resolution, and real-world image super-resolution. Extensive experimental results demonstrate that our proposed VmambaIR achieves state-of-the-art (SOTA) performance with much fewer computational resources and parameters. Our research highlights the potential of state space models as promising alternatives to the transformer and CNN architectures in serving as foundational frameworks for next-generation low-level visual tasks.

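Motivation and Objectives

Advance image restoration by addressing the limitations of CNNs, Transformers, and diffusion models in long-range dependency modeling and efficiency.
Develop a state-space-model-based architecture with linear complexity for 2D image data.
Design a multi-scale UNet built from OSS (Omni Selective Scan) blocks to capture information flow in six directions.
Demonstrate VmambaIR's effectiveness on deraining, SR, and real-world SR with reduced resource consumption.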

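Proposed Method

Adopts a UNet-style architecture built by stacking the proposed OSS blocks.
Introduces an OSS module that processes the input in two streams and uses CNNs to map feature dimensions.
Introduces an Efficient Feed-Forward Network (EFFN) to regulate hierarchical information flow.
Implements Omni Selective Scan (OSS) to model information flow in six directions (three dimensions, each scanned bidirectionally), using Mamba blocks for high-frequency modeling.
Uses a State Space Model (SSM) discretized with zero-order hold (ZOH) for efficient sequence modeling of image features.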
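
The ZOH discretization mentioned in the last item can be illustrated with a small NumPy sketch. It is not the paper's implementation: it assumes a diagonal state matrix and fixed parameters, whereas Mamba-style selective scans make the step size and projections depend on the input. The names zoh_discretize, ssm_scan, a, b, c, and delta are chosen here for illustration only.

```python
import numpy as np

def zoh_discretize(a, b, delta):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    a: (N,) diagonal of the state matrix A (entries must be nonzero)
    b: (N,) input projection B
    delta: positive step size
    """
    a_bar = np.exp(delta * a)          # A_bar = exp(delta * A)
    b_bar = (a_bar - 1.0) / a * b      # B_bar = A^{-1} (exp(delta * A) - I) B
    return a_bar, b_bar

def ssm_scan(x, a, b, c, delta):
    """Run h_k = A_bar * h_{k-1} + B_bar * x_k, y_k = c . h_k over a 1-D sequence.

    One state update per token, so the cost grows linearly with len(x).
    """
    a_bar, b_bar = zoh_discretize(a, b, delta)
    h = np.zeros_like(a, dtype=float)
    ys = []
    for x_k in x:
        h = a_bar * h + b_bar * x_k
        ys.append(float(c @ h))
    return np.array(ys)

# Illustrative (assumed) parameters: a stable two-state system.
a = np.array([-1.0, -2.0])
b = np.array([1.0, 1.0])
c = np.array([1.0, 0.5])
y = ssm_scan(np.ones(50), a, b, c, delta=0.1)
```

With a constant input of 1, the output of this toy system rises toward the continuous-time steady state of 1.25, which is a quick sanity check that the discretization matches the underlying continuous model.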

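Experimental Results

Research Questions

RQ1: Can a state space model with linear complexity match or exceed Transformer- and CNN-based image restoration performance?
RQ2: Can Omni Selective Scan achieve comprehensive, multi-directional modeling of information flow in images, beyond what unidirectional Mamba blocks offer?
RQ3: How do OSS, bidirectional channel scanning, and EFFN contribute to restoration accuracy and efficiency on SR, real-world SR, and deraining?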
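
RQ2 hinges on what "six directions" means in practice. One plausible reading, assumed here and not taken from the paper's code, is that a feature map of shape (C, H, W) is flattened into token sequences along the row, column, and channel axes, each traversed forward and backward. The helper six_direction_sequences below is hypothetical and only shows the orderings; a real block would feed each sequence through a selective scan and merge the results.

```python
import numpy as np

def six_direction_sequences(x):
    """Flatten a (C, H, W) feature map into six scan orderings.

    Spatial scans yield (H*W, C) sequences: one token per pixel.
    Channel scans yield (C, H*W) sequences: one token per channel.
    """
    c, h, w = x.shape
    # Row-major: walk each row left to right, rows top to bottom.
    row_major = x.reshape(c, h * w).T
    # Column-major: walk each column top to bottom, columns left to right.
    col_major = x.transpose(0, 2, 1).reshape(c, h * w).T
    # Channel order: each channel's flattened map is one token.
    channel = x.reshape(c, h * w)
    return {
        "row_fwd": row_major,
        "row_bwd": row_major[::-1],
        "col_fwd": col_major,
        "col_bwd": col_major[::-1],
        "chan_fwd": channel,
        "chan_bwd": channel[::-1],
    }

# Tiny example: 2 channels, 2 rows, 3 columns, filled with 0..11.
x = np.arange(2 * 2 * 3).reshape(2, 2, 3)
seqs = six_direction_sequences(x)
```

A unidirectional scan would see only one of these orderings, so a pixel could draw context from just the tokens that precede it in that single order. Scanning each axis in both directions is what lets every position receive information from both sides along that axis.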

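Key Findings

VmambaIR achieves state-of-the-art performance on image restoration tasks, including image deraining, single-image super-resolution, and real-world super-resolution.
In real-world 4× super-resolution, VmambaIR needs about 26% of the baseline method's computational cost while delivering higher reconstruction accuracy.
Ablation studies show that OSS clearly outperforms unidirectional scanning, that bidirectional channel scanning improves accuracy, and that EFFN improves information flow and efficiency.
Compared with existing SOTA methods on several benchmarks, VmambaIR recovers better high-frequency detail with fewer parameters and FLOPs.
Qualitative results show fewer artifacts while preserving finer details (e.g., eyes and noses in faces, water surfaces).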


This review was generated by AI and reviewed by a human editor.