QUICK REVIEW

[Paper Review] U-shaped Vision Mamba for Single Image Dehazing

Zhuoran Zheng, Chen Wu|arXiv (Cornell University)|Feb 6, 2024
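Image Enhancement Techniques | Cited by 15
One-Sentence Summary

This paper proposes UVM-Net, a U-Net-style dehazing network that combines CNN-based local feature extraction with State Space Sequence Models (SSMs) to capture long-range dependencies, and reports strong results on the RESIDE benchmark.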

ABSTRACT

Currently, Transformer is the most popular architecture for image dehazing, but due to its large computational complexity, its ability to handle long-range dependency is limited on resource-constrained devices. To tackle this challenge, we introduce the U-shaped Vision Mamba (UVM-Net), an efficient single-image dehazing network. Inspired by the State Space Sequence Models (SSMs), a new deep sequence model known for its power to handle long sequences, we design a Bi-SSM block that integrates the local feature extraction ability of the convolutional layer with the ability of the SSM to capture long-range dependencies. Extensive experimental results demonstrate the effectiveness of our method. Our method provides a more highly efficient idea of long-range dependency modeling for image dehazing as well as other image restoration tasks. The URL of the code is https://github.com/zzr-idam/UVM-Net. Our method takes only 0.009 seconds to infer a $325 \times 325$ resolution image (100FPS) without I/O handling time.
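
The latency and frame-rate figures in the abstract are linked by simple arithmetic. The short sketch below converts the quoted 0.009 s per image into throughput; the helper name `frames_per_second` is illustrative and not taken from the paper's code. The exact reciprocal is about 111 FPS, so the quoted 100FPS reads as a rounded-down figure.

```python
def frames_per_second(seconds_per_frame: float) -> float:
    """Convert a per-image latency (seconds) into frames per second."""
    if seconds_per_frame <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / seconds_per_frame


latency_s = 0.009        # per-image inference time quoted in the abstract
pixels = 325 * 325       # resolution quoted in the abstract
fps = frames_per_second(latency_s)

# Throughput in frames and in megapixels per second (I/O excluded, as in the abstract).
print(f"{fps:.1f} FPS, {pixels * fps / 1e6:.2f} megapixels/s")
```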

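Motivation and Objectives

  • Enable efficient long-range dependency modeling for image dehazing on resource-constrained devices.
  • Develop a U-Net-based architecture (UVM-Net) that integrates Mamba/SSM modules.
  • Validate its performance against state-of-the-art dehazing methods on RESIDE and related datasets.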

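Proposed Method

  • Introduce a Bi-SSM block that fuses two SSM branches via a Hadamard product.
  • Embed SSM-based long-range modeling in a U-Net-style encoder-decoder with skip connections.
  • Flatten/reshape feature maps so that they can be rolled over the channel domain for SSM processing.
  • Use convolution blocks for local feature extraction, followed by SSM-based processing and reconstruction.
  • Measure overhead as #Param and MACs on 256x256 images.
  • Compare against baseline dehazing methods on the RESIDE dataset.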
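
The two-branch fusion described in the list above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: `ssm_scan` is a plain diagonal linear recurrence standing in for a Mamba layer, the split into one branch scanning over flattened spatial positions and one scanning over channels is one possible reading of "rolling over the channel domain", and all function and parameter names are hypothetical.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Toy diagonal linear SSM along axis 0 (a stand-in for a Mamba layer).

    x: (T, D) sequence; a, b, c: (D,) per-feature parameters.
    Recurrence: h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t
    """
    h = np.zeros(x.shape[1], dtype=x.dtype)
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def bi_ssm(feat, params_spatial, params_channel):
    """Fuse two SSM branches with a Hadamard product.

    feat: (C, H, W) feature map. The branch layout is an assumption.
    """
    C, H, W = feat.shape
    seq = feat.reshape(C, H * W)                      # flatten spatial dims: (C, L)
    # Branch 1: sequence axis = spatial positions, features = channels.
    y_spatial = ssm_scan(seq.T, *params_spatial).T    # back to (C, L)
    # Branch 2: sequence axis = channels, features = spatial positions.
    y_channel = ssm_scan(seq, *params_channel)        # (C, L)
    fused = y_spatial * y_channel                     # Hadamard (element-wise) product
    return fused.reshape(C, H, W)


def make_params(d, decay=0.9):
    """Hypothetical fixed parameters (a, b, c); a real model would learn these."""
    return np.full(d, decay), np.ones(d), np.ones(d)


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
feat = rng.standard_normal((C, H, W))
out = bi_ssm(feat, make_params(C), make_params(H * W))
print(out.shape)
```

The recurrence costs time linear in the sequence length, which is the property that makes SSMs attractive next to quadratic self-attention; the selective, input-dependent parameters of Mamba are deliberately left out of this sketch.
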
Figure 1: Overview of the UVM-Net architecture. UVM-Net employs the encoder-decoder framework with UVM-Net blocks in the encoder and convolution blocks in the decoder, together with skip connections. In UVM-Net block, our feature maps are first applied to a convolution operation, then the unfolded p

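Experimental Results

Research Questions

  • RQ1: Can the Bi-SSM block effectively model long-range dependencies in hazy images while remaining efficient?
  • RQ2: Does integrating SSMs into a U-Net backbone deliver competitive or better dehazing performance at lower computational cost?
  • RQ3: How does UVM-Net compare with Transformer- and CNN-based dehazing methods on RESIDE and related benchmarks?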
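
RQ2 turns on computational cost, which this review reports as #Param and MACs at 256x256 input. As a hedged illustration of how those two quantities are usually counted, the sketch below handles a single convolution layer. The layer configuration (3 input channels, 64 output channels, 3x3 kernel, output size equal to input size) is hypothetical and not taken from UVM-Net, and the helper name `conv2d_cost` is illustrative.

```python
def conv2d_cost(c_in, c_out, kernel, h_out, w_out, bias=True):
    """Return (parameter count, multiply-accumulate count) for one 2D convolution.

    Convention assumed here: one MAC per weight per output position;
    bias additions are not counted as MACs.
    """
    params = c_out * (c_in * kernel * kernel + (1 if bias else 0))
    macs = c_out * c_in * kernel * kernel * h_out * w_out
    return params, macs


# Hypothetical first layer on a 256x256 RGB input with 'same' padding.
params, macs = conv2d_cost(c_in=3, c_out=64, kernel=3, h_out=256, w_out=256)
print(f"{params:,} params, {macs / 1e9:.3f} GMACs")
```

Summing such per-layer counts over a whole network gives totals of the kind reported for the model; MACs scale with the output resolution, which is why the evaluation resolution has to be stated alongside them.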

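Key Findings

  • On the SOTS evaluation (indoor/outdoor/mix), UVM-Net reaches up to PSNR 40.17 and SSIM 0.996 (see Table 1).
  • On RESIDE-style benchmarks, UVM-Net scores PSNR 34.92 / SSIM 0.984 on SOTS-outdoor and PSNR 31.92 / SSIM 0.982 on SOTS-mix.
  • In the reported configuration, the model has 19.25M parameters and 173.55G MACs.
  • Ablations show that removing the SSM lowers PSNR to 35.11 (1D conv) or 38.25 (SDP), with lower SSIM as well, which underscores the benefit of the Bi-SSM block.
  • UVM-Net outperforms several baseline methods on the RESIDE dataset, which suggests that it achieves efficient long-range dependency modeling for dehazing.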
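
The PSNR values listed above follow the standard definition, 10·log10(MAX² / MSE). The sketch below is a minimal NumPy version that assumes images scaled to [0, 1]; it is not the paper's evaluation script, and details such as color space or cropping may differ there.

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


# Synthetic example: a uniform error of 0.01 on a [0, 1] image gives MSE = 1e-4.
clean = np.full((4, 4), 0.5)
dehazed = clean + 0.01
score = psnr(clean, dehazed)
print(f"{score:.2f} dB")
```

Because PSNR is logarithmic, gaps between methods are larger than they look: for a single image pair, the 5.06 dB difference between 40.17 and 35.11 corresponds to roughly a 3.2x difference in mean squared error.
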
Figure 2: Qualitative comparison of image dehazing methods on the SOTS mix set, where the first row shows outdoor images and the second row shows indoor images. The third and fourth rows are real-world images. The first column is the hazy image and the last is the corresponding ground truth.


This review was generated by AI and reviewed by human editors.