QUICK REVIEW

[Paper Review] Rethinking Coarse-to-Fine Approach in Single Image Deblurring

Sung-Jin Cho, Seo-Won Ji | arXiv (Cornell University) | Aug 11, 2021
Advanced Image Processing Techniques | References: 37 | Cited by: 34
One-Sentence Summary

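Introduces MIMO-UNet, a U-Net with a single encoder and a single decoder that handles multi-scale blur through a multi-input single encoder, a multi-output single decoder, and asymmetric feature fusion, achieving fast and accurate deblurring.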

ABSTRACT

Coarse-to-fine strategies have been extensively used for the architecture design of single image deblurring networks. Conventional methods typically stack sub-networks with multi-scale input images and gradually improve sharpness of images from the bottom sub-network to the top sub-network, yielding inevitably high computational costs. Toward a fast and accurate deblurring network design, we revisit the coarse-to-fine strategy and present a multi-input multi-output U-net (MIMO-UNet). The MIMO-UNet has three distinct features. First, the single encoder of the MIMO-UNet takes multi-scale input images to ease the difficulty of training. Second, the single decoder of the MIMO-UNet outputs multiple deblurred images with different scales to mimic multi-cascaded U-nets using a single U-shaped network. Last, asymmetric feature fusion is introduced to merge multi-scale features in an efficient manner. Extensive experiments on the GoPro and RealBlur datasets demonstrate that the proposed network outperforms the state-of-the-art methods in terms of both accuracy and computational complexity. Source code is available for research purposes at https://github.com/chosj95/MIMO-UNet.

Motivation and Objectives

  • Reduce the computational cost of coarse-to-fine deblurring architectures without sacrificing accuracy.
  • Develop a single U-Net that can process multi-scale blur via shared encoder/decoder with multi-scale outputs.
  • Design efficient fusion mechanisms to combine multi-scale features for robust deblurring.
  • Demonstrate superior PSNR/SSIM and faster runtime compared to state-of-the-art multi-network approaches on standard benchmarks.

Proposed Method

  • Propose MIMO-UNet, a single U-Net with three encoder blocks and three decoder blocks.
  • Introduce Multi-Input Single Encoder (MISE) where each encoder block ingests a downscaled version of the blurry input and fuses it with learned features using a shallow convolutional module (SCM).
  • Introduce Multi-Output Single Decoder (MOSD) where each decoder level produces an intermediate deblurred image, enabling coarse-to-fine behavior within one network.
  • Implement Asymmetric Feature Fusion (AFF), in which each fusion module takes the outputs of all encoder blocks, resizes them to a common scale, and merges them with convolutional layers so that information flows across scales.
  • Use a multi-scale content loss (L_cont) plus a multi-scale frequency reconstruction loss (L_MSFR) to supervise the outputs at every scale.
  • Train with GoPro and RealBlur datasets; show improved accuracy (PSNR/SSIM) and lower runtime versus stacked sub-network baselines.
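
The multi-scale supervision in the list above can be sketched in a few lines. The summary does not give the loss formulas, so this sketch assumes an L1 distance at each of three scales, a 2D FFT for the frequency term, and a hypothetical weight `lam=0.1`. The helper names (`pyramid`, `content_loss`, `frequency_loss`, `total_loss`) and the 2x2 average-pooling downsampler are illustrative and are not taken from the authors' code.

```python
import numpy as np


def downsample2x(img: np.ndarray) -> np.ndarray:
    """Halve height and width with 2x2 average pooling (an assumed stand-in
    for whatever resizing a real pipeline would use)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop to even size so the reshape below is valid
    return img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))


def pyramid(img: np.ndarray, levels: int = 3) -> list:
    """Return [full, 1/2, 1/4, ...] resolution copies of an (H, W[, C]) image."""
    scales = [img]
    for _ in range(levels - 1):
        scales.append(downsample2x(scales[-1]))
    return scales


def content_loss(preds, targets) -> float:
    """Sum over scales of the mean absolute pixel error (assumed L1 form)."""
    return sum(float(np.mean(np.abs(p - t))) for p, t in zip(preds, targets))


def frequency_loss(preds, targets) -> float:
    """Sum over scales of the mean magnitude of the 2D FFT difference.
    Taking the complex magnitude is an assumption of this sketch."""
    total = 0.0
    for p, t in zip(preds, targets):
        diff = np.fft.fft2(p, axes=(0, 1)) - np.fft.fft2(t, axes=(0, 1))
        total += float(np.mean(np.abs(diff)))
    return total


def total_loss(preds, targets, lam: float = 0.1) -> float:
    """Content term plus lam times the frequency term (lam is hypothetical)."""
    return content_loss(preds, targets) + lam * frequency_loss(preds, targets)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((64, 64, 3))
    targets = pyramid(sharp)
    # A constant offset stands in for the per-scale outputs of a network.
    outputs = [t + 0.05 for t in targets]
    print(total_loss(outputs, targets))
```

Because a multi-output decoder emits one image per scale, the same two terms are applied at every level and summed; the sharp targets for the coarser levels are simply downscaled copies of the full-resolution ground truth.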
Figure 1: Comparison between the proposed and conventional methods in terms of the PSNR and runtime. The runtime of the methods is reported as the runtime measured using the released test code of each method on our environment (filled) and the runtime provided in each paper (blank).

Experimental Results

Research Questions

  • RQ1: Can a single U-Net architecture with multi-scale inputs and outputs outperform conventional coarse-to-fine networks with stacked sub-networks in single-image deblurring?
  • RQ2: Do multi-scale feature fusion strategies (AFF) and cross-scale inputs/outputs improve deblurring performance under diverse blur conditions?
  • RQ3: How does MIMO-UNet compare to state-of-the-art methods in PSNR/SSIM and computational efficiency on GoPro and RealBlur datasets?

Key Findings

  • MIMO-UNet achieves competitive PSNR/SSIM while offering significantly lower runtime than stacked sub-network methods on GoPro.
  • MIMO-UNet++ attains the best PSNR among evaluated models on GoPro with 32.68 dB, and demonstrates strong performance on RealBlur (top PSNR/SSIM in the reported comparisons).
  • AFF provides measurable gains over simple fusion strategies, and combining MISE, MOSD, and AFF yields the largest PSNR improvement in ablations.
  • MSFR auxiliary loss further improves PSNR by up to ~0.57 dB over a baseline, highlighting the benefit of frequency-domain supervision.
  • Across benchmarks, MIMO-UNet variants demonstrate favorable speed-accuracy trade-offs, with MIMO-UNet++ delivering higher PSNR than several existing methods while maintaining faster runtimes.
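
The findings above are stated in PSNR, so a short worked example may help in reading them. The sketch below uses the standard definition PSNR = 10 log10(MAX^2 / MSE). The function names and the `max_val=1.0` intensity range are assumptions for illustration, and benchmark evaluation scripts may differ in details such as color space or per-image averaging.

```python
import numpy as np


def psnr(pred, target, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE after a PSNR gain of gain_db, relative to the MSE before it."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    target = np.zeros((8, 8))
    pred = np.full((8, 8), 0.1)  # uniform error of 0.1, so MSE = 0.01
    print(psnr(pred, target))
    print(mse_ratio_for_gain(0.57))
```

Under this definition, a gain of 0.57 dB corresponds to an MSE ratio of about 0.877, that is, roughly 12% lower mean squared error.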
Figure 2: Comparison of coarse-to-fine image deblurring network architectures: (a) DeepDeblur, (b) PSS-NSC, (c) MT-RNN, and (d) proposed MIMO-UNet.


This review was generated by AI and reviewed by human editors.