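[Paper Explained] Rethinking Coarse-to-Fine Approach in Single Image Deblurring
Introduces MIMO-UNet, a U-Net with a single encoder and a single decoder that handles multi-scale blur through a multi-input single encoder, a multi-output single decoder, and asymmetric feature fusion, enabling fast and accurate deblurring.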
Coarse-to-fine strategies have been extensively used for the architecture design of single image deblurring networks. Conventional methods typically stack sub-networks with multi-scale input images and gradually improve sharpness of images from the bottom sub-network to the top sub-network, yielding inevitably high computational costs. Toward a fast and accurate deblurring network design, we revisit the coarse-to-fine strategy and present a multi-input multi-output U-net (MIMO-UNet). The MIMO-UNet has three distinct features. First, the single encoder of the MIMO-UNet takes multi-scale input images to ease the difficulty of training. Second, the single decoder of the MIMO-UNet outputs multiple deblurred images with different scales to mimic multi-cascaded U-nets using a single U-shaped network. Last, asymmetric feature fusion is introduced to merge multi-scale features in an efficient manner. Extensive experiments on the GoPro and RealBlur datasets demonstrate that the proposed network outperforms the state-of-the-art methods in terms of both accuracy and computational complexity. Source code is available for research purposes at https://github.com/chosj95/MIMO-UNet.
Research Motivation and Goals
- Reduce the computational cost of coarse-to-fine deblurring architectures without sacrificing accuracy.
- Develop a single U-Net that can process multi-scale blur via shared encoder/decoder with multi-scale outputs.
- Design efficient fusion mechanisms to combine multi-scale features for robust deblurring.
- Demonstrate superior PSNR/SSIM and faster runtime compared to state-of-the-art multi-network approaches on standard benchmarks.
Proposed Method
- Propose MIMO-UNet, a single U-Net with three encoder blocks and three decoder blocks.
- Introduce Multi-Input Single Encoder (MISE) where each encoder block ingests a downscaled version of the blurry input and fuses it with learned features using a shallow convolutional module (SCM).
- Introduce Multi-Output Single Decoder (MOSD) where each decoder level produces an intermediate deblurred image, enabling coarse-to-fine behavior within one network.
- Implement Asymmetric Feature Fusion (AFF) to merge multi-scale encoder features across levels using attention-like modulation and cross-scale fusion.
- Use a multi-scale content loss (L_cont) plus a multi-scale frequency reconstruction loss (L_MSFR) to supervise the outputs at every scale.
- Train on the GoPro and RealBlur datasets; show improved accuracy (PSNR/SSIM) and lower runtime versus stacked sub-network baselines.
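The multi-scale supervision in the list above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes three scales built by 2x2 average pooling, an L1 distance averaged over elements at each scale, a frequency term computed on the magnitude of the complex FFT difference, and a placeholder weight `lam`. The function names (`downsample2x`, `build_pyramid`, `multiscale_loss`) are hypothetical.

```python
import numpy as np


def downsample2x(img):
    """Halve height and width by averaging each 2x2 block (assumed resampling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # drop an odd trailing row/column, if any
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def build_pyramid(img, levels=3):
    """Return [full, 1/2, 1/4, ...] resolution versions of a 2-D image."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def content_loss(pred, target):
    """Pixel-domain L1 distance, averaged over the number of elements."""
    return np.abs(pred - target).mean()


def frequency_loss(pred, target):
    """L1 distance between 2-D FFTs, averaged over the number of elements."""
    return np.abs(np.fft.fft2(pred) - np.fft.fft2(target)).mean()


def multiscale_loss(preds, targets, lam=0.1):
    """Sum both terms over all scales; lam is an assumed weighting."""
    l_cont = sum(content_loss(p, t) for p, t in zip(preds, targets))
    l_msfr = sum(frequency_loss(p, t) for p, t in zip(preds, targets))
    return l_cont + lam * l_msfr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((16, 16))          # stand-in for a sharp image
    targets = build_pyramid(sharp)        # ground truth at three scales
    preds = [t + 0.1 for t in targets]    # stand-in for the decoder outputs
    print(multiscale_loss(preds, targets))
```

In a real training loop the `preds` list would come from the three decoder outputs, and the targets would be the sharp image resized to each output resolution. Averaging per scale keeps the coarse outputs from being drowned out by the full-resolution term.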

Experimental Results
Research Questions
- RQ1: Can a single U-Net architecture with multi-scale inputs and outputs outperform conventional coarse-to-fine networks with stacked sub-networks in single-image deblurring?
- RQ2: Do multi-scale feature fusion strategies (AFF) and cross-scale inputs/outputs improve deblurring performance under diverse blur conditions?
- RQ3: How does MIMO-UNet compare to state-of-the-art methods in PSNR/SSIM and computational efficiency on GoPro and RealBlur datasets?
Key Findings
- MIMO-UNet achieves competitive PSNR/SSIM while offering significantly lower runtime than stacked sub-network methods on GoPro.
- MIMO-UNet++ attains the best PSNR among evaluated models on GoPro with 32.68 dB, and demonstrates strong performance on RealBlur (top PSNR/SSIM in the reported comparisons).
- AFF provides measurable gains over simple fusion strategies, and combining MISE, MOSD, and AFF yields the largest PSNR improvement in ablations.
- MSFR auxiliary loss further improves PSNR by up to ~0.57 dB over a baseline, highlighting the benefit of frequency-domain supervision.
- Across benchmarks, MIMO-UNet variants demonstrate favorable speed-accuracy trade-offs, with MIMO-UNet++ delivering higher PSNR than several existing methods while maintaining faster runtimes.
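Since the findings above are stated in dB of PSNR, a short sketch of how that metric is computed may help in reading them. It uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE), and assumes images scaled to [0, 1]. The helper name `psnr` and the toy inputs are illustrative, not taken from the paper's evaluation code.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    target = np.zeros((8, 8))
    pred = np.full((8, 8), 0.1)   # uniform error of 0.1 gives MSE = 0.01
    print(psnr(pred, target))     # about 20 dB
```

Because the scale is logarithmic, a gain of g dB corresponds to multiplying the mean squared error by 10^(-g/10). A 0.57 dB gain, for example, means roughly 12% lower MSE.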

This summary was generated by AI and reviewed by a human editor.