QUICK REVIEW

[Paper Review] Revisiting Image Deblurring with an Efficient ConvNet

Lingyan Ruan, Mojtaba Bemana | arXiv (Cornell University) | Feb 4, 2023

Advanced Image Processing Techniques | Cited by 22

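One-Sentence Summary

This paper proposes an efficient convolutional neural network for motion and defocus deblurring that shows superior end-to-end performance on LFDOF and is competitive with state-of-the-art methods, supported by detailed ablation and ERF analyses.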

ABSTRACT

Image deblurring aims to recover the latent sharp image from its blurry counterpart and has a wide range of applications in computer vision. The Convolution Neural Networks (CNNs) have performed well in this domain for many years, and until recently an alternative network architecture, namely Transformer, has demonstrated even stronger performance. One can attribute its superiority to the multi-head self-attention (MHSA) mechanism, which offers a larger receptive field and better input content adaptability than CNNs. However, as MHSA demands high computational costs that grow quadratically with respect to the input resolution, it becomes impractical for high-resolution image deblurring tasks. In this work, we propose a unified lightweight CNN network that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers while bearing less computational costs. Our key design is an efficient CNN block dubbed LaKD, equipped with a large kernel depth-wise convolution and spatial-channel mixing structure, attaining comparable or larger ERF than Transformers but with a smaller parameter scale. Specifically, we achieve +0.17dB / +0.43dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs. Extensive experiments demonstrate the superior performance of our network and the effectiveness of each module. Furthermore, we propose a compact and intuitive ERFMeter metric that quantitatively characterizes ERF, and shows a high correlation to the network performance. We hope this work can inspire the research community to further explore the pros and cons of CNN and Transformer architectures beyond image deblurring tasks.
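
To make the quadratic-cost claim in the abstract concrete, the sketch below counts the entries of a single self-attention score matrix when every pixel is treated as one token. This is a back-of-the-envelope illustration under that assumption only; windowed or channel-wise attention variants scale differently, and the function name and resolutions are hypothetical rather than taken from the paper.

```python
def attention_score_entries(height: int, width: int) -> int:
    """Entries in one head's N x N attention score matrix, where N = height * width.

    Assumption: global spatial self-attention with one token per pixel.
    """
    tokens = height * width
    return tokens * tokens


if __name__ == "__main__":
    # Hypothetical square inputs; each doubling of the side length
    # quadruples the token count and multiplies the matrix size by 16.
    for side in (64, 128, 256):
        print(f"{side}x{side}: {attention_score_entries(side, side):,} entries")
```

Because the count grows with the square of the number of pixels, doubling both image dimensions makes the score matrix 16 times larger, which is the scaling the abstract cites as impractical for high-resolution inputs.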

Research Motivation and Objectives

Investigate an efficient ConvNet architecture for both motion and defocus deblurring.
Examine ablations of network structure and layer configurations to maximize performance and efficiency.
Evaluate generalization across multiple datasets and compare with state-of-the-art methods.

Proposed Method

Introduce a LaKD block-based architecture for deblurring, with an emphasis on expanding the effective receptive field.
Provide ablations comparing dilated convolutions versus LaKD block structures.
Conduct both two-stage and end-to-end training, using LFDOF synthetic defocus data and the GoPro/HIDE/RealBlur motion deblurring benchmarks.
Perform ERF (effective receptive field) fitting and ERFMeter analysis to quantify receptive field growth during training.
Compare against state-of-the-art methods such as Restormer and DRBNet across multiple datasets with both qualitative and quantitative metrics.
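
The abstract describes the LaKD block as combining a large-kernel depth-wise convolution with spatial-channel mixing. The sketch below shows why splitting a convolution into a depth-wise part and a 1x1 point-wise part keeps the parameter count small even when the kernel is large. It is a generic parameter count, not the actual LaKD layout; the channel width (64) and kernel size (9) are illustrative assumptions, and biases are ignored.

```python
def standard_conv_params(channels_in: int, channels_out: int, kernel: int) -> int:
    """Weights in a dense kernel x kernel convolution (bias ignored)."""
    return channels_in * channels_out * kernel * kernel


def depthwise_pointwise_params(channels_in: int, channels_out: int, kernel: int) -> int:
    """Weights in a depth-wise kernel x kernel conv followed by a 1x1 point-wise conv."""
    depthwise = channels_in * kernel * kernel   # one spatial filter per input channel
    pointwise = channels_in * channels_out      # 1x1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    channels, kernel = 64, 9  # hypothetical values, not from the paper
    print("dense:              ", standard_conv_params(channels, channels, kernel))
    print("depthwise+pointwise:", depthwise_pointwise_params(channels, channels, kernel))
```

With these assumed sizes the dense layer needs 331,776 weights while the factorized pair needs 9,280, which is the kind of saving that makes large kernels affordable in a lightweight network.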

Experimental Results

Research Questions

RQ1: Does the LaKD block with an expanded effective receptive field improve deblurring performance over dilated-convolution variants?
RQ2: What is the influence of feature mixing module depth on deblurring quality and efficiency?
RQ3: Can an end-to-end network trained on LFDOF and GoPro/HIDE/RealBlur generalize to other defocus and motion blur datasets?
RQ4: How does the proposed method compare to current state-of-the-art methods on LFDOF, DPDD, RealDOF, RealBlur, and CUHK datasets?
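
As background for RQ1, the sketch below computes the theoretical receptive field of stacked stride-1 convolutions using the standard formula r = 1 + sum((k - 1) * d) over layers with kernel size k and dilation d. This is the theoretical extent only; it is not the effective receptive field or the ERFMeter metric analyzed in the paper, and the layer configurations are hypothetical.

```python
from typing import Iterable, Tuple


def theoretical_receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field (in pixels, one axis) of stacked stride-1 convolutions.

    layers: (kernel_size, dilation) pairs, applied in order.
    """
    field = 1
    for kernel, dilation in layers:
        field += (kernel - 1) * dilation
    return field


if __name__ == "__main__":
    # Hypothetical configurations that all span the same 9-pixel extent.
    print("one 9x9 kernel:        ", theoretical_receptive_field([(9, 1)]))
    print("one 3x3, dilation 4:   ", theoretical_receptive_field([(3, 4)]))
    print("four stacked 3x3:      ", theoretical_receptive_field([(3, 1)] * 4))
```

All three configurations cover the same extent, yet the dilated 3x3 kernel samples only 9 positions inside it while the 9x9 kernel samples all 81. Equal theoretical extent therefore does not imply equal effective receptive field, which is why the ablation against dilated variants is informative.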

Key Findings

The proposed method achieves better PSNR/SSIM/LPIPS on LFDOF (31.87 dB PSNR, 0.912 SSIM, 0.115 LPIPS) than AIFNet and DRBNet.
A two-stage training strategy with LFDOF and subsequent real datasets yields better end-to-end performance than single-stage approaches.
Two sequential depthwise and pointwise layers in the feature mixing module provide the best balance of accuracy and efficiency.
Dilated convolution variants underperform the LaKD block in the ablation study, indicating the effectiveness of the LaKD design in enlarging the receptive field.
ERF analysis shows progressive growth of receptive fields during training, aligning with established theories on receptive field expansion.
Qualitative comparisons show competitive or superior visual quality against Restormer and other baselines across the GoPro, HIDE, RealBlur, DPDD, RealDOF, and CUHK datasets.
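
The PSNR figures quoted above follow the standard definition PSNR = 10 * log10(MAX^2 / MSE). The sketch below is a minimal pure-Python version operating on flat lists of pixel values, assuming intensities in [0, 1]; the function name and sample values are hypothetical, and this is not the evaluation code used by the authors.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    sharp = [0.0, 1.0]       # hypothetical ground-truth pixels
    deblurred = [0.1, 0.9]   # hypothetical network output
    print(f"{psnr(sharp, deblurred):.2f} dB")
```

Because the scale is logarithmic, a fixed gain in dB corresponds to a fixed ratio of mean squared error, regardless of the starting point.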

This review was generated by AI and reviewed by human editors.