QUICK REVIEW

[Paper Review] Revisiting Image Deblurring with an Efficient ConvNet

Lingyan Ruan, Mojtaba Bemana | arXiv (Cornell University) | Feb 4, 2023

Advanced Image Processing Techniques | Citations: 22

One-Line Summary

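This paper proposes an efficient ConvNet for both motion and defocus deblurring. It reports strong end-to-end performance on LFDOF, results competitive with state-of-the-art methods (+0.17 dB / +0.43 dB PSNR over Restormer on defocus / motion deblurring benchmarks, with 32% fewer parameters and 39% fewer MACs), and detailed ablations together with an effective receptive field (ERF) analysis.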

ABSTRACT

Image deblurring aims to recover the latent sharp image from its blurry counterpart and has a wide range of applications in computer vision. Convolutional Neural Networks (CNNs) have performed well in this domain for many years, but recently an alternative network architecture, the Transformer, has demonstrated even stronger performance. One can attribute its superiority to the multi-head self-attention (MHSA) mechanism, which offers a larger receptive field and better input content adaptability than CNNs. However, as MHSA demands high computational costs that grow quadratically with respect to the input resolution, it becomes impractical for high-resolution image deblurring tasks. In this work, we propose a unified lightweight CNN that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers while bearing less computational cost. Our key design is an efficient CNN block dubbed LaKD, equipped with a large kernel depth-wise convolution and a spatial-channel mixing structure, attaining comparable or larger ERF than Transformers but with a smaller parameter scale. Specifically, we achieve +0.17dB / +0.43dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs. Extensive experiments demonstrate the superior performance of our network and the effectiveness of each module. Furthermore, we propose a compact and intuitive ERFMeter metric that quantitatively characterizes ERF and shows a high correlation to the network performance. We hope this work can inspire the research community to further explore the pros and cons of CNN and Transformer architectures beyond image deblurring tasks.
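The abstract's efficiency argument rests on how cost scales with resolution. The rough count below is a sketch under simplifying assumptions (only the core attention products and the convolution itself are counted; projections, softmax, normalization, and biases are ignored), and the channel width and kernel size are hypothetical values, not the paper's settings. It contrasts standard token-to-token self-attention with a k x k depth-wise convolution, and a dense convolution's weights with a depth-wise one's.

```python
# Rough multiply-accumulate (MAC) and weight counts for one layer on an H x W x C map.
# Simplification: only the core operation is counted; Q/K/V projections, softmax,
# normalization, and biases are ignored.


def spatial_attention_macs(h: int, w: int, c: int) -> int:
    """Token-to-token self-attention over N = h * w tokens of width c.

    Q @ K^T costs N * N * c MACs and (attention weights) @ V costs another N * N * c.
    """
    n = h * w
    return 2 * n * n * c


def depthwise_conv_macs(h: int, w: int, c: int, k: int) -> int:
    """k x k depth-wise convolution: k * k MACs per output pixel per channel."""
    return h * w * c * k * k


def conv_params(c: int, k: int, depthwise: bool) -> int:
    """Weights (no bias) of a k x k convolution mapping c channels to c channels."""
    return k * k * c if depthwise else k * k * c * c


if __name__ == "__main__":
    c, k = 48, 31  # hypothetical width and kernel size, not the paper's settings
    for side in (64, 128, 256):
        att = spatial_attention_macs(side, side, c)
        dw = depthwise_conv_macs(side, side, c, k)
        print(f"{side}x{side}: attention/depth-wise MAC ratio = {att / dw:.1f}")
    print("dense params:", conv_params(c, k, depthwise=False))
    print("depth-wise params:", conv_params(c, k, depthwise=True))
```

Under these assumptions the MAC ratio equals 2HW / k², so every doubling of the image side multiplies it by four, while the depth-wise layer's own cost grows only linearly in HW. The depth-wise kernel also needs 1/C of the dense kernel's weights, which is why a very large kernel can stay cheap.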

Motivation and Objectives

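Investigate an efficient ConvNet architecture for both motion and defocus deblurring.
Ablate the network structure and layer configuration to maximize performance and efficiency.
Evaluate generalization across multiple datasets and compare against state-of-the-art methods.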

Proposed Method

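Introduces a deblurring architecture built from LaKD blocks, designed to enlarge the effective receptive field.
Provides ablations comparing dilated convolutions with the LaKD block structure.
Uses two-stage and end-to-end training: synthetic defocus data from LFDOF for defocus deblurring, and GoPro/HIDE/RealBlur as motion deblurring benchmarks.
Performs ERF (effective receptive field) fitting and an ERFMeter analysis to quantify how the receptive field grows during training.
Compares against state-of-the-art methods such as Restormer and DRBNet, qualitatively and quantitatively, across multiple datasets.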
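The abstract describes LaKD as a large-kernel depth-wise convolution combined with a spatial-channel mixing structure. The pure-Python sketch below shows those two ingredients in isolation: a depth-wise convolution mixes pixels within each channel, and point-wise (1x1) convolutions mix channels at each pixel. The block layout (depth-wise conv, point-wise expand, ReLU, point-wise project, residual), the function names, and the omission of normalization are assumptions for illustration, not the authors' implementation; a practical model would use a deep learning framework.

```python
from typing import List

Channel = List[List[float]]  # rows x cols
Tensor = List[Channel]       # channels x rows x cols


def depthwise_conv2d(x: Tensor, kernels: List[Channel]) -> Tensor:
    """Spatial mixing: each channel is filtered by its own odd-sized kernel
    (cross-correlation, zero padding, output the same size as the input)."""
    out: Tensor = []
    for chan, ker in zip(x, kernels):
        h, w, k = len(chan), len(chan[0]), len(ker)
        r = k // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        si, sj = i + di - r, j + dj - r
                        if 0 <= si < h and 0 <= sj < w:
                            acc += ker[di][dj] * chan[si][sj]
                res[i][j] = acc
        out.append(res)
    return out


def pointwise_conv(x: Tensor, weights: List[List[float]]) -> Tensor:
    """Channel mixing (1x1 conv): output channel o = sum_c weights[o][c] * x[c]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wo[c] * x[c][i][j] for c in range(len(x))) for j in range(w)] for i in range(h)]
        for wo in weights
    ]


def relu(x: Tensor) -> Tensor:
    return [[[max(0.0, v) for v in row] for row in chan] for chan in x]


def lakd_like_block(x: Tensor, dw_kernels: List[Channel],
                    pw_expand: List[List[float]], pw_project: List[List[float]]) -> Tensor:
    """Hypothetical block: large-kernel depth-wise conv, then point-wise
    expand -> ReLU -> point-wise project back to the input width, plus a residual."""
    y = depthwise_conv2d(x, dw_kernels)
    y = pointwise_conv(relu(pointwise_conv(y, pw_expand)), pw_project)
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(x, y)]


if __name__ == "__main__":
    # Impulse in channel 0 of a 2 x 7 x 7 input; a 7x7 all-ones kernel spreads it
    # over the whole 7x7 map in a single layer.
    x = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
    x[0][3][3] = 1.0
    big = [[1.0] * 7 for _ in range(7)]
    y = depthwise_conv2d(x, [big, big])
    print(sum(v > 0 for row in y[0] for v in row))  # prints 49: every pixel is reached
```

The impulse example shows why a large depth-wise kernel enlarges the receptive field cheaply: one layer reaches every pixel of the 7x7 map, while a stack of 3x3 layers would need three layers to span the same area.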

Experimental Results

Research Questions

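RQ1: Does the LaKD block, with its enlarged effective receptive field, improve deblurring performance over dilated-convolution variants?
RQ2: How does the depth of the feature mixing module affect deblurring quality and efficiency?
RQ3: Does the end-to-end network trained on LFDOF and GoPro/HIDE/RealBlur generalize to other defocus and motion blur datasets?
RQ4: How does the proposed method compare with current state-of-the-art methods on the LFDOF, DPDD, RealDOF, RealBlur, and CUHK datasets?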
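RQ1 contrasts dilated convolutions with a large dense kernel. A commonly cited drawback of dilation is the "gridding" effect: a stack can span a wide area yet read only some of the positions in it. The 1-D sketch below (function names are illustrative, not from the paper) lists which input offsets a stride-1 stack of convolutions can reach; it is not a reproduction of the paper's ablation and says nothing about how strongly each position contributes, which is what the ERF measures.

```python
from itertools import product
from typing import Sequence, Set, Tuple

Layer = Tuple[int, int]  # (kernel_size, dilation); kernel_size must be odd


def sampled_offsets(layers: Sequence[Layer]) -> Set[int]:
    """1-D input offsets, relative to one output position, that a stride-1 stack of
    convolutions can read. Each layer's taps sit at dilation * {-r, ..., r}."""
    offsets = {0}
    for k, d in layers:
        r = k // 2
        taps = [d * t for t in range(-r, r + 1)]
        offsets = {o + t for o, t in product(offsets, taps)}
    return offsets


def theoretical_rf(layers: Sequence[Layer]) -> int:
    """Span of the receptive field for stride-1 layers: 1 + sum((k - 1) * d)."""
    return 1 + sum((k - 1) * d for k, d in layers)


if __name__ == "__main__":
    stacks = {
        "two 3-tap, dilation 2": [(3, 2), (3, 2)],
        "3-tap d=1 then d=2": [(3, 1), (3, 2)],
        "one dense 9-tap": [(9, 1)],
    }
    for name, layers in stacks.items():
        span = theoretical_rf(layers)
        hit = len(sampled_offsets(layers))
        print(f"{name}: span {span}, positions read {hit}/{span}")
```

Two dilation-2 layers span the same 9 positions as a dense 9-tap kernel but read only the 5 even offsets; mixing dilation rates fills the holes, and a large dense (or depth-wise) kernel avoids them by construction.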

Key Findings

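On LFDOF, the proposed method outperforms AIFNet and DRBNet on PSNR, SSIM, and LPIPS (31.87 dB PSNR, 0.912 SSIM, 0.115 LPIPS; lower LPIPS is better).
A two-stage training strategy, first on LFDOF and then on real datasets, improves end-to-end performance over a single-stage approach.
In the feature mixing module, two sequential stages of depth-wise and point-wise layers give the best balance of accuracy and efficiency.
In the ablation study, the dilated-convolution variants underperform the LaKD block, supporting the LaKD design for enlarging the receptive field.
The ERF analysis shows the receptive field growing progressively during training, consistent with existing theory on receptive-field expansion.
Qualitatively, results on GoPro, HIDE, RealBlur, DPDD, RealDOF, and CUHK are competitive with or better than those of Restormer and other baselines.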
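To read the PSNR figures above (and the +0.17 dB / +0.43 dB gains over Restormer quoted in the abstract), the sketch below computes PSNR as 10·log10(MAX² / MSE) on flattened pixel lists and converts a dB gain into an MSE ratio. The function names are illustrative; published evaluations differ in details such as color space and border handling, and benchmark PSNR is usually averaged per image, so translating an averaged gain into an MSE reduction is only approximate.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE) for pixel values in [0, max_value]."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(delta_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by delta_db (same MAX)."""
    return 10.0 ** (-delta_db / 10.0)


if __name__ == "__main__":
    for gain in (0.17, 0.43):  # PSNR gains over Restormer quoted in the abstract
        print(f"+{gain} dB PSNR -> {100 * (1 - mse_ratio_for_gain(gain)):.1f}% lower MSE")
    # RMSE implied by 31.87 dB on a [0, 1] scale: sqrt(MAX^2 / 10^(PSNR / 10))
    print(f"31.87 dB -> RMSE ~ {math.sqrt(10 ** (-31.87 / 10)):.4f}")
```

Read per image, gains of 0.17 dB and 0.43 dB correspond to roughly 3.8% and 9.4% lower mean squared error.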


This review was generated by AI and checked by a human editor.