QUICK REVIEW

[Paper Review] Restormer: Efficient Transformer for High-Resolution Image Restoration

Syed Waqas Zamir, Aditya Arora|arXiv (Cornell University)|Nov 18, 2021

Advanced Image Processing Techniques | References: 99 | Citations: 191

One-line summary

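Restormer introduces an efficient Transformer built on multi-Dconv head transposed attention (MDTA) and a gated-Dconv feed-forward network (GDFN). It restores high-resolution images with complexity that is linear in spatial resolution and achieves state-of-the-art results on several tasks.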

ABSTRACT

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising). The source code and pre-trained models are available at https://github.com/swz30/Restormer.

Research motivation and objectives

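Frames image restoration as an ill-posed problem that requires strong image priors and long-range dependencies.
Overcomes the quadratic computational cost of standard self-attention to make high-resolution restoration feasible.
Proposes Restormer, which combines new building blocks (MDTA and GDFN) with a progressive training strategy for learning context at multiple scales.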
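
The gap between quadratic and linear cost can be made concrete with a short comparison. This is a sketch under simplifying assumptions: a single head, scaling factors omitted, and an H x W feature map with C channels flattened to HW tokens. The symbols Q, K and A are notation chosen here for illustration, not taken from the review.

```latex
\begin{aligned}
\text{Standard self-attention:}\quad
  & Q, K \in \mathbb{R}^{HW \times C}, \quad
    A = \operatorname{softmax}\!\left(Q K^{\top}\right) \in \mathbb{R}^{HW \times HW}, \quad
    \text{cost } \mathcal{O}\!\left((HW)^{2}\, C\right) \\
\text{Transposed (channel) attention:}\quad
  & A = \operatorname{softmax}\!\left(Q^{\top} K\right) \in \mathbb{R}^{C \times C}, \quad
    \text{cost } \mathcal{O}\!\left(HW\, C^{2}\right)
\end{aligned}
```

For a 256 x 256 input, HW = 65,536, so a spatial attention map has 65,536^2 = 4,294,967,296 entries. With an illustrative C = 48, the channel attention map has only 48^2 = 2,304 entries, and the cost of forming it grows linearly with the number of pixels.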

Proposed method

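Introduces an encoder-decoder architecture that processes high-resolution images without partitioning them into local windows.
Replaces vanilla multi-head self-attention with MDTA, which computes cross-channel covariance with linear complexity and brings in local context through 1x1 and depth-wise convolutions.
Proposes the gated-Dconv feed-forward network (GDFN), which uses a gating mechanism and depth-wise convolutions to control and enrich feature transformation.
Adopts a progressive learning strategy to capture global image statistics: training starts with small patches and large batches, then gradually moves to larger patches with smaller batches.
Trains task-specific Restormer models for deraining, deblurring, defocus deblurring (single-image and dual-pixel), and denoising, while keeping parameter counts and FLOPs competitive.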
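
To make the MDTA and GDFN descriptions more tangible, here is a minimal NumPy sketch of both blocks for a single image. It is an illustration under stated assumptions, not the authors' implementation: layer normalization is omitted, weights are random, GELU uses the tanh approximation, and the query/key normalization, the `temperature` parameter, and all function names (`conv1x1`, `dwconv3x3`, `mdta`, `gdfn`) are choices made for this example. The released code at the repository linked in the abstract is the reference.

```python
import numpy as np

def conv1x1(x, w):
    # Point-wise convolution: x is (C_in, H, W), w is (C_out, C_in).
    return np.einsum("oc,chw->ohw", w, x)

def dwconv3x3(x, k):
    # Depth-wise 3x3 convolution with zero padding: k is (C, 3, 3).
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[:, i, j, None, None] * xp[:, i:i + h, j:j + w]
    return out

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gelu(a):
    # tanh approximation of GELU (an assumption of this sketch).
    return 0.5 * a * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (a + 0.044715 * a ** 3)))

def mdta(x, w_qkv, k_qkv, w_out, heads=2, temperature=1.0):
    # Transposed attention: each head's map is (C/heads x C/heads), not (HW x HW).
    c, h, w = x.shape
    qkv = dwconv3x3(conv1x1(x, w_qkv), k_qkv)                # (3C, H, W)
    q, k, v = (t.reshape(heads, c // heads, h * w) for t in np.split(qkv, 3))
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-8)
    attn = softmax(q @ k.transpose(0, 2, 1) * temperature)   # channel-by-channel
    out = (attn @ v).reshape(c, h, w)
    return x + conv1x1(out, w_out), attn                     # residual connection

def gdfn(x, w_in, k_in, w_out):
    # Gated feed-forward: the GELU-activated branch gates the other branch.
    a, b = np.split(dwconv3x3(conv1x1(x, w_in), k_in), 2)
    return x + conv1x1(gelu(a) * b, w_out)

rng = np.random.default_rng(0)

def init(*shape):
    # Small random weights, for illustration only.
    return 0.1 * rng.standard_normal(shape)

c, h, w, hidden = 8, 16, 16, 16
x = rng.standard_normal((c, h, w))
y, attn = mdta(x, init(3 * c, c), init(3 * c, 3, 3), init(c, c))
z = gdfn(y, init(2 * hidden, c), init(2 * hidden, 3, 3), init(c, hidden))
print(attn.shape, z.shape)  # (2, 4, 4) (8, 16, 16)
```

The attention map here has 2 x 4 x 4 = 32 entries regardless of image size, whereas a spatial attention map over the same 16 x 16 input would have 256 x 256 = 65,536.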

Experimental results

Research questions

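RQ1: Can Restormer model global pixel interactions with linear complexity suitable for high-resolution image restoration?
RQ2: How do the proposed MDTA and GDFN components compare with conventional attention and feed-forward networks on restoration tasks?
RQ3: Does progressive learning improve performance on high-resolution images across multiple restoration tasks?
RQ4: How does Restormer compare with the state of the art on deraining, motion deblurring, defocus deblurring, and denoising datasets?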
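
The progressive learning that RQ3 asks about can be pictured as a lookup from training iteration to patch and batch size. The sketch below is hypothetical: the iteration boundaries, patch sizes, and batch sizes in `SCHEDULE` are made up for illustration and are not the values used in the paper, and `stage_for` is a name invented here.

```python
from bisect import bisect_right

# Hypothetical schedule: (start_iteration, patch_size, batch_size).
# Patches grow while batches shrink, so pixels per step stay roughly constant.
SCHEDULE = [
    (0, 128, 64),
    (100_000, 192, 32),
    (200_000, 256, 16),
    (250_000, 384, 8),
]

def stage_for(iteration, schedule=SCHEDULE):
    """Return (patch_size, batch_size) for the given training iteration."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    starts = [start for start, _, _ in schedule]
    index = bisect_right(starts, iteration) - 1  # last stage that has started
    _, patch_size, batch_size = schedule[index]
    return patch_size, batch_size

for it in (0, 150_000, 260_000):
    patch, batch = stage_for(it)
    print(it, patch, batch, patch * patch * batch)
```

A training loop would call `stage_for` at each iteration (or at each stage boundary) and rebuild its data loader whenever the returned pair changes.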

Key findings

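Restormer achieves state-of-the-art results across multiple datasets for image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel), and image denoising.
Averaged over five deraining datasets, Restormer outperforms the previous best deraining method by 1.05 dB.
In motion deblurring, Restormer outperforms MIMO-UNet+ by 0.47 dB and MPRNet by 0.26 dB, uses 81% fewer FLOPs than MPRNet, and has 4.4x fewer parameters and runs 29x faster than IPT.
For Gaussian grayscale/color denoising and real image denoising, Restormer matches or exceeds prior CNN and Transformer methods, and reaches higher PSNR on the SIDD and DND real-denoising benchmarks.
Restormer generalizes well: a deblurring model trained on GoPro still delivers state-of-the-art performance on other datasets.
Ablation studies on the high-resolution Urban100 dataset show that combining MDTA and GDFN gives the best PSNR, supporting the design choices.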
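
Because PSNR is logarithmic, the dB gains quoted above are easier to read as reductions in mean squared error. The helper below is a small sketch using the standard PSNR definition; the function names and the `peak=1.0` default (images scaled to [0, 1]) are assumptions made for this example.

```python
import math

def psnr(mse, peak=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio(gain_db):
    """Factor by which MSE shrinks when PSNR improves by gain_db."""
    return 10.0 ** (-gain_db / 10.0)

for gain in (1.05, 0.47, 0.26):
    print(f"+{gain} dB -> MSE x {mse_ratio(gain):.3f}")
```

A 1.05 dB gain corresponds to an MSE ratio of about 0.785, that is, roughly 21% lower mean squared error at the same peak value.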

This review was generated by AI and checked by a human editor.