QUICK REVIEW

[Paper Review] Uformer: A General U-Shaped Transformer for Image Restoration

Zhendong Wang, Xiaodong Cun | arXiv (Cornell University) | Jun 6, 2021

Advanced Image Processing Techniques | References: 76 | Citations: 109

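One-line summary

Uformer introduces a U-shaped Transformer built from Locally-Enhanced Window (LeWin) blocks and a lightweight multi-scale restoration modulator, achieving state-of-the-art or comparable results on denoising, motion deblurring, defocus deblurring, and deraining at an efficient computational cost.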

ABSTRACT

In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. In Uformer, there are two core designs. First, we introduce a novel locally-enhanced window (LeWin) Transformer block, which performs nonoverlapping window-based self-attention instead of global self-attention. It significantly reduces the computational complexity on high resolution feature map while capturing local context. Second, we propose a learnable multi-scale restoration modulator in the form of a multi-scale spatial bias to adjust features in multiple layers of the Uformer decoder. Our modulator demonstrates superior capability for restoring details for various image restoration tasks while introducing marginal extra parameters and computational cost. Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration. To evaluate our approach, extensive experiments are conducted on several image restoration tasks, including image denoising, motion deblurring, defocus deblurring and deraining. Without bells and whistles, our Uformer achieves superior or comparable performance compared with the state-of-the-art algorithms. The code and models are available at https://github.com/ZhendongWang6/Uformer.

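Motivation and objectives

Motivates the need to model long-range dependencies in image restoration beyond what conventional ConvNets offer.
Proposes a general U-shaped, multi-scale Transformer architecture for image restoration tasks.
Develops an efficient LeWin Transformer block that balances local detail against global context.
Introduces a lightweight multi-scale restoration modulator that strengthens detail recovery across scales.
Demonstrates state-of-the-art or competitive performance on denoising, motion deblurring, defocus deblurring, and deraining datasets.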

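Proposed method

Proposes a hierarchical, UNet-like encoder-decoder with skip connections in which the convolution layers are replaced by LeWin Transformer blocks.
Introduces the Locally-Enhanced Window (LeWin) Transformer block, which combines non-overlapping window-based multi-head self-attention (W-MSA) with a Locally-Enhanced Feed-Forward Network (LeFF) that contains a depth-wise convolution.
Computes self-attention inside non-overlapping M×M windows, which reduces the complexity from O(H^2 W^2 C) to O(M^2 HWC) for an H×W feature map with C channels.
Incorporates a multi-scale restoration modulator, a learnable window-shaped bias added to the decoder features at each scale, to adapt those features to the restoration task.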
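
The window partitioning, the modulator bias, and the complexity reduction listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the attention uses a single head with identity Q/K/V projections and no relative position bias, and the modulator is assumed to be one (M*M, C) tensor shared by every window and zero-initialized here as a placeholder for a learned parameter.

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into non-overlapping M x M windows.

    Returns an array of shape (num_windows, M*M, C).
    """
    H, W, C = x.shape
    assert H % M == 0 and W % M == 0, "H and W must be divisible by M"
    x = x.reshape(H // M, M, W // M, M, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/M, W/M, M, M, C)
    return x.reshape(-1, M * M, C)

def window_self_attention(windows):
    """Single-head self-attention computed independently inside each window.

    Simplification: Q = K = V = input (no learned projections).
    """
    C = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(C)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows

def attention_cost(H, W, C, M=None):
    """Multiply-accumulates of the two attention matmuls (QK^T and AV).

    M=None means global attention over all H*W tokens.
    """
    if M is None:
        return 2 * (H * W) ** 2 * C
    num_windows = (H * W) // (M * M)
    return 2 * num_windows * (M * M) ** 2 * C  # equals 2 * M^2 * H * W * C

rng = np.random.default_rng(0)
H, W, C, M = 8, 16, 4, 4
features = rng.standard_normal((H, W, C))

# Assumed modulator: one learnable (M*M, C) bias, broadcast over all windows.
modulator = np.zeros((M * M, C))

windows = window_partition(features, M) + modulator
out = window_self_attention(windows)

print(out.shape)                                               # (8, 16, 4)
print(attention_cost(H, W, C) // attention_cost(H, W, C, M))   # 8
```

The cost ratio equals HW / M^2, so the saving grows with resolution while the window size stays fixed, which is why windowed attention suits the high-resolution stages of the encoder-decoder.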

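Experimental results

Research questions

RQ1: Can a Transformer-based U-shaped architecture with local-window self-attention and a locality-aware FFN effectively capture both local detail and long-range dependencies in image restoration?
RQ2: Does a lightweight multi-scale restoration modulator improve restoration quality across degradation types without a significant increase in computational overhead?
RQ3: What is the performance-efficiency trade-off of LeWin blocks versus conventional CNNs or global-attention Transformers on denoising, deblurring, and deraining tasks?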

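Key findings

Uformer-B achieves 39.89 dB PSNR on SIDD and 39.98 dB PSNR on DND, surpassing the previous state of the art on these real-noise datasets.
For motion deblurring, it achieves state-of-the-art or competitive results on the GoPro, RealBlur-R/J, and HIDE datasets.
For defocus deblurring, it outperforms prior methods on DPD by up to 1.87 dB PSNR and also improves SSIM.
For real-scene deraining (SPAD), Uformer-B achieves 47.84 dB PSNR and 0.9925 SSIM, exceeding the previous best PSNR by 3.74 dB.
Ablations show that LeWin blocks outperform plain UNet variants, that the locally-enhanced FFN contributes to the gains, and that the modulator adds further improvement (most notably on SPAD).
The proposed modulator yields a notable improvement in deblurring (0.46 dB) and gains on the denoising and deraining tasks.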
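
To put the dB figures above in perspective, the standard PSNR definition can be sketched as follows. The helper name `psnr` and the [0, 1] intensity range are assumptions for illustration; the exact evaluation protocol used for each benchmark (color space, cropping, data range) is not stated in this review.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# A uniform error of 0.01 on a [0, 1] image gives an MSE of 1e-4,
# which corresponds to roughly 40 dB.
clean = np.full((4, 4), 0.50)
noisy = np.full((4, 4), 0.51)
value = psnr(clean, noisy)

# A PSNR gain of d dB corresponds to an MSE that is 10^(d/10) times lower.
mse_ratio = 10 ** (3.74 / 10)
```

Because PSNR is logarithmic, the 3.74 dB gain reported on SPAD corresponds to a mean squared error roughly 2.4 times lower than that of the previous best method.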


This review was generated by AI and checked by a human editor.