QUICK REVIEW

[Paper Review] Cross Aggregation Transformer for Image Restoration

Zheng Chen, Yulun Zhang|arXiv (Cornell University)|Nov 24, 2022

Advanced Image Processing Techniques|Cited by 122

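One-sentence summary

This paper introduces the Cross Aggregation Transformer (CAT) for image restoration, built on Rectangle-Window Self-Attention (Rwin-SA), an Axial-Shift operation, and a Locality Complementary Module that fuses global attention with the local inductive bias of CNNs.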

ABSTRACT

Recently, Transformer architecture has been introduced into image restoration to replace convolution neural network (CNN) with surprising results. Considering the high computational complexity of Transformer with global attention, some methods use the local square window to limit the scope of self-attention. However, these methods lack direct interaction among different windows, which limits the establishment of long-range dependencies. To address the above issue, we propose a new image restoration model, Cross Aggregation Transformer (CAT). The core of our CAT is the Rectangle-Window Self-Attention (Rwin-SA), which utilizes horizontal and vertical rectangle window attention in different heads parallelly to expand the attention area and aggregate the features cross different windows. We also introduce the Axial-Shift operation for different window interactions. Furthermore, we propose the Locality Complementary Module to complement the self-attention mechanism, which incorporates the inductive bias of CNN (e.g., translation invariance and locality) into Transformer, enabling global-local coupling. Extensive experiments demonstrate that our CAT outperforms recent state-of-the-art methods on several image restoration applications. The code and models are available at https://github.com/zhengchen1999/CAT.

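Motivation and objectives

Improve the modeling of long-range dependencies in image restoration while keeping computational cost in check.
Develop a Transformer-based architecture that aggregates features across non-square windows to enlarge the receptive field.
Incorporate a CNN-like locality bias through the Locality Complementary Module, coupling global and local information.
Apply CAT to super-resolution (SR), JPEG artifact reduction, and real-image denoising, and demonstrate state-of-the-art performance.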

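Proposed method

Introduce Rectangle-Window Self-Attention (Rwin-SA), which enlarges the attention area by using horizontal and vertical rectangle windows in different attention heads.
Form axial rectangle windows (axial-Rwin) by fixing one side of the rectangle to the image height or width, enabling wider interaction. An Axial-Shift operation is applied between consecutive Rwin-SA blocks.
Add the Locality Complementary Module (LCM), which applies a depth-wise convolution to the value branch in parallel with self-attention to fuse local and global cues.
Embed CAT blocks (CATB) in an RCAN-inspired backbone, replacing RCAB with CATB, to form the Cross Aggregation Transformer (CAT).
Use residual groups of CATBs together with reconstruction modules tailored to SR, JPEG artifact reduction, and real-image denoising.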
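
The three ingredients listed above (rectangle windows split across head groups, a shift between consecutive blocks, and a depth-wise convolution on the value branch) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is available at the repository linked in the abstract. All function names here are hypothetical, the learned query/key/value projections and attention masking are omitted (so q = k = v = x), the two channel halves stand in for the two groups of heads, and the half-window shift amount is an assumption modeled on common shifted-window schemes.

```python
import numpy as np

def rwin_partition(x, rh, rw):
    """Split an (H, W, C) map into non-overlapping rh x rw windows.

    Returns an array of shape (num_windows, rh * rw, C).
    """
    H, W, C = x.shape
    assert H % rh == 0 and W % rw == 0, "H and W must be multiples of the window size"
    x = x.reshape(H // rh, rh, W // rw, rw, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, rh * rw, C)

def rwin_reverse(windows, rh, rw, H, W):
    """Inverse of rwin_partition: (num_windows, rh * rw, C) -> (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // rh, W // rw, rh, rw, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def window_attention(tokens):
    """Softmax self-attention inside each window (assumption: q = k = v = tokens)."""
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 filtering with zero padding; kernels has shape (3, 3, C)."""
    H, W, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + H, dx:dx + W, :] * kernels[dy, dx, :]
    return out

def rwin_sa_with_lcm(x, sh, sw, kernels):
    """First channel half attends in sh x sw windows, second half in sw x sh windows.

    The depth-wise convolution of the value map (here x itself) is added in
    parallel, as a stand-in for the locality branch.
    """
    H, W, C = x.shape
    half = C // 2
    groups = []
    for chans, (rh, rw) in ((x[..., :half], (sh, sw)), (x[..., half:], (sw, sh))):
        attended = window_attention(rwin_partition(chans, rh, rw))
        groups.append(rwin_reverse(attended, rh, rw, H, W))
    return np.concatenate(groups, axis=-1) + depthwise_conv3x3(x, kernels)

def axial_shift(x, sh, sw):
    """Cyclically shift the map by half a window along both axes (masking omitted)."""
    return np.roll(x, shift=(-(sh // 2), -(sw // 2)), axis=(0, 1))
```

With `sh < sw`, the first group sees wide, short windows and the second sees tall, narrow ones, so concatenating the two groups gives every pixel access to a cross-shaped neighborhood in a single layer. Calling `axial_shift` before the next block moves the window boundaries so that pixels separated by a boundary in one block share a window in the next.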

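Experimental results

Research questions

RQ1: Do Rectangle-Window Self-Attention and Axial-Shift capture long-range dependencies in image restoration more effectively than square-window self-attention?
RQ2: Does integrating a locality-biased CNN component (LCM) with Transformer attention improve restoration quality without a large increase in computational overhead?
RQ3: How does CAT perform on standard image restoration tasks (SR, JPEG artifact reduction, real-image denoising) compared with state-of-the-art methods?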

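Key findings

Rectangle-Window Self-Attention with Axial-Shift outperforms square-window attention, yielding higher PSNR/SSIM on image restoration benchmarks.
LCM yields an additional gain by coupling global self-attention with a local convolutional bias, at almost no extra FLOPs (about 0.26%–0.32%).
CAT-R (regular-Rwin) and CAT-A (axial-Rwin) achieve strong improvements across SR scales, most notably on Urban100. CAT-A generally performs best among the proposed variants.
On PSNR/SSIM, CAT-based models outperform several state-of-the-art methods in image SR, JPEG artifact reduction, and real-image denoising.
Ablations show that axial-Rwin with a suitable side-length setting is important for the best performance, and that combining rectangle windows with Axial-Shift effectively extends the receptive field.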
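
The findings above are stated in terms of PSNR/SSIM. For readers unfamiliar with the first metric, here is a minimal sketch of the standard PSNR definition, 10 · log10(MAX² / MSE). The function name and the default peak value of 255 (8-bit images) are assumptions for illustration. Benchmark protocols often add preprocessing, such as evaluating on a luminance channel or cropping image borders, which this sketch leaves out, so it should not be expected to reproduce the paper's reported numbers.

```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Higher is better: halving the mean squared error raises PSNR by about 3 dB.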

This review was generated by AI and checked by a human editor.