QUICK REVIEW

[Paper Review] SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution

Yupeng Zhou, Zhen Li|arXiv (Cornell University)|Mar 17, 2023

Advanced Image Processing Techniques | Citations: 13

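One-Sentence Summary

SRFormer introduces permuted self-attention (PSA), which enables large-window self-attention at reduced computational cost and achieves state-of-the-art results on classical, lightweight, and real-world image super-resolution.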

ABSTRACT

Previous works have shown that increasing the window size for Transformer-based image super-resolution models (e.g., SwinIR) can significantly improve the model performance. Still, the computation overhead is also considerable when the window size gradually increases. In this paper, we present SRFormer, a simple but novel method that can enjoy the benefit of large window self-attention but introduces even less computational burden. The core of our SRFormer is the permuted self-attention (PSA), which strikes an appropriate balance between the channel and spatial information for self-attention. Without any bells and whistles, we show that our SRFormer achieves a 33.86dB PSNR score on the Urban100 dataset, which is 0.46dB higher than that of SwinIR but uses fewer parameters and computations. In addition, we also attempt to scale up the model by further enlarging the window size and channel numbers to explore the potential of Transformer-based models. Experiments show that our scaled model, named SRFormerV2, can further improve the results and achieves state-of-the-art. We hope our simple and effective approach could be useful for future research in super-resolution model design. The homepage is https://z-yupeng.github.io/SRFormer/.

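Motivation and Objectives

Investigate how to scale self-attention to large windows for SR without increasing parameters or FLOPs.
Develop a self-attention mechanism that makes effective use of large windows.
Improve high-frequency detail recovery in SR through an improved feed-forward network.
Design a lightweight SR model that maintains or improves performance.
Demonstrate robustness under real-world degradation scenarios.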

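Proposed Method

Propose permuted self-attention (PSA), which enables large-window attention by reducing the K/V channel dimension and permuting spatial tokens into the channel dimension.
Keep Q at the full channel width, while K/V use reduced channels and a spatial-to-channel permutation to retain spatial information.
Introduce ConvFFN, which places a depthwise convolution between the two linear layers of the FFN, to improve high-frequency detail recovery.
Build SRFormer from a pixel embedding layer, a hierarchical PSA-based feature encoder, and a reconstruction head.
Train with an L1 loss on the HR output, and use the self-ensemble variant SRFormer+ to further improve performance.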
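
The PSA step described above can be sketched for a single window as follows. This is a minimal single-head illustration in NumPy, not the authors' implementation: the function name, the weight matrices `w_q`, `w_k`, `w_v`, and the scaling by the square root of the channel count are assumptions made for the example, and the relative position bias, multi-head split, and output projection are left out.

```python
import numpy as np

def permuted_self_attention(x, w_q, w_k, w_v, window, r):
    """Single-head PSA over one window (illustrative sketch).

    x:        (window*window, C) tokens of one S x S window, row-major order.
    w_q:      (C, C) query projection (hypothetical weights).
    w_k, w_v: (C, C // r**2) reduced-channel key/value projections.
    """
    S = window
    n, C = x.shape
    assert n == S * S and S % r == 0 and C % (r * r) == 0
    c_red = C // (r * r)

    q = x @ w_q  # (S*S, C): queries keep the full channel width
    k = x @ w_k  # (S*S, C/r^2): keys with reduced channels
    v = x @ w_v  # (S*S, C/r^2): values with reduced channels

    def permute(t):
        # Split both spatial axes into blocks of size r.
        t = t.reshape(S // r, r, S // r, r, c_red)
        # Bring the two block axes together: (S/r, S/r, r, r, c_red).
        t = t.transpose(0, 2, 1, 3, 4)
        # Fold each r x r block of tokens into the channel axis.
        return t.reshape((S // r) * (S // r), r * r * c_red)

    k_p = permute(k)  # (S*S / r^2, C)
    v_p = permute(v)  # (S*S / r^2, C)

    # Attention map is (S*S, S*S / r^2) instead of (S*S, S*S).
    scores = q @ k_p.T / np.sqrt(C)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v_p  # (S*S, C)
```

With `window=24` and `r=2`, each of the 576 query tokens attends to 144 permuted key tokens, so the attention map shrinks by a factor of `r**2` while the output keeps one full-width feature vector per token.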

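Experimental Results

Research Questions

RQ1: Can large-window self-attention improve SR performance without increasing parameters or FLOPs?
RQ2: Does permuting tokens into channels for K/V make large-window attention work effectively for SR?
RQ3: Does adding a local depthwise convolution to the FFN (ConvFFN) improve high-frequency detail recovery?
RQ4: How does SRFormer compare with state-of-the-art SR methods on classical, lightweight, and real-world tasks?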

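Key Findings

SRFormer with PSA delivers strong SR performance: trained on DIV2K, it reaches 33.86 dB PSNR on Urban100 for 2x SR, 0.46 dB higher than SwinIR.
SRFormer with 24x24 windows uses fewer parameters and MACs than SwinIR with 8x8 windows while achieving higher PSNR.
ConvFFN with a 5x5 depthwise convolution gave the best high-frequency recovery among the kernel sizes tested.
The large 24x24 PSA window outperforms token-reduction and token-sampling variants and improves performance consistently across the ablations.
SRFormer-light achieves state-of-the-art performance among lightweight SR models across multiple datasets and scales.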
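
To see why a 24x24 PSA window can cost less than an 8x8 standard window, the per-token multiply-accumulate count of the attention layer can be estimated from first principles. The sketch below is a back-of-envelope calculation, not the paper's reported figures: the function names are hypothetical, the channel width of 180 and reduction factor `r = 2` are illustrative assumptions, and only the linear projections and the two attention matrix products are counted (softmax, position bias, and the feed-forward block are ignored).

```python
def window_attention_macs(window, channels):
    """Per-token MACs of standard window self-attention."""
    # Q, K, V, and output projections: four C x C linear layers.
    projections = 4 * channels * channels
    # Q @ K^T and attention @ V, each over window*window keys.
    attention = 2 * window * window * channels
    return projections + attention


def psa_macs(window, channels, r):
    """Per-token MACs of permuted self-attention with reduction factor r."""
    reduced = channels // (r * r)
    # Q and output projections stay C x C; K and V project to C / r^2 channels.
    projections = 2 * channels * channels + 2 * channels * reduced
    # After permutation there are window^2 / r^2 keys of full width C.
    keys = (window * window) // (r * r)
    attention = 2 * keys * channels
    return projections + attention


print(window_attention_macs(8, 180))   # 152640
print(window_attention_macs(24, 180))  # 336960
print(psa_macs(24, 180, 2))            # 132840
```

Under these assumptions, PSA at 24x24 needs fewer attention-layer MACs per token than standard attention at 8x8, because both the K/V projections and the attention map shrink by a factor of `r**2`.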

This review was generated by AI and checked by a human editor.