QUICK REVIEW

[Paper Review] ShuffleMixer: An Efficient ConvNet for Image Super-Resolution

Long Sun, Jinshan Pan|arXiv (Cornell University)|May 30, 2022

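Advanced Image Processing Techniques | Cited by 72

One-line summary

ShuffleMixer combines large-kernel depth-wise convolutions, channel splitting and shuffling, and Fused-MBConvs. It uses about 6x fewer parameters and FLOPs than prior lightweight models while maintaining competitive SR performance, giving it state-of-the-art efficiency.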

ABSTRACT

Lightweight and efficiency are critical drivers for the practical application of image super-resolution (SR) algorithms. We propose a simple and effective approach, ShuffleMixer, for lightweight image super-resolution that explores large convolution and channel split-shuffle operation. In contrast to previous SR models that simply stack multiple small kernel convolutions or complex operators to learn representations, we explore a large kernel ConvNet for mobile-friendly SR design. Specifically, we develop a large depth-wise convolution and two projection layers based on channel splitting and shuffling as the basic component to mix features efficiently. Since the contexts of natural images are strongly locally correlated, using large depth-wise convolutions only is insufficient to reconstruct fine details. To overcome this problem while maintaining the efficiency of the proposed module, we introduce Fused-MBConvs into the proposed network to model the local connectivity of different features. Experimental results demonstrate that the proposed ShuffleMixer is about 6x smaller than the state-of-the-art methods in terms of model parameters and FLOPs while achieving competitive performance. In NTIRE 2022, our primary method won the model complexity track of the Efficient Super-Resolution Challenge [23]. The code is available at https://github.com/sunny2109/MobileSR-NTIRE2022.

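Motivation and objectives

Target lightweight, efficient single-image super-resolution (SISR) for mobile and resource-constrained environments.
Introduce a large-kernel ConvNet design that enlarges the receptive field without an excessive increase in parameters.
Develop a feature mixing block that efficiently blends spatial and channel information.
Incorporate local connectivity via Fused-MBConvs to improve fine-detail reconstruction.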

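Proposed method

Develop the ShuffleMixer architecture, which begins with a 3x3 feature extraction layer.
Use feature mixing blocks (FMBs), each composed of two shuffle mixer layers and a Fused-MBConv module.
Adopt channel splitting and shuffling (CSS) to reduce the parameter count of the channel projections.
Embed a Fused-MBConv block after the two shuffle mixer layers to strengthen local feature connectivity.
Upsample with a lightweight 1x1 conv and pixel shuffle, and reconstruct the SR image through a residual connection.
Train with an L1 pixel loss plus an FFT-based frequency-domain loss to encourage retention of high-frequency components.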
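
The channel splitting and shuffling step can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 50/50 split, two shuffle groups, and a bias-free 1x1 projection applied to only one half, and the helper names (`channel_split`, `channel_shuffle`, `pointwise_params`) are hypothetical. Channels are represented as string labels so the reordering is easy to see.

```python
def channel_split(channels, ratio=0.5):
    """Split a list of channels into two consecutive groups (assumed 50/50 by default)."""
    k = int(len(channels) * ratio)
    return channels[:k], channels[k:]


def channel_shuffle(channels, groups=2):
    """Interleave channels across groups (equivalent to reshape -> transpose -> flatten)."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View the list as [groups][per_group] and read it column by column.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def pointwise_params(c_in, c_out, bias=False):
    """Weight count of a 1x1 (point-wise) projection."""
    return c_in * c_out + (c_out if bias else 0)


# Illustrative labels: "a*" would go through the projection, "b*" would be kept as-is.
channels = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]
projected, identity = channel_split(channels)
mixed = channel_shuffle(projected + identity, groups=2)
print(mixed)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3', 'b3']

# Hypothetical width of 64 channels: projecting only half of them
# needs a quarter of the weights of a full 64 -> 64 projection.
full = pointwise_params(64, 64)   # 4096
half = pointwise_params(32, 32)   # 1024
print(full, half)
```

Shuffling after the split matters because, without it, the untouched half would never pass through a projection in subsequent layers; interleaving lets information flow between the two halves.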

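Experimental results

Research questions

RQ1: Can a large-kernel depth-wise CNN with channel splitting achieve competitive SR quality while substantially reducing parameters and FLOPs?
RQ2: Does incorporating Fused-MBConvs and local connectivity improve ShuffleMixer's fine-detail reconstruction?
RQ3: How do kernel size, channel projection strategy, and residual blocks affect SR performance and efficiency?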

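Key findings

ShuffleMixer has about 6x fewer parameters and FLOPs than state-of-the-art lightweight SR methods while delivering competitive PSNR/SSIM.
ShuffleMixer-Tiny (113K parameters) outperforms many existing methods on standard benchmarks.
Increasing the depth-wise kernel size improves PSNR up to 7x7 at modest additional cost, with diminishing returns beyond that size.
Channel splitting and shuffling (CSS) reduces the parameter count at some cost in performance, which can be recovered by repeating the projection layer (CDC).
Incorporating Fused-MBConvs (S-FMBConv) provides a favorable trade-off between complexity and SR quality.
On the GT benchmarks (×2, ×3, ×4), the ShuffleMixer variants show strong SR performance with favorable GPU runtime (0.016–0.021 s at an HR size of 1280×720).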
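
Why enlarging a depth-wise kernel adds only modest cost follows from simple weight counting. The sketch below uses the standard formula for convolution weights; the channel width of 64 is a hypothetical value chosen for illustration, not a figure from the paper, and `conv_params` is a hypothetical helper.

```python
def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Weight count of a 2-D convolution with a k x k kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


C = 64  # hypothetical channel width, for illustration only

# Depth-wise convolution: one k x k filter per channel (groups == channels).
depthwise = {k: conv_params(C, C, k, groups=C) for k in (3, 5, 7, 9)}
for k, n in depthwise.items():
    print(f"depth-wise {k}x{k}: {n} weights")

# A standard (dense) 3x3 convolution at the same width, for comparison.
standard_3x3 = conv_params(C, C, 3)
print(f"standard 3x3: {standard_3x3} weights")
```

At this width a 7x7 depth-wise layer has 3,136 weights, compared with 576 for a 3x3 depth-wise layer and 36,864 for a dense 3x3 layer, so the larger receptive field remains far cheaper than a single standard convolution.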


This review was generated by AI and checked by a human editor.