QUICK REVIEW

[Paper Review] RefConv: Re-parameterized Refocusing Convolution for Powerful ConvNets

Zhicheng Cai, Xiaohan Ding|arXiv (Cornell University)|Oct 16, 2023

Advanced Neural Network Applications | Cited by 16

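One-line summary

RefConv replaces regular convolution layers with a re-parameterized refocusing mechanism that connects the kernel parameters inherited from a pre-trained model, improving accuracy without adding inference cost. A training-time transformation produces the transformed weights, which are then used for inference without changing the model structure.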

ABSTRACT

We propose Re-parameterized Refocusing Convolution (RefConv) as a replacement for regular convolutional layers, which is a plug-and-play module to improve the performance without any inference costs. Specifically, given a pre-trained model, RefConv applies a trainable Refocusing Transformation to the basis kernels inherited from the pre-trained model to establish connections among the parameters. For example, a depth-wise RefConv can relate the parameters of a specific channel of convolution kernel to the parameters of the other kernel, i.e., make them refocus on the other parts of the model they have never attended to, rather than focus on the input features only. From another perspective, RefConv augments the priors of existing model structures by utilizing the representations encoded in the pre-trained parameters as the priors and refocusing on them to learn novel representations, thus further enhancing the representational capacity of the pre-trained model. Experimental results validated that RefConv can improve multiple CNN-based models by a clear margin on image classification (up to 1.47% higher top-1 accuracy on ImageNet), object detection and semantic segmentation without introducing any extra inference costs or altering the original model structure. Further studies demonstrated that RefConv can reduce the redundancy of channels and smooth the loss landscape, which explains its effectiveness.

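Motivation and objectives

Establish connections among kernel parameters through a Refocusing Transformation, augmenting the priors of existing CNN structures.
Improve the representational capacity of a pre-trained model without changing the inference-time architecture or cost.
Demonstrate the effectiveness of the method on image classification, object detection, and semantic segmentation.
Analyze how RefConv affects channel redundancy and the loss landscape, to explain where the performance gain comes from.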

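Proposed method

Replace each regular convolution layer with RefConv: freeze the basis weights Wb inherited from the pre-trained model and learn a Refocusing Transformation T that produces the transformed weights Wt.
Define Wt = T(Wb, Wr), where Wr are the trainable refocusing parameters; Wt is the weight actually used by the convolution.
For depth-wise convolution, use a dense Refocusing Transformation; for other convolution types, use a generalized grouped version. Both establish cross-channel connections among kernel parameters.
Add an identity mapping so that the transformation learns an increment over the basis weights: Wt = T(Wb, Wr) = Wb * Wr + Wb, where * denotes convolution.
Run Refocusing Learning with Wb frozen so that only Wr is trained, then save the transformed weights for inference, keeping the inference graph identical to the baseline.
Generalize RefConv to group-wise and dense convolutions, using a hyperparameter G to set the number of groups in the Refocusing Transformation and balance cross-channel connections against parameter efficiency.
Because Wt is used for inference and the structure is unchanged, the paper reports that RefConv adds little training-time cost and no inference cost.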
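
The steps above can be sketched in a few lines of NumPy for the depth-wise case. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `refocus`, the tensor layouts, and the zero padding of `k // 2` are choices made for this sketch. The basis kernel Wb is treated as a C-channel K x K map, convolved with Wr (cross-correlation, as in common deep learning frameworks), and added back to itself through the identity mapping.

```python
import numpy as np

def refocus(w_base, w_ref, groups=1):
    """Sketch of W_t = W_b * W_r + W_b for a depth-wise kernel.

    w_base: (C, K, K) frozen basis kernel, viewed as a C-channel K x K map.
    w_ref:  (C, C // groups, k, k) trainable refocusing weights, k odd.
    The layouts are assumptions of this sketch.
    """
    C, K, _ = w_base.shape
    out_ch, in_per_group, k, _ = w_ref.shape
    assert out_ch == C and in_per_group * groups == C and k % 2 == 1
    pad = k // 2  # zero padding keeps the K x K kernel size unchanged
    padded = np.pad(w_base, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(w_base)
    out_per_group = C // groups
    for o in range(C):
        g = o // out_per_group  # group that output channel o belongs to
        src = padded[g * in_per_group:(g + 1) * in_per_group]
        for i in range(K):
            for j in range(K):
                out[o, i, j] = np.sum(src[:, i:i + k, j:j + k] * w_ref[o])
    return out + w_base  # identity mapping: learn an increment over W_b

rng = np.random.default_rng(0)
C, K, k = 8, 3, 3
w_base = rng.standard_normal((C, K, K))

# Dense transformation (G = 1): every channel can attend to every other.
w_t_dense = refocus(w_base, 0.01 * rng.standard_normal((C, C, k, k)))

# Grouped transformation (G = 4): channels only mix within their group.
w_t_grouped = refocus(w_base, 0.01 * rng.standard_normal((C, C // 4, k, k)), groups=4)

# After training, only w_t would be stored; its shape matches w_base,
# so the inference-time layer is unchanged.
print(w_t_dense.shape, w_t_grouped.shape)
```

In this layout Wr holds C * (C / G) * k * k values, so a larger G trades cross-channel connectivity for fewer training-time parameters. With Wr set to zero the sketch returns Wb unchanged, which is the sense in which the identity mapping makes T learn an increment.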

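Experimental results

Research questions

RQ1: Can augmenting the priors of existing kernel structures improve CNN performance without adding inference cost?
RQ2: How does the Refocusing Transformation affect the channel-wise redundancy of pre-trained kernels and the interactions between channels?
RQ3: Do RefConv-enhanced models improve performance on ImageNet classification and on downstream tasks such as object detection and semantic segmentation?
RQ4: Compared with conventional retraining or fine-tuning, how does Refocusing Learning affect training dynamics and the loss landscape?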

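Key findings

RefConv yields clear accuracy gains across a range of backbones (e.g., up to 1.47% higher ImageNet top-1 accuracy for MobileNetv3-S, with notable gains for ShuffleNetv2 and FasterNet-S).
Inference-time parameters and FLOPs remain identical to the baseline once the model is converted to the transformed weights.
RefConv reduces channel redundancy by increasing the KL divergence between kernel channels, indicating more diverse representations.
Training with RefConv smooths the loss landscape, producing wider and sparser contours, which may lead to better generalization.
Ablation studies show that the pre-trained basis weights Wb are an important prior. Zero-initializing Wr can still improve performance, but standard random initialization performs best.
The gains from RefConv transfer to object detection (SSD on Pascal VOC) and semantic segmentation (DeepLabv3+ on Cityscapes), improving mAP and mIoU over the baselines.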
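
The channel-redundancy finding above relies on a KL divergence between kernel channels. One plausible way to compute such a measure is sketched below. The normalization (a softmax over each flattened K x K channel) and the helper names `channel_kl_matrix` and `mean_channel_kl` are assumptions of this sketch, so the paper's exact protocol may differ.

```python
import numpy as np

def channel_kl_matrix(kernel):
    """Pairwise KL divergence between channels of a (C, K, K) kernel.

    Each channel is flattened and turned into a probability distribution
    with a softmax (an assumption of this sketch). Entry [i, j] is
    KL(p_i || p_j).
    """
    C = kernel.shape[0]
    flat = kernel.reshape(C, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    log_p = flat - np.log(np.exp(flat).sum(axis=1, keepdims=True))
    p = np.exp(log_p)
    # kl[i, j] = sum_x p_i(x) * (log p_i(x) - log p_j(x))
    return np.sum(p[:, None, :] * (log_p[:, None, :] - log_p[None, :, :]), axis=-1)

def mean_channel_kl(kernel):
    """Average KL over all ordered pairs of distinct channels."""
    C = kernel.shape[0]
    return channel_kl_matrix(kernel).sum() / (C * (C - 1))

rng = np.random.default_rng(0)
kernel = rng.standard_normal((8, 3, 3))
print(mean_channel_kl(kernel))
```

Under this measure, a higher average means the channels are more dissimilar from one another, which is how the finding reads lower redundancy. Comparing the value for Wb against the value for Wt of the same layer would be the natural use.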


This review was generated by AI and checked by a human editor.