QUICK REVIEW

[Paper Review] GhostNetV3: Exploring the Training Strategies for Compact Models

Zhenhua Liu, Zhiwei Hao|arXiv (Cornell University)|Apr 17, 2024

Generative Adversarial Networks and Image Synthesis | Cited by 16

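One-Line Summary

GhostNetV3 shows that specialized training strategies (re-parameterization, knowledge distillation, learning-rate scheduling, and data augmentation) can markedly improve the performance of compact models without changing their inference architecture.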

ABSTRACT

Compact neural networks are specially designed for applications on edge devices with faster inference speed yet modest performance. However, training strategies of compact models are borrowed from that of conventional models at present, which ignores their difference in model capacity and thus may impede the performance of compact models. In this paper, by systematically investigating the impact of different training ingredients, we introduce a strong training strategy for compact models. We find that the appropriate designs of re-parameterization and knowledge distillation are crucial for training high-performance compact models, while some commonly used data augmentations for training conventional models, such as Mixup and CutMix, lead to worse performance. Our experiments on ImageNet-1K dataset demonstrate that our specialized training strategy for compact models is applicable to various architectures, including GhostNetV2, MobileNetV2 and ShuffleNetV2. Specifically, equipped with our strategy, GhostNetV3 1.3× achieves a top-1 accuracy of 79.1% with only 269M FLOPs and a latency of 14.46ms on mobile devices, surpassing its ordinarily trained counterpart by a large margin. Moreover, our observation can also be extended to object detection scenarios. PyTorch code and checkpoints can be found at https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnetv3_pytorch.

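Motivation and Objectives

Investigate how individual training ingredients affect the performance of compact models while the inference architecture is kept fixed.
Identify which re-parameterization schemes for GhostNetV3's depthwise and 1x1 convolutions are most effective at improving accuracy.
Explore the effects of knowledge distillation, learning-rate schedules, EMA, and data augmentation on small models.
Provide a specialized training recipe that generalizes across compact architectures and tasks, including object detection.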

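Proposed Method

Introduce re-parameterization by adding parallel linear branches to the depthwise and 1x1 convolutions, then fold them into a single layer at inference time.
Evaluate knowledge distillation with different teacher models (ResNet-101, DeiT-B, BEiTV2-B) and hyperparameters (alpha, tau).
Compare learning-rate schedules (step vs. cosine) and EMA settings to determine a robust optimization strategy for compact models.
Evaluate data augmentation options such as AutoAug, RandAug, Mixup, CutMix, and RandomErasing to determine which augmentations help and which hurt compact models.
Run extensive ablation studies on ImageNet-1K for GhostNetV3 (and for other compact models such as MobileNetV2 and ShuffleNetV2), training for 600 epochs with a batch size of 2048.
Extend the findings to object detection on COCO to verify that the training recipe generalizes.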
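The branch-folding step in the method list can be illustrated with a small NumPy sketch. It assumes each branch is a bias-free depthwise convolution followed by inference-mode BatchNorm, with stride 1 and "same" padding, and uses N=3 parallel 3x3 branches plus one 1x1 branch. The function names (`depthwise_conv2d`, `bn`, `fold_bn`, `pad_to`, `random_branch`) and the tensor sizes are hypothetical, and this is not the authors' implementation (their PyTorch code is at the repository linked in the abstract).

```python
import numpy as np

def depthwise_conv2d(x, w, b):
    """Naive depthwise conv: stride 1, zero 'same' padding.
    x: (C, H, W) input, w: (C, k, k) one kernel per channel, b: (C,) bias."""
    c, h, wd = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[:, i, j] = (xp[:, i:i + k, j:j + k] * w).sum(axis=(1, 2))
    return out + b[:, None, None]

def bn(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using running statistics (per channel)."""
    s = (gamma / np.sqrt(var + eps))[:, None, None]
    return s * (y - mean[:, None, None]) + beta[:, None, None]

def fold_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding bias-free conv; returns (kernel, bias)."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None], beta - mean * scale

def pad_to(w, k):
    """Zero-pad a (C, k0, k0) kernel to (C, k, k), keeping it centred."""
    p = (k - w.shape[-1]) // 2
    return np.pad(w, ((0, 0), (p, p), (p, p)))

rng = np.random.default_rng(0)
C, H, W, N = 4, 8, 8, 3  # hypothetical sizes; N parallel 3x3 branches

def random_branch(k):
    """Hypothetical branch: a k x k depthwise kernel plus BN (gamma, beta, mean, var)."""
    stats = (rng.uniform(0.5, 1.5, C), rng.normal(size=C),
             rng.normal(size=C), rng.uniform(0.5, 1.5, C))
    return rng.normal(size=(C, k, k)), stats

# Training-time block: N 3x3 depthwise branches plus one 1x1 depthwise branch, summed.
branches = [random_branch(3) for _ in range(N)] + [random_branch(1)]
x = rng.normal(size=(C, H, W))
no_bias = np.zeros(C)
y_multi = sum(bn(depthwise_conv2d(x, w, no_bias), *s) for w, s in branches)

# Inference-time block: every branch is linear, so folded kernels and biases simply add.
w_fused, b_fused = np.zeros((C, 3, 3)), np.zeros(C)
for w, s in branches:
    wk, bk = fold_bn(w, *s)
    w_fused += pad_to(wk, 3)
    b_fused += bk
y_single = depthwise_conv2d(x, w_fused, b_fused)

print(np.allclose(y_multi, y_single))  # True
```

Because convolution is linear and inference-mode BatchNorm is affine, the extra branches only cost compute during training; the deployed model runs a single 3x3 depthwise convolution.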
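For the distillation experiments, one common way to combine a hard-label loss with a teacher's soft targets through `alpha` and `tau` is the Hinton-style loss sketched below. This is an assumed formulation for illustration only: the review does not state the exact loss or the alpha/tau values the authors used, `kd_loss` is a hypothetical name, and the random logits merely stand in for a compact student and a stronger teacher such as BEiTV2-B.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, labels, alpha, tau):
    """Assumed Hinton-style KD loss, averaged over the batch:
    (1 - alpha) * CE(labels, student) + alpha * tau^2 * KL(teacher_tau || student_tau)."""
    n = student_logits.shape[0]
    # Hard-label cross-entropy against the ground truth.
    ce = -log_softmax(student_logits)[np.arange(n), labels].mean()
    # Soft-target KL divergence between temperature-scaled distributions.
    log_p_t = log_softmax(teacher_logits / tau)
    log_p_s = log_softmax(student_logits / tau)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1).mean()
    # tau**2 keeps the soft-target gradient scale roughly constant as tau changes.
    return (1 - alpha) * ce + alpha * tau ** 2 * kl

rng = np.random.default_rng(0)
student = rng.normal(size=(8, 10))   # hypothetical batch of 8 samples, 10 classes
teacher = rng.normal(size=(8, 10))
labels = rng.integers(0, 10, size=8)
# alpha=0.5 and tau=4.0 are illustrative placeholders, not the paper's settings.
print(kd_loss(student, teacher, labels, alpha=0.5, tau=4.0))
```

Setting `alpha=0` recovers plain cross-entropy and `alpha=1` trains on the teacher's soft targets alone, which is one way to frame a sweep over alpha and tau.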

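Experimental Results

Research Questions

RQ1: How do re-parameterization and the added 1x1 depthwise branch affect GhostNetV3's performance as the number of branches varies?
RQ2: How do different teacher models and KD settings affect the accuracy of compact models?
RQ3: Which learning-rate schedule and EMA settings yield the highest validation accuracy for compact models?
RQ4: Which data augmentation strategies help or hurt compact models?
RQ5: Does the proposed training strategy transfer to other compact architectures and to object detection?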

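Key Findings

Re-parameterization with a 1x1 depthwise branch substantially improves GhostNetV3's performance, with the best results at around N=3 branches.
Knowledge distillation with BEiTV2-B as the teacher improves GhostNetV3's accuracy; higher-quality teachers yield better students.
The cosine learning-rate schedule gives the highest top-1 accuracy among the schedules examined, while an excessively large learning rate hurts performance.
Mixup and CutMix augmentation hurt compact models, whereas random augmentation and RandomErasing are beneficial.
On ImageNet-1K, GhostNetV3 1.3× reaches 79.1% top-1 at 269M FLOPs and GhostNetV3 1.6× reaches 80.4% top-1 at 399M FLOPs, outperforming several compact baselines in the accuracy/latency trade-off.
The training recipe also generalizes to MobileNetV2 and ShuffleNetV2, noticeably improving their top-1 accuracy.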
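The step-versus-cosine comparison and the EMA setting can be made concrete with the generic formulas below. The cosine curve decays smoothly from the base rate to `min_lr` over the 600 training epochs mentioned in the method list, while a step schedule drops the rate by a factor `gamma` every `step_size` epochs; EMA keeps a running average of the weights, which is typically what gets evaluated. The base rate, step size, gamma, and decay values are hypothetical placeholders rather than the paper's settings, and warmup is omitted.

```python
import math

def step_lr(base_lr, epoch, step_size, gamma):
    """Step schedule: multiply the rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine schedule: half a cosine period from base_lr down to min_lr."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))

class EMA:
    """Exponential moving average of a flat list of scalar parameters."""

    def __init__(self, params, decay):
        self.decay = decay
        self.shadow = list(params)

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1 - d) * p for s, p in zip(self.shadow, params)]

TOTAL_EPOCHS = 600  # from the review
BASE_LR = 1.0       # hypothetical base rate
for epoch in (0, 150, 300, 450, 600):
    print(epoch,
          round(step_lr(BASE_LR, epoch, step_size=200, gamma=0.1), 4),
          round(cosine_lr(BASE_LR, epoch, TOTAL_EPOCHS), 4))

ema = EMA([0.0], decay=0.9)  # hypothetical decay
for _ in range(3):
    ema.update([1.0])        # the shadow value moves toward the live weight
print(ema.shadow)
```

The cosine schedule avoids the abrupt drops of the step schedule, and an EMA decay close to 1 averages over many updates at the cost of lagging behind the live weights.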

This review was generated by AI and checked by a human editor.