QUICK REVIEW

[Paper Review] InceptionNeXt: When Inception Meets ConvNeXt

Weihao Yu, Pan Zhou|arXiv (Cornell University)|Mar 29, 2023

Advanced Neural Network Applications | Citations: 17

One-Line Summary

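InceptionNeXt decomposes large-kernel depthwise convolution into four parallel branches (one of which is an identity branch). This raises speed while maintaining or improving accuracy: compared with ConvNeXt, it trains and runs inference faster and achieves strong results on ImageNet and ADE20K.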

ABSTRACT

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution. Although such depthwise operator only consumes a few FLOPs, it largely harms the model efficiency on powerful computing devices due to the high memory access costs. For example, ConvNeXt-T has similar FLOPs with ResNet-50 but only achieves ~60% throughputs when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation, which poses a challenging problem: How to speed up large-kernel-based CNN models while preserving their performance. To tackle this issue, inspired by Inceptions, we propose to decompose large-kernel depthwise convolution into four parallel branches along channel dimension, i.e., small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a series of networks, namely InceptionNeXt, which not only enjoy high throughputs but also maintain competitive performance. For instance, InceptionNeXt-T achieves 1.6x higher training throughputs than ConvNeXt-T, as well as attains 0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can serve as an economical baseline for future architecture design to reduce carbon footprint. Code is available at https://github.com/sail-sg/inceptionnext.
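
To make the "few FLOPs" point concrete, the per-channel arithmetic cost of a dense 7x7 depthwise kernel can be compared with the four-branch split. This is a back-of-the-envelope sketch, not a figure from the paper: the square kernel size 3, band length 11, and branch ratio 1/8 are assumed values that this review does not state, and biases are ignored. Per the abstract, the real bottleneck on GPUs is memory access rather than FLOPs, so this count explains the lower arithmetic cost but not, on its own, the measured throughput gains.

```python
from fractions import Fraction

# Assumed hyperparameters (not given in this review); check the official code.
SQUARE_K = 3                   # side length of the small square kernel
BAND_K = 11                    # length of the 1 x k and k x 1 band kernels
BRANCH_RATIO = Fraction(1, 8)  # fraction of channels in each conv branch


def dense_dw_macs(k: int) -> int:
    """MACs (= weights) per channel per output position for a k x k depthwise conv."""
    return k * k


def inception_dw_macs(square_k: int, band_k: int, ratio: Fraction) -> Fraction:
    """Average MACs per channel per output position for the four-branch split.

    Each of the three conv branches receives `ratio` of the channels; the
    remaining 1 - 3 * ratio channels take the identity branch, which costs nothing.
    """
    assert 3 * ratio <= 1, "conv branches cannot use more than all channels"
    return ratio * (square_k * square_k + band_k + band_k)


dense = dense_dw_macs(7)
incep = inception_dw_macs(SQUARE_K, BAND_K, BRANCH_RATIO)
print(f"7x7 depthwise:       {dense} MACs per channel per position")
print(f"Inception depthwise: {float(incep)} MACs per channel per position")
print(f"reduction:           {float(dense / incep):.1f}x")
```

Under these assumptions only 3/8 of the channels are convolved at all, and the same reduction applies to the depthwise weight count.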

Motivation and Objectives

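Motivation: make large-kernel CNNs faster while preserving the high accuracy they achieve in vision models.
Introduce an efficient, Inception-inspired depthwise convolution operator that reduces memory access cost.
Develop the InceptionNeXt model family as an economical baseline for CNN design.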

Proposed Method

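Introduce Inception depthwise convolution, which splits the channels and processes them in four parallel branches: a small 3x3 square kernel, a horizontal 1xk band, a vertical kx1 band, and an identity mapping.
Formally decompose the large-kernel depthwise conv into these four branches and concatenate their outputs to form the output feature map.
Plug the Inception depthwise module into a MetaNeXt/ConvNeXt-style block to build the InceptionNeXt backbones.
Configure a four-stage architecture whose per-stage channel dimensions and MLP ratios are chosen to balance accuracy and speed.
Provide ablations on the importance of each branch, the band kernel size, and the convolution branch ratio.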
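
The four-branch operator described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation (the official code is in the repository linked in the abstract). The function names `depthwise_conv2d` and `inception_dwconv`, the identity-first channel order, and the defaults (3x3 square kernel, band length 11, 1/8 of the channels per conv branch) are assumptions for illustration only.

```python
import numpy as np


def depthwise_conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Depthwise 2-D cross-correlation, stride 1, zero 'same' padding.

    x: (C, H, W) feature map; w: (C, kh, kw), one kernel per channel.
    kh and kw must be odd so the padding is symmetric.
    """
    c, h, wd = x.shape
    assert w.shape[0] == c and w.shape[1] % 2 == 1 and w.shape[2] % 2 == 1
    kh, kw = w.shape[1:]
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c, h, wd), dtype=np.result_type(x, w))
    # Sum over kernel taps; each tap scales a shifted window of the padded input.
    for i in range(kh):
        for j in range(kw):
            out += w[:, i, j, None, None] * xp[:, i:i + h, j:j + wd]
    return out


def inception_dwconv(x, w_sq, w_w, w_h, branch_ratio=0.125):
    """Hypothetical Inception depthwise conv over a (C, H, W) feature map.

    Splits channels into [identity | square | horizontal band | vertical band],
    convolves the last three groups, and concatenates the results.
    """
    c = x.shape[0]
    g = int(c * branch_ratio)  # channels in each conv branch
    x_id, x_hw, x_w, x_h = np.split(x, [c - 3 * g, c - 2 * g, c - g], axis=0)
    return np.concatenate(
        [
            x_id,                          # identity branch: no computation
            depthwise_conv2d(x_hw, w_sq),  # small square kernel (e.g. 3x3)
            depthwise_conv2d(x_w, w_w),    # horizontal band (1 x k_b)
            depthwise_conv2d(x_h, w_h),    # vertical band (k_b x 1)
        ],
        axis=0,
    )


rng = np.random.default_rng(0)
C, H, W = 64, 14, 14
g = int(C * 0.125)  # 8 channels per conv branch
x = rng.standard_normal((C, H, W))
y = inception_dwconv(
    x,
    w_sq=rng.standard_normal((g, 3, 3)),
    w_w=rng.standard_normal((g, 1, 11)),
    w_h=rng.standard_normal((g, 11, 1)),
)
print(y.shape)  # (64, 14, 14)
print(np.array_equal(y[: C - 3 * g], x[: C - 3 * g]))  # True: identity channels pass through
```

Because each branch keeps the spatial size and the outputs are concatenated along channels, the operator is a drop-in replacement for a single large-kernel depthwise conv with the same input and output shape.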

Experimental Results

Research Questions

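RQ1: Can large-kernel depthwise convolution be made more efficient without sacrificing CNN accuracy?
RQ2: Does an Inception-style depthwise decomposition offer a better speed-accuracy trade-off than the standard ConvNeXt-like block?
RQ3: Can InceptionNeXt backbones compete with state-of-the-art ViT/CNN hybrid models on ImageNet-1K and ADE20K?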

Key Findings

Model	Params (M)	MACs (G)	Train throughput (img/s)	Inference throughput (img/s)	Top-1 (%)	Notes
InceptionNeXt-T (Ours)	28	4.2	901	2900	82.3	Baseline for the ablation studies; +0.2 over ConvNeXt-T (Table 4).
InceptionNeXt-S (Ours)	49	8.4	521	1750	83.5	Higher throughput and accuracy vs ConvNeXt-S.
InceptionNeXt-B (Ours)	87	14.9	375	1244	84.0	Best trade-off among tested sizes; +0.2 over ConvNeXt-B.
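
The accuracy margins quoted in the notes can be checked by simple subtraction. The ConvNeXt baselines used here (82.1%, 83.1%, 83.8% top-1) are not part of this review's table; they are the ImageNet-1K, 224x224 results reported in the ConvNeXt paper.

```python
# Top-1 accuracy (%) of InceptionNeXt, taken from the table above.
inceptionnext_top1 = {"T": 82.3, "S": 83.5, "B": 84.0}
# ConvNeXt top-1 accuracy (%), ImageNet-1K at 224x224, from the ConvNeXt paper
# (baseline values added here; they are not part of this review's table).
convnext_top1 = {"T": 82.1, "S": 83.1, "B": 83.8}

# Round to one decimal to remove floating-point noise from the subtraction.
deltas = {
    size: round(inceptionnext_top1[size] - convnext_top1[size], 1)
    for size in inceptionnext_top1
}
for size, delta in deltas.items():
    print(f"InceptionNeXt-{size} vs ConvNeXt-{size}: {delta:+.1f} points")
```

The largest gap, +0.4 points for the S model, is consistent with the "up to about 0.4%" improvement stated in the findings.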

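InceptionNeXt-T achieves 0.2 percentage points higher top-1 accuracy than ConvNeXt-T, with 1.6x the training throughput and 1.2x the inference throughput on A100 GPUs.
Across all model sizes, InceptionNeXt consistently matches or exceeds ConvNeXt in accuracy, with notable training speedups and competitive or better throughput.
On ImageNet-1K, InceptionNeXt-S and InceptionNeXt-B improve top-1 accuracy over their ConvNeXt counterparts by up to about 0.4 points while holding a significant throughput advantage (e.g., InceptionNeXt-B: 84.0% top-1 vs. ConvNeXt-B: 83.8%).
Ablations show that removing either the horizontal/vertical band branches or the small 3x3 branch lowers accuracy, while the parallel band branches strike a balance between speed and accuracy.
On ADE20K semantic segmentation, InceptionNeXt backbones achieve higher mIoU than Swin and ConvNeXt at every model size (e.g., InceptionNeXt-B: 46.4 mIoU with Semantic FPN).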

This review was generated by AI and checked by a human editor.