QUICK REVIEW

[Paper Review] Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Gaurav Menghani|arXiv (Cornell University)|Jun 16, 2021

Advanced Neural Network Applications|Cited by 37

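One-Line Summary

A comprehensive survey of methods for making deep learning models smaller, faster, and better, spanning modeling techniques, infrastructure, and hardware, and covering pruning, quantization, and learning techniques.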

ABSTRACT

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, resources required to train, etc. have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey would provide the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.

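Motivation and Objectives

Motivates why efficiency matters in deep learning beyond accuracy alone, with emphasis on a model's footprint metrics (size, latency, training cost).
Provides a comprehensive taxonomy of efficiency techniques spanning modeling techniques, infrastructure, and hardware.
Highlights practical guidance and an experiment-based roadmap, with code, for training and deploying Pareto-optimal models.
Bridges modeling techniques and deployment considerations to enable efficient real-world AI applications.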

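Approach

Presents a mental model of efficiency techniques organized into five areas: compression techniques, learning techniques, automation, efficient architectures, and infrastructure.
Covers pruning in detail, including saliency-based pruning strategies, structured vs. unstructured pruning, and sparsity scheduling.
Explains quantization and quantization-aware training, including quantization and dequantization algorithms for weights and activations.
Discusses other compression techniques such as low-rank factorization and weight sharing.
Describes learning techniques, such as distillation and ensemble-based approaches, that let smaller models reach comparable performance.
Offers an empirical perspective on hardware-accelerated implementations of sparse and quantized models.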
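
The pruning strategy summarized above (a saliency score decides which weights to drop, and a schedule raises sparsity gradually during training) can be sketched in plain Python. This is a minimal illustration, not the survey's code: it assumes weight magnitude as the saliency score and uses the cubic sparsity ramp associated with Zhu and Gupta (2018); the names `polynomial_sparsity` and `magnitude_prune` are hypothetical.

```python
def polynomial_sparsity(step, final_sparsity, begin_step, end_step, initial_sparsity=0.0):
    """Target sparsity at `step`, ramping cubically from initial to final sparsity.

    Schedule: s_t = s_f + (s_i - s_f) * (1 - progress)**3, which prunes
    aggressively early on and slows down as it approaches the final sparsity.
    """
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest |w|.

    Magnitude serves as the saliency score here; another saliency criterion
    would only change the sort key.
    Returns the pruned weights and the binary mask (1 = kept, 0 = pruned).
    """
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from least to most salient.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.06]
for step in (0, 50, 100):
    target = polynomial_sparsity(step, final_sparsity=0.75, begin_step=0, end_step=100)
    _, mask = magnitude_prune(weights, target)
    print(step, target, mask)
```

In a real training loop the mask would be recomputed every few steps as the target sparsity rises, and pruned weights would be held at zero while the remaining weights keep training.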
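
Low-rank factorization replaces a dense m x n weight matrix W with two factors U (m x r) and V (r x n). The sketch below, using hypothetical helper names, counts parameters for both forms and shows that U(Vx) can be computed without ever forming W. It does not compute the factorization itself; that step would typically use a truncated SVD.

```python
def dense_params(m, n):
    """Parameters in a dense m x n weight matrix (bias ignored)."""
    return m * n


def low_rank_params(m, n, r):
    """Parameters when W (m x n) is replaced by U (m x r) and V (r x n)."""
    return r * (m + n)


def max_useful_rank(m, n):
    """Largest rank r for which the factorized form is strictly smaller than W."""
    return (m * n - 1) // (m + n)


def matvec(A, x):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def factorized_matvec(U, V, x):
    """Compute U @ (V @ x) without materializing W = U @ V.

    Costs about r * (m + n) multiply-adds instead of m * n.
    """
    return matvec(U, matvec(V, x))


m, n, r = 1024, 1024, 64
print(dense_params(m, n), low_rank_params(m, n, r), max_useful_rank(m, n))
```

With m = n = 1024 and r = 64 the factorized layer has 8x fewer parameters, and beyond r = 511 it saves nothing. Whether accuracy survives a given rank has to be checked empirically.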
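
Distillation trains a small student model to match a larger teacher's softened output distribution. Below is a minimal sketch of a soft-target loss in the style of Hinton et al. (2015); the temperature of 4 and the equal weighting alpha = 0.5 between soft and hard terms are illustrative assumptions, not values from the survey, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over `logits / temperature`, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and the usual hard-label cross-entropy.

    Soft term: KL(teacher || student) on temperature-softened distributions,
    multiplied by temperature**2 so its gradient scale stays comparable to
    the hard term as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = kl * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


teacher = [5.0, 1.0, 0.5]
print(distillation_loss([0.0, 0.0, 0.0], teacher, label=0))  # student ignores the teacher
print(distillation_loss([5.0, 1.0, 0.5], teacher, label=0))  # student matches the teacher
```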

Experimental Results

Research Questions

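RQ1: Which techniques enable Pareto-optimal trade-offs between model quality (accuracy) and footprint (size/latency) in deep learning models?
RQ2: How do compression and learning techniques compare in effectiveness on real hardware and in practical deployment?
RQ3: What infrastructure and tooling are needed to realize efficiency gains in training and deployment?
RQ4: How do structured pruning, unstructured pruning, and quantization contribute to real-world latency and size improvements?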

Key Findings

Model architecture	Sparsity type	Sparsity	FLOPs (relative to dense)	Top-1 accuracy	Source
MobileNet v2 - 1.0	Dense (Baseline)	0%	1x	72.0%	Sandler et al. (2018)
MobileNet v2 - 1.0	Unstructured	75%	0.27x	67.7%	Zhu and Gupta (2018)
MobileNet v2 - 1.0	Unstructured	75%	0.52x	71.9%	Evci et al. (2020)
MobileNet v2 - 1.0	Structured (block-wise)	85%	0.11x	69.7%	Elsen et al. (2020); Google Research (2021)
MobileNet v2 - 1.0	Unstructured	90%	0.12x	61.8%	Zhu and Gupta (2018)
MobileNet v2 - 1.0	Unstructured	90%	0.12x	69.7%	Evci et al. (2020)
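
The table contrasts unstructured sparsity with block-wise structured sparsity. The difference lies in where zeros may appear: unstructured pruning can zero any individual weight, while block-wise pruning zeroes contiguous groups so that sparse kernels can skip whole blocks. A minimal sketch, assuming 1 x block_size blocks along each row scored by L1 norm; the block shape used by Elsen et al. (2020) may differ, and `block_prune` and `sparsity_of` are hypothetical names.

```python
def block_prune(matrix, block_size, sparsity):
    """Zero entire 1 x block_size blocks (within each row) with the smallest L1 norm.

    Returns a new matrix; the input is not modified.
    """
    cols = len(matrix[0])
    if cols % block_size != 0:
        raise ValueError("number of columns must be divisible by block_size")
    blocks = [(r, c) for r in range(len(matrix)) for c in range(0, cols, block_size)]

    def l1(block):
        r, c = block
        return sum(abs(v) for v in matrix[r][c:c + block_size])

    n_prune = int(round(sparsity * len(blocks)))
    pruned = sorted(blocks, key=l1)[:n_prune]
    out = [list(row) for row in matrix]
    for r, c in pruned:
        for j in range(c, c + block_size):
            out[r][j] = 0.0
    return out


def sparsity_of(matrix):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / total


w = [[0.1, 0.2, 3.0, 1.0],
     [0.5, 0.5, 0.01, 0.02]]
print(block_prune(w, block_size=2, sparsity=0.5))
```

Because whole blocks are scored together, a small weight can survive inside a strong block; that accuracy cost is traded for a memory layout that hardware kernels can exploit.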

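Pruning can substantially reduce the parameter count; with structured pruning it also yields meaningful inference speedups and size reductions.
Quantizing weights to 8 bits shrinks model size by roughly 4x relative to 32-bit floating point, and quantization-aware training usually preserves accuracy better than post-training quantization while still delivering substantial size reductions.
Quantizing activations and executing in fixed point can yield notable speedups on CPUs with SIMD support; fixed-point graphs can reach up to roughly 3x faster inference.
With compatible kernels, structured sparse representations can outperform dense models on specific hardware despite having fewer parameters.
The Lottery Ticket Hypothesis suggests that large networks contain compact subnetworks, but results vary by dataset and architecture.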
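
The roughly 4x size reduction follows directly from storing each weight in 8 bits instead of 32. The sketch below uses a symmetric, per-tensor int8 scheme (one common variant, assumed here; the survey's own algorithm may use an affine scheme with a zero point) and includes the quantize-dequantize round trip that quantization-aware training typically inserts into the forward pass. All function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers in [-(2**(b-1) - 1), 2**(b-1) - 1] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [scale * x for x in q]


def fake_quantize(values, num_bits=8):
    """Quantize-dequantize round trip, as simulated in the forward pass of QAT."""
    q, scale = quantize_symmetric(values, num_bits)
    return dequantize(q, scale)


def storage_bytes(num_params, bits_per_param):
    """Storage for the weights alone (ignores the per-tensor scale)."""
    return num_params * bits_per_param // 8


weights = [-0.8, -0.1, 0.0, 0.25, 0.8]
q, scale = quantize_symmetric(weights)
print(q, scale)
print(storage_bytes(1_000_000, 32) / storage_bytes(1_000_000, 8))
```

This covers weight storage only; the CPU speedups mentioned above additionally require quantized activations and integer kernels.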

This review was generated by AI and checked by a human editor.