QUICK REVIEW

[Paper Review] Model compression via distillation and quantization

Antonio Polino, Razvan Pascanu | arXiv (Cornell University) | Feb 15, 2018

Advanced Neural Network Applications | References: 26 | Citations: 262

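One-line summary

This paper proposes two methods, quantized distillation and differentiable quantization, that compress deep networks by distilling a full-precision teacher model into a shallower, quantized student. Both retain high accuracy while achieving substantial compression across vision and language tasks.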

ABSTRACT

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem, and proposes two new compression methods, which jointly leverage weight quantization and distillation of larger teacher networks into smaller student networks. The first method we propose is called quantized distillation and leverages distillation during the training process, by incorporating distillation loss, expressed with respect to the teacher, into the training of a student network whose weights are quantized to a limited set of levels. The second method, differentiable quantization, optimizes the location of quantization points through stochastic gradient descent, to better fit the behavior of the teacher model. We validate both methods through experiments on convolutional and recurrent architectures. We show that quantized shallow students can reach similar accuracy levels to full-precision teacher models, while providing order of magnitude compression, and inference speedup that is linear in the depth reduction. In sum, our results enable DNNs for resource-constrained environments to leverage architecture and accuracy advances developed on more powerful devices.
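The idea behind differentiable quantization, moving the quantization points themselves by gradient descent, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the helper names (`assign`, `quantize`, `grad_wrt_points`), the learning rate, and above all the toy objective, which simply asks the quantized weights to reproduce the original ones. In the paper the gradient would come from the network's training loss against the teacher, and the scaling step is omitted here.

```python
import numpy as np

def assign(v, points):
    """Index of the nearest quantization point for every entry of v."""
    return np.abs(v[:, None] - points[None, :]).argmin(axis=1)

def quantize(v, points):
    """Replace every weight by its nearest quantization point."""
    return points[assign(v, points)]

def grad_wrt_points(v, points, grad_output):
    """dL/dp_i: sum of dL/dQ(v)_j over the weights j currently mapped to p_i."""
    grad = np.zeros_like(points)
    np.add.at(grad, assign(v, points), grad_output)
    return grad

def toy_loss(v, points):
    """Stand-in objective (assumption): mean squared quantization error."""
    return 0.5 * np.mean((quantize(v, points) - v) ** 2)

rng = np.random.default_rng(0)
v = rng.normal(size=512)                      # hypothetical weight vector
points = np.linspace(v.min(), v.max(), 4)     # 4 points, i.e. 2 bits per weight

before = toy_loss(v, points)
for _ in range(200):
    # Gradient of the toy loss with respect to the quantized weights.
    grad_output = (quantize(v, points) - v) / v.size
    points -= 1.0 * grad_wrt_points(v, points, grad_output)
after = toy_loss(v, points)
```

Starting from evenly spaced points, the updates pull each point toward the weights assigned to it, so the points drift to a non-uniform placement that fits the weight distribution better than the uniform grid.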

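Motivation and objectives

Leverage a high-accuracy, full-precision teacher model to improve a compressed student model.
Combine distillation with weight quantization to reduce model depth and weight precision together.
Validate the methods across CNNs, RNNs, and translation tasks to show generality and practical benefit.
Quantify compression and speedup on standard benchmarks while preserving accuracy.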

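Proposed methods

Define weight quantization using scaling and bucketing, covering both uniform and non-uniform schemes.
Introduce quantized distillation, in which a student with quantized weights is trained with a distillation loss.
Develop differentiable quantization, which learns the quantization points p with SGD by backpropagating through the quantization function.
Apply these methods to CNNs (e.g., the ResNet family), Wide ResNets, OpenNMT LSTMs, and a WMT translation setting.
Analyze the gains in compression, storage, and inference speed, including bucketed and Huffman-coded representations.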
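Uniform quantization with scaling and bucketing can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: min-max scaling per bucket and deterministic rounding are one common choice, and the function names, the default bit width, and the default bucket size are hypothetical.

```python
import numpy as np

def quantize_uniform(v, bits):
    """Scale v to [0, 1], snap to 2**bits evenly spaced levels, scale back."""
    intervals = 2 ** bits - 1          # gaps between adjacent levels
    lo, hi = v.min(), v.max()
    span = hi - lo
    if span == 0:                      # constant bucket: nothing to round
        return v.copy()
    scaled = (v - lo) / span           # assumption: min-max scaling
    rounded = np.round(scaled * intervals) / intervals
    return rounded * span + lo

def quantize_bucketed(weights, bits=4, bucket_size=256):
    """Quantize each consecutive bucket of weights with its own scale."""
    flat = weights.ravel()
    out = np.empty_like(flat)
    for start in range(0, flat.size, bucket_size):
        chunk = flat[start:start + bucket_size]
        out[start:start + bucket_size] = quantize_uniform(chunk, bits)
    return out.reshape(weights.shape)

# Hypothetical usage on a random floating-point weight matrix.
example = np.random.default_rng(0).normal(size=(64, 64))
example_q = quantize_bucketed(example, bits=4, bucket_size=256)
```

Bucketing keeps one outlier from stretching the scale for the whole tensor: each bucket of `bucket_size` weights gets its own minimum and range, at the cost of storing those two values per bucket.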

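Experimental results

Research questions

RQ1: Can combining distillation and quantization yield accurate, compressed models suited to resource-constrained environments?
RQ2: How do quantized distillation and differentiable quantization compare in accuracy, convergence, and efficiency across vision and language tasks?
RQ3: How do bit width, bucket size, and architecture affect the trade-off between compression and accuracy?
RQ4: When training a quantized model, does the distillation loss outperform the standard loss?
RQ5: How well do these methods scale to large datasets and architectures (e.g., ImageNet, WMT)?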
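The two losses compared in RQ4 can be written out as a short sketch. It uses the common temperature-softened formulation of distillation; the `temperature` and `alpha` values and the function names are assumptions for illustration, and the exact weighting used in the paper's experiments is not stated in this review.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def standard_loss(student_logits, labels):
    """Cross-entropy against the hard ground-truth labels."""
    logp = log_softmax(student_logits)
    return -logp[np.arange(len(labels)), labels].mean()

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.7):
    """Weighted sum of a soft-target term and the standard loss."""
    soft_targets = np.exp(log_softmax(teacher_logits, temperature))
    student_logp = log_softmax(student_logits, temperature)
    # Cross-entropy between the teacher's and student's softened outputs.
    soft = -(soft_targets * student_logp).sum(axis=-1).mean()
    hard = standard_loss(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

With `alpha=0.0` the distillation loss reduces to the standard loss, which makes the two training regimes easy to compare under otherwise identical settings.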

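Key findings

Quantized shallow students can approach the accuracy of full-precision teachers, with compression of up to an order of magnitude.
Quantized distillation is often more accurate than post-hoc quantization (post-mortem quantization) and differentiable quantization.
On ImageNet, a 4-bit quantized, distilled 2xResNet18 matches the accuracy of its ResNet34 teacher while being smaller and faster.
On CIFAR-10, differentiable quantization and quantized distillation reach near-teacher accuracy at 4 bits, and using the distillation loss yields further gains.
The OpenNMT and WMT experiments show that distillation helps keep BLEU and perplexity close to teacher level even at reduced model size.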
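The compression side of these findings can be sanity-checked with a rough storage estimate. The sketch below assumes that every weight is stored in `bits` bits and that every bucket additionally stores two full-precision scaling values. It ignores Huffman coding and any savings from using a shallower student, and the function names and example numbers are hypothetical.

```python
def quantized_size_bits(num_weights, bits, bucket_size, float_bits=32):
    """Bits needed for the quantized weights plus per-bucket scaling values."""
    num_buckets = -(-num_weights // bucket_size)   # ceiling division
    return num_weights * bits + 2 * float_bits * num_buckets

def compression_ratio(num_weights, bits, bucket_size, float_bits=32):
    """Full-precision size divided by quantized size."""
    full = num_weights * float_bits
    return full / quantized_size_bits(num_weights, bits, bucket_size, float_bits)

# Hypothetical example: one million weights, 4 bits, buckets of 256.
ratio = compression_ratio(1_000_000, bits=4, bucket_size=256)
```

Under these assumptions, 4-bit weights with buckets of 256 give roughly a 7.5x reduction from quantization alone, so reaching a full order of magnitude also relies on the smaller student architecture or on entropy coding.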


This review was generated by AI and checked by a human editor.