QUICK REVIEW

[Paper Review] Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Aojun Zhou, Anbang Yao|arXiv (Cornell University)|Feb 9, 2017

Advanced Neural Network Applications | Cited by 591

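One-line summary

This paper introduces Incremental Network Quantization (INQ), a method that converts a pre-trained full-precision CNN into a low-precision model whose weights are either powers of two or zero. It combines weight partition, group-wise quantization, and iterative re-training aimed at lossless conversion. Across several architectures on ImageNet, it matches or improves on full-precision accuracy at 5-bit, 4-bit, and even 3-bit quantization.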

ABSTRACT

This paper presents incremental network quantization (INQ), a novel method, targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods which are struggled in noticeable accuracy loss, our INQ has the potential to resolve this issue, as benefiting from two innovations. On one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training. A well-proven measure is employed to divide the weights in each layer of a pre-trained CNN model into two disjoint groups. The weights in the first group are responsible to form a low-precision base, thus they are quantized by a variable-length encoding method. The weights in the other group are responsible to compensate for the accuracy loss from the quantization, thus they are the ones to be re-trained. On the other hand, these three operations are repeated on the latest re-trained group in an iterative manner until all the weights are converted into low-precision ones, acting as an incremental network quantization and accuracy enhancement procedure. Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures including AlexNet, VGG-16, GoogleNet and ResNets well testify the efficacy of the proposed method. Specifically, at 5-bit quantization, our models have improved accuracy than the 32-bit floating-point references. Taking ResNet-18 as an example, we further show that our quantized models with 4-bit, 3-bit and 2-bit ternary weights have improved or very similar accuracy against its 32-bit floating-point baseline. Besides, impressive results with the combination of network pruning and INQ are also reported. The code is available at https://github.com/Zhouaojun/Incremental-Network-Quantization.

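Motivation and objectives

Address the accuracy loss and slow convergence of existing low-precision CNN quantization methods.
Propose a lossless incremental quantization framework that converts full-precision CNNs to low-precision weights.
Demonstrate its effectiveness on the major ImageNet architectures.
Explore the compression benefits of combining INQ with network pruning.
Characterize the practical bit-width limits and convergence behavior of INQ.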

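Proposed method

Introduce weight partition, which splits the weights of each layer into a low-precision base group and a re-trainable compensation group.
Apply group-wise quantization with a variable-length encoding, mapping the base weights to powers of two or zero.
Re-train the compensation group while the base weights stay fixed, recovering the accuracy lost to quantization.
Repeat the three operations (partition, quantization, re-training) until all weights are quantized.
Use constrained optimization: minimize L(W) + λR(W) subject to W(i,j) ∈ P_l, with SGD updates affecting only the weights that are not yet quantized.
The equations referenced include the weight quantization rule that maps weights onto P_l (4), the determination of n1/n2 (2, 3), and the masked SGD update (8).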
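
The partition, quantization, and masked-update steps listed above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names, the toy layer, the random stand-in gradient, and the accumulated fractions (0.5, 0.75, 1.0) are hypothetical. The magnitude-based partition and the formulas for n1 and n2 reflect our reading of equations (2)-(4) and should be checked against the paper.

```python
import numpy as np

def power_of_two_exponents(weights, bits):
    """Exponent bounds (n1, n2) for P_l = {±2^n1, ..., ±2^n2, 0} (our reading of eqs. 2-3)."""
    s = np.max(np.abs(weights))
    n1 = int(np.floor(np.log2(4.0 * s / 3.0)))
    n2 = n1 + 1 - (2 ** (bits - 1)) // 2
    return n1, n2

def quantize_to_powers_of_two(w, n1, n2):
    """Map each weight to sign(w) * 2^n or 0 (our reading of rule 4).

    For adjacent levels alpha < beta, |w| in [(alpha + beta) / 2, 3 * beta / 2)
    maps to beta; anything below 2^n2 / 2 maps to 0.
    """
    abs_w = np.abs(w)
    q = np.zeros_like(w)
    for n in range(n2, n1 + 1):
        beta = 2.0 ** n
        alpha = 0.0 if n == n2 else 2.0 ** (n - 1)
        # Assumption: the top level has no upper bound, so a re-trained weight
        # that grew past 1.5 * 2^n1 is clamped to 2^n1 instead of zeroed.
        upper = np.inf if n == n1 else 1.5 * beta
        mask = (abs_w >= (alpha + beta) / 2.0) & (abs_w < upper)
        q[mask] = np.sign(w[mask]) * beta
    return q

def inq_step(w, trainable, target_fraction, n1, n2):
    """Quantize the largest still-trainable weights until target_fraction are fixed."""
    w, trainable = w.copy(), trainable.copy()
    flat_w, flat_t = w.reshape(-1), trainable.reshape(-1)  # views on the copies
    n_new = int(round(target_fraction * w.size)) - int((~flat_t).sum())
    if n_new > 0:
        candidates = np.flatnonzero(flat_t)
        order = np.argsort(-np.abs(flat_w[candidates]))  # largest magnitude first
        chosen = candidates[order[:n_new]]
        flat_w[chosen] = quantize_to_powers_of_two(flat_w[chosen], n1, n2)
        flat_t[chosen] = False
    return w, trainable

def masked_sgd_update(w, grad, trainable, lr):
    """SGD step that leaves already-quantized weights untouched (cf. eq. 8)."""
    return w - lr * grad * trainable

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 8))         # toy "pre-trained" layer
n1, n2 = power_of_two_exponents(w, bits=5)     # fixed from the pre-trained weights
trainable = np.ones(w.shape, dtype=bool)

for fraction in (0.5, 0.75, 1.0):              # hypothetical accumulated portions
    w, trainable = inq_step(w, trainable, fraction, n1, n2)
    fake_grad = rng.normal(scale=0.01, size=w.shape)  # stand-in for backprop
    w = masked_sgd_update(w, fake_grad, trainable, lr=0.1)
```

In a real training loop the stand-in gradient would come from backpropagation over several epochs per step. The mask is what keeps the quantized base fixed while the remaining weights compensate.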

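Experimental results

Research questions

RQ1: Can INQ quantize a full-precision CNN to low-precision weights without accuracy loss?
RQ2: How does the weight partition strategy affect final accuracy and convergence?
RQ3: Which bit-widths allow lossless or near-lossless quantization on a large-scale dataset?
RQ4: How does INQ interact with pruning and other compression techniques for ImageNet CNNs?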

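Key findings

5-bit INQ on AlexNet, VGG-16, GoogleNet, ResNet-18, and ResNet-50 consistently improves top-1 and top-5 accuracy over the corresponding full-precision baselines (top-1 gains of 0.13%–2.28%, top-5 gains of 0.23%–1.65%).
INQ converges easily, typically reaching lossless 5-bit quantization with fewer than 8 epochs of re-training per iteration.
For ResNet-18, the 4-bit and 3-bit models are on par with or very close to the 32-bit baseline; the 2-bit ternary model falls below the baseline but outperforms earlier binary/ternary models.
Pruning + INQ outperforms Deep Compression (Han et al., 2016) on AlexNet while maintaining or improving the compression ratio (e.g., 5-bit INQ + DNS reaches 53x versus 27x/35x for prior work).
Compared with vector quantization alone, INQ preserves accuracy better (at 5-bit/4-bit quantization) and quantizes all layers rather than only the fully connected layers.
INQ delivers substantial compression while maintaining or improving accuracy, which makes deployment on resource-constrained devices practical.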
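
One commonly cited reason power-of-two weights suit constrained hardware is that multiplying by ±2^k reduces to a bit shift. The sketch below illustrates this for integer (fixed-point) activations; the function name and the example values are hypothetical and do not come from the paper. It also shows that bit-width reduction alone accounts for at most a 32/b storage ratio, so the larger ratios reported above rely on the combination with pruning.

```python
def multiply_by_power_of_two(x: int, sign: int, exponent: int) -> int:
    """Compute x * sign * 2**exponent using shifts only.

    x is an integer (e.g. a fixed-point activation). For negative exponents
    the right shift discards low bits, so the result is exact only when x is
    divisible by 2**(-exponent).
    """
    if sign == 0:
        return 0  # the zero weight contributes nothing
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return sign * shifted

# A weight of -2^-3 applied to the activation 96: a shift by 3, then a sign flip.
shift_result = multiply_by_power_of_two(96, sign=-1, exponent=-3)
float_result = 96 * -(2.0 ** -3)

# Storage ratio from bit width alone, ignoring any encoding overhead.
ratio_5bit = 32 / 5
```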

This review was generated by AI and checked by a human editor.