QUICK REVIEW

[Paper Review] PACT: Parameterized Clipping Activation for Quantized Neural Networks

Jungwook Choi, Zhuo Wang|arXiv (Cornell University)|May 16, 2018

Model Reduction and Neural Networks|References: 19|Citations: 719

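One-line summary

PACT introduces a learnable clipping parameter α that makes activation quantization trainable, so that weights and activations can both be quantized to 4 bits at close to full-precision accuracy, which in turn enables more efficient hardware.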

ABSTRACT

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training - that enables neural networks to work well with ultra low precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $α$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inferencing performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.

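Motivation and objectives

Reduce the computation and storage cost of CNNs by quantizing activations during training.
Introduce a learnable activation clipping parameter α to optimize the quantization scale.
Show that 4-bit quantized networks approach full-precision accuracy across multiple models and datasets.
Analyze the hardware impact of reduced precision and the potential system-level performance gains.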

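Proposed method

Replace ReLU with PACT, a parameterized clipping activation function whose clipping value is α.
After clipping, quantize the clipped activation y to k bits using linear quantization.
Learn α by backpropagation, using the straight-through estimator for the gradient.
Regularize α with an L2 term to encourage a smaller activation range and reduce quantization error.
Share α per layer to reduce hardware complexity and simplify final output scaling.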
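
The method steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the scalar (non-tensor) formulation, and the regularization coefficient lam are assumptions made for this example. It follows the formulation in the PACT paper, where the activation is clipped to [0, α], quantized linearly to k bits, and the straight-through estimator gives a gradient of 1 with respect to α for inputs at or above α and 0 otherwise.

```python
def pact_quantize(x, alpha, k):
    """Clip x to [0, alpha], then quantize the result linearly to k bits."""
    if alpha <= 0 or k < 1:
        raise ValueError("alpha must be positive and k must be at least 1")
    levels = 2 ** k - 1                 # number of quantization steps
    # Clipping; equivalent to 0.5 * (|x| - |x - alpha| + alpha).
    y = min(max(x, 0.0), alpha)
    scale = levels / alpha
    # Round to the nearest grid point, then map back to the [0, alpha] range.
    return round(y * scale) / scale


def pact_grad_alpha(x, alpha):
    """Straight-through estimate of d(output)/d(alpha) for one input value."""
    # Only inputs clipped at the upper bound pass gradient to alpha.
    return 1.0 if x >= alpha else 0.0


def alpha_l2_penalty(alphas, lam):
    """L2 regularization term over the per-layer alphas (lam is hypothetical)."""
    return lam * sum(a * a for a in alphas)
```

With alpha = 3.0 and k = 2 the quantization grid is {0, 1, 2, 3}, so pact_quantize(1.2, 3.0, 2) returns 1.0 and any input at or above 3.0 maps to 3.0. In a real training setup these operations would act on tensors inside an autodiff framework; the scalar form is used here only to keep the logic visible.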

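Experimental results

Research questions

RQ1: Can activations quantized with a learnable clipping parameter maintain accuracy at very low bit widths?
RQ2: Does optimizing α during training yield a better quantization scale than activations clipped at a fixed value?
RQ3: What are the accuracy and hardware trade-offs of using PACT across various CNN architectures and datasets?
RQ4: Can weights and activations both be quantized to 4 bits without substantial accuracy loss?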

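Key findings

PACT enables activation quantization with a learnable clipping parameter while maintaining accuracy.
4-bit quantized CNNs achieve accuracy comparable to full-precision networks across multiple architectures and datasets.
PACT shows less accuracy degradation at low bit precision than prior quantization schemes on AlexNet, ResNet18, and ResNet50.
Jointly quantizing weights and activations to 4 bits with PACT yields near full-precision performance across the tested networks.
System-level analysis shows a significant reduction in hardware area at low precision and a potential super-linear performance improvement on bandwidth-constrained hardware.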
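
The storage side of the last finding comes down to simple arithmetic, which the sketch below makes explicit. It is only an illustration: the helper name storage_bytes and the parameter count are hypothetical and are not taken from the paper, and the calculation ignores any per-layer metadata such as the stored α values.

```python
def storage_bytes(num_values, bits):
    """Bytes needed to store num_values values at the given bit width, rounded up."""
    return (num_values * bits + 7) // 8


num_weights = 10_000_000                  # hypothetical model size
fp32 = storage_bytes(num_weights, 32)     # 32-bit floating-point weights
int4 = storage_bytes(num_weights, 4)      # 4-bit quantized weights
print(fp32 / int4)                        # prints 8.0
```

An 8x smaller footprint is what makes it plausible to keep the quantized model and activations in on-chip memory, which is the mechanism the abstract cites for the super-linear gain.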

This review was generated by AI and checked by a human editor.