QUICK REVIEW

[Paper Review] PACT: Parameterized Clipping Activation for Quantized Neural Networks

Jungwook Choi, Zhuo Wang|arXiv (Cornell University)|May 16, 2018

Model Reduction and Neural Networks | 19 references | 719 citations

TL;DR

PACT introduces a learnable clipping parameter α for quantizing activations during training, enabling 4-bit weights and activations with near full-precision accuracy and more efficient hardware.

ABSTRACT

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training - that enables neural networks to work well with ultra low precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $α$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inferencing performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.

Motivation & Objective

Reduce CNN computation and storage costs by quantizing activations, not only weights, during training.
Introduce a learnable activation clipping parameter α to optimize quantization scales.
Demonstrate that 4-bit quantized networks can approach full-precision accuracy across multiple models/datasets.
Analyze hardware implications and potential system-level performance gains from reduced precision.

Proposed method

Replace ReLU with PACT, a parameterized clipping activation with clip value α.
Quantize the clipped activation y to k bits using a linear quantization after clipping.
Learn α via back-propagation with a straight-through estimator for the gradient.
Regularize α with an L2 term to encourage smaller activation ranges and reduce quantization error.
Share α per layer to reduce hardware complexity and simplify the final output scaling.
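The steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (pact_forward, pact_grad_alpha, sgd_step_alpha), the learning rate, and the L2 coefficient are hypothetical, and a real training setup would use a deep learning framework with automatic differentiation rather than hand-written loops.

```python
def pact_forward(x, alpha, k):
    """Clip each activation to [0, alpha], then quantize uniformly to k bits."""
    levels = 2 ** k - 1            # number of quantization steps in [0, alpha]
    scale = levels / alpha
    out = []
    for v in x:
        y = min(max(v, 0.0), alpha)           # parameterized clipping
        # Note: Python's round() resolves exact .5 ties to the nearest even integer.
        out.append(round(y * scale) / scale)  # linear quantization
    return out


def pact_grad_alpha(x, alpha, upstream):
    """Straight-through estimate of dL/d(alpha) for one layer.

    Rounding is treated as the identity in the backward pass, so the
    quantized output depends on alpha only where the input was clipped
    (x >= alpha). Upstream gradients at those positions are summed
    because alpha is shared across the layer.
    """
    return sum(g for v, g in zip(x, upstream) if v >= alpha)


def sgd_step_alpha(alpha, grad, lr, l2_coeff):
    """One SGD update of alpha with an assumed penalty 0.5 * l2_coeff * alpha**2."""
    return alpha - lr * (grad + l2_coeff * alpha)


# Illustrative values only.
activations = [-1.0, 1.0, 3.0, 5.0]
quantized = pact_forward(activations, alpha=4.0, k=2)
grad = pact_grad_alpha(activations, alpha=4.0, upstream=[1.0, 1.0, 1.0, 1.0])
new_alpha = sgd_step_alpha(4.0, grad, lr=0.1, l2_coeff=0.5)
```

In this sketch only the clipped input (5.0) contributes to the gradient of α, while the L2 term pulls α downward on every step. That is the trade-off the method describes: a smaller α means finer quantization steps but more clipping.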

Experimental results

Research questions

RQ1: Can activations quantized with a learnable clipping parameter maintain accuracy at very low bit precision?
RQ2: Does optimizing α during training yield better quantization scales than a fixed clipping threshold?
RQ3: What are the accuracy and hardware trade-offs when using PACT across various CNN architectures and datasets?
RQ4: Is 4-bit quantization of both weights and activations viable without substantial accuracy loss?

Key findings

PACT enables activation quantization with a learnable clipping parameter that preserves accuracy.
4-bit quantized CNNs with PACT achieve accuracies similar to full-precision networks on multiple architectures and datasets.
PACT shows less accuracy degradation than prior quantization schemes at low bit precision on AlexNet, ResNet18, and ResNet50.
Joint quantization of weights and activations at 4 bits using PACT yields near full-precision performance across tested networks.
System-level analysis shows substantial hardware-area reductions and potential super-linear performance gains in bandwidth-constrained hardware when using reduced precision.

This review was created by AI and reviewed by human editors.