[Paper Review] Learned Step Size Quantization
Introduces Learned Step Size Quantization (LSQ), a method for training low-precision networks whose weights and activations are quantized to 2–4 bits, achieving state-of-the-art ImageNet accuracy and enabling 3-bit models to reach full-precision accuracy.
Deep networks run with low precision operations at inference time offer power and space advantages over high precision alternatives, but need to overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a method for training such networks, Learned Step Size Quantization, that achieves the highest accuracy to date on the ImageNet dataset when using models, from a variety of architectures, with weights and activations quantized to 2-, 3- or 4-bits of precision, and that can train 3-bit models that reach full precision baseline accuracy. Our approach builds upon existing methods for learning weights in quantized networks by improving how the quantizer itself is configured. Specifically, we introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters. This approach works using different levels of precision as needed for a given system and requires only a simple modification of existing training code.
Motivation & Objective
- Motivate reducing precision in deep networks to improve throughput and energy efficiency without sacrificing accuracy.
- Develop a quantization method where the step size is learned as a model parameter to minimize task loss.
- Ensure compatibility with existing backpropagation and SGD training pipelines.
- Demonstrate LSQ across multiple architectures on ImageNet and compare with prior quantization methods.
Proposed method
- Define a quantizer that maps real values to discrete levels using a step size s and clipping bounds set by QP and QN.
- Introduce a gradient for s via straight-through estimation that accounts for transitions between quantized states (Equation 3).
- Scale the step-size gradient by a per-layer factor g to balance updates with weight/activation updates (Equation 4).
- Store and update full-precision weights, use quantized weights and activations in the forward and backward passes, and train with cosine learning rate decay.
- Initialize LSQ step sizes per layer from data statistics and fine-tune from a pre-trained full-precision model.
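The quantizer, its step-size gradient, the gradient scale, and the initialization listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names (`quant_bounds`, `lsq_forward`, `lsq_step_size_grad`, `grad_scale`, `init_step_size`) are hypothetical, and the specific formulas are assumed to match the paper's definitions, namely QN = 2^(b-1) and QP = 2^(b-1) - 1 for signed data, QN = 0 and QP = 2^b - 1 for unsigned data, g = 1/sqrt(N * QP), and an initial step size of 2 * mean(|v|) / sqrt(QP).

```python
import numpy as np

def quant_bounds(bits, signed):
    """Return (QN, QP), the number of negative and positive quantization levels."""
    if signed:  # assumed for weights
        return 2 ** (bits - 1), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1  # assumed for non-negative activations

def lsq_forward(v, s, q_n, q_p):
    """Quantize v: scale by 1/s, clip to [-QN, QP], round, then rescale by s."""
    v_bar = np.round(np.clip(v / s, -q_n, q_p))
    return v_bar * s

def lsq_step_size_grad(v, s, q_n, q_p):
    """Elementwise gradient of the quantized value with respect to s.

    Rounding is treated as identity in the backward pass (straight-through
    estimator), which gives round(v/s) - v/s inside the clipping range and a
    constant -QN or QP where the input is clipped.
    """
    x = v / s
    inside = np.round(x) - x
    return np.where(x <= -q_n, -float(q_n),
                    np.where(x >= q_p, float(q_p), inside))

def grad_scale(num_elements, q_p):
    """Per-layer factor g = 1 / sqrt(N * QP) applied to the step-size gradient."""
    return 1.0 / np.sqrt(num_elements * q_p)

def init_step_size(v, q_p):
    """Data-dependent initial step size: 2 * mean(|v|) / sqrt(QP)."""
    return 2.0 * np.mean(np.abs(v)) / np.sqrt(q_p)

# Example: 3-bit signed quantization of a small weight vector.
q_n, q_p = quant_bounds(bits=3, signed=True)
v = np.array([-1.0, -0.26, 0.1, 0.3, 2.0])
s = 0.25
v_hat = lsq_forward(v, s, q_n, q_p)
ds = lsq_step_size_grad(v, s, q_n, q_p)
```

In a training loop, the task-loss gradient for s would be the sum over elements of the upstream gradient times `ds`, multiplied by `grad_scale(v.size, q_p)`. In an autodiff framework the same effect is usually obtained by wrapping the rounding step in a straight-through estimator instead of writing the gradient by hand.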
Experimental results
Research questions
- RQ1: Can learning the quantizer step size via gradient-based optimization improve task performance over fixed or error-minimizing quantizers?
- RQ2: Does LSQ enable 2-, 3-, and 4-bit networks to approach or match full-precision accuracy on ImageNet across architectures?
- RQ3: What is the impact of step-size gradient scaling on convergence and final accuracy?
- RQ4: Is quantization error minimization necessary for high task performance, or can alternative objectives yield better results?
Key findings
- LSQ achieves higher top-1/top-5 accuracy than prior 2-, 3-, and 4-bit methods across multiple architectures on ImageNet.
- 3-bit networks trained with LSQ reach or closely approach full-precision accuracy in several cases.
- A per-layer step-size gradient scale improves convergence and balances updates with weight/activation gradients.
- LSQ does not yield the lowest quantization error, yet it achieves better task performance than approaches that explicitly minimize quantization error.
- Knowledge distillation with LSQ further boosts accuracy, with some 3-bit models matching full-precision baselines.
This review was created by AI and reviewed by human editors.