[Paper Review] HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
HAWQ-V2 extends Hessian-based mixed-precision quantization in three ways: it uses the average Hessian trace (the mean of the Hessian eigenvalues) as the layer sensitivity metric, selects per-layer bit-precision automatically via a Pareto frontier, and adds mixed-precision activation quantization. It reports state-of-the-art results without manual bit-precision selection.
Quantization is an effective method for reducing memory footprint and inference time of Neural Networks, e.g., for efficient inference in the cloud, especially at the edge. However, ultra low precision quantization could lead to significant degradation in model generalization. A promising method to address this is to perform mixed-precision quantization, where more sensitive layers are kept at higher precision. However, the search space for a mixed-precision quantization is exponential in the number of layers. Recent work has proposed HAWQ, a novel Hessian based framework, with the aim of reducing this exponential search space by using second-order information. While promising, this prior work has three major limitations: (i) HAWQV1 only uses the top Hessian eigenvalue as a measure of sensitivity and does not consider the rest of the Hessian spectrum; (ii) the HAWQV1 approach only provides relative sensitivity of different layers and therefore requires a manual selection of the mixed-precision setting; and (iii) HAWQV1 does not consider mixed-precision activation quantization. Here, we present HAWQV2 which addresses these shortcomings. For (i), we perform a theoretical analysis showing that a better sensitivity metric is to compute the average of all of the Hessian eigenvalues. For (ii), we develop a Pareto frontier based method for selecting the exact bit precision of different layers without any manual selection. For (iii), we extend the Hessian analysis to mixed-precision activation quantization. We have found this to be very beneficial for object detection. We show that HAWQV2 achieves new state-of-the-art results for a wide range of tasks.
Motivation & Objective
- Motivate reducing memory and computation via quantization while preserving generalization.
- Improve mixed-precision quantization by leveraging full Hessian spectrum rather than only the top eigenvalue.
- Automatically select exact per-layer bit-precision without manual tuning.
- Extend Hessian-based analysis to activation quantization.
- Demonstrate state-of-the-art quantization performance on ImageNet and COCO tasks.
Proposed method
- Use trace (average of Hessian eigenvalues) as a sensitivity metric to guide per-layer precision, instead of the top eigenvalue.
- Apply Hutchinson’s randomized algorithm to estimate Hessian traces efficiently without forming the full Hessian.
- Introduce a Pareto-frontier based method to automatically pick exact per-layer bit-precision from a reduced search space.
- Extend the framework to mixed-precision activation quantization by analyzing Hessian with respect to activations and employing a matrix-free trace estimation approach.
- Quantize networks (Inception-V3, ResNet-50, SqueezeNext) and evaluate on ImageNet, and test RetinaNet with ResNet-50 backbone on COCO.
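The Hutchinson step above can be illustrated with a minimal sketch. It estimates Tr(H) as the average of v^T H v over random Rademacher vectors v, using only Hessian-vector products so the Hessian is never formed. Everything here is an assumption for illustration: the toy matrix `A`, the quadratic loss, and the helper names `grad`, `hvp`, and `hutchinson_trace` are hypothetical, and the Hessian-vector product uses central differences of the gradient as a stand-in for the backpropagation-based product a deep learning framework would normally provide.

```python
import random

# Toy symmetric "Hessian", used only so the estimate can be checked.
# For a real layer the Hessian is never formed explicitly.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]


def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * w^T A w, which equals A @ w."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in A]


def hvp(grad_fn, w, v, eps=1e-3):
    """Matrix-free Hessian-vector product via central differences of the gradient.

    Assumption: this replaces the autograd-based product used in practice.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]


def hutchinson_trace(grad_fn, w, num_samples=2000, seed=0):
    """Estimate Tr(H) as the mean of v^T H v over Rademacher vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in w]  # entries are +1 or -1
        hv = hvp(grad_fn, w, v)
        total += sum(vi * hi for vi, hi in zip(v, hv))
    return total / num_samples


w0 = [0.1, -0.2, 0.3]
trace_est = hutchinson_trace(grad, w0)
avg_trace = trace_est / len(w0)  # average eigenvalue: trace divided by dimension
print(trace_est, avg_trace)
```

The exact trace of `A` is 9, so the estimate should land close to that value. Dividing by the number of parameters gives the average eigenvalue, which is the quantity the review describes as the sensitivity metric.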
Experimental results
Research questions
- RQ1: Can Hessian trace better capture layer sensitivity than the top Hessian eigenvalue for quantization decisions?
- RQ2: Does automatic Pareto-frontier based selection of per-layer bit-precision achieve or surpass manually chosen settings?
- RQ3: Is it feasible to compute Hessian traces efficiently for weights and activations to enable practical mixed-precision quantization?
- RQ4: Does mixed-precision activation quantization improve performance, particularly for object detection tasks?
- RQ5: How does HAWQ-V2 perform on standard benchmarks (ImageNet, COCO) compared to prior quantization methods?
Key findings
- Average Hessian trace provides a better sensitivity measure than the top eigenvalue for layer quantization decisions.
- Hessian traces can be estimated efficiently with Hutchinson’s algorithm (e.g., 54 layers of ResNet50 in ~30 minutes on 4 GPUs).
- A Pareto-frontier approach selects the exact per-layer bit-precision automatically, replacing manual selection from a search space that is exponential in the number of layers.
- HAWQ-V2 achieves state-of-the-art results on ImageNet for Inception-V3 (75.68% Top-1, 7.57 MB), ResNet-50 (75.76%, 7.99 MB), and SqueezeNext (68.38%, 1.07 MB).
- On COCO, RetinaNet with a ResNet-50 backbone reaches 34.4 mAP at 17.90 MB with HAWQ-V2 and activation quantization, outperforming both direct quantization and FQN.
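The Pareto-frontier selection in the findings above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: it assumes each bit configuration is scored by Ω, the sum over layers of (average Hessian trace) × (squared quantization error), which is not spelled out in this review. The layer sizes, trace values, the simple uniform quantizer, and the exhaustive enumeration over three layers are all hypothetical.

```python
import itertools
import random


def quantize(weights, bits):
    """Uniform symmetric quantization to `bits` bits (illustrative quantizer)."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def sq_error(weights, bits):
    """Squared L2 distance between the weights and their quantized version."""
    return sum((w - q) ** 2 for w, q in zip(weights, quantize(weights, bits)))


rng = random.Random(0)
# Hypothetical layers: (weights, average Hessian trace). Values are made up.
layers = [
    ([rng.gauss(0, 1) for _ in range(64)], 50.0),
    ([rng.gauss(0, 1) for _ in range(256)], 5.0),
    ([rng.gauss(0, 1) for _ in range(128)], 0.5),
]
BIT_CHOICES = (2, 4, 8)


def evaluate(config):
    """Return (model size in bits, assumed sensitivity score omega)."""
    size_bits = sum(b * len(w) for b, (w, _) in zip(config, layers))
    omega = sum(tr * sq_error(w, b) for b, (w, tr) in zip(config, layers))
    return size_bits, omega


# Exhaustive enumeration is only feasible because the toy model has 3 layers.
candidates = [(cfg, *evaluate(cfg))
              for cfg in itertools.product(BIT_CHOICES, repeat=len(layers))]


def pareto_front(points):
    """Keep configurations that no other one beats on both size and omega."""
    front = []
    for cfg, size, omega in points:
        dominated = any(s <= size and o <= omega and (s < size or o < omega)
                        for _, s, o in points)
        if not dominated:
            front.append((cfg, size, omega))
    return sorted(front, key=lambda p: p[1])


def select(points, budget_bits):
    """Pick the configuration with the smallest omega that fits the budget."""
    feasible = [p for p in points if p[1] <= budget_bits]
    return min(feasible, key=lambda p: p[2])


front = pareto_front(candidates)
budget = 4 * sum(len(w) for w, _ in layers)  # same size as uniform 4-bit
best_cfg, best_size, best_omega = select(candidates, budget)
print(best_cfg, best_size, best_omega)
```

Given a size budget, the chosen configuration is the point on the size-versus-Ω frontier with the lowest Ω that still fits, so no manual choice of per-layer bits is needed once the traces and quantization errors are known.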
This review was created by AI and reviewed by human editors.