QUICK REVIEW

[Paper Review] HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

Zhen Dong, Zhewei Yao|arXiv (Cornell University)|Nov 10, 2019

Advanced Neural Network Applications | References: 25 | Cited by: 53

One-Sentence Summary

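HAWQ-V2 extends Hessian-based mixed-precision quantization by measuring layer sensitivity with the average Hessian trace (the mean of the Hessian eigenvalues) instead of the top eigenvalue, selects each layer's bit precision automatically via a Pareto frontier, and supports activation quantization; it achieves state-of-the-art results without manual tuning.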
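To make the sensitivity metric concrete, here is a small worked example with hypothetical eigenvalues (not taken from the paper). For a layer with n parameters, the average trace is the trace divided by n, i.e., the mean eigenvalue, whereas HAWQ-V1 used only the largest eigenvalue. The two metrics can rank layers differently:

```latex
% Layer i has n_i parameters and Hessian H_i with eigenvalues \lambda_1, \dots, \lambda_{n_i}
\overline{\mathrm{Tr}}(H_i) = \frac{\mathrm{Tr}(H_i)}{n_i} = \frac{1}{n_i}\sum_{j=1}^{n_i} \lambda_j

% Hypothetical layers with n = 4: spectrum of A is {10, 0, 0, 0}, spectrum of B is {4, 4, 4, 4}
\lambda_{\max}(H_A) = 10 > 4 = \lambda_{\max}(H_B)
\qquad
\overline{\mathrm{Tr}}(H_A) = \tfrac{10}{4} = 2.5 < 4 = \overline{\mathrm{Tr}}(H_B)
```

The top eigenvalue ranks layer A as more sensitive, while the average trace, which accounts for curvature in every direction, ranks layer B higher.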

ABSTRACT

Quantization is an effective method for reducing memory footprint and inference time of Neural Networks, e.g., for efficient inference in the cloud, especially at the edge. However, ultra low precision quantization could lead to significant degradation in model generalization. A promising method to address this is to perform mixed-precision quantization, where more sensitive layers are kept at higher precision. However, the search space for a mixed-precision quantization is exponential in the number of layers. Recent work has proposed HAWQ, a novel Hessian based framework, with the aim of reducing this exponential search space by using second-order information. While promising, this prior work has three major limitations: (i) HAWQV1 only uses the top Hessian eigenvalue as a measure of sensitivity and does not consider the rest of the Hessian spectrum; (ii) the HAWQV1 approach only provides relative sensitivity of different layers and therefore requires a manual selection of the mixed-precision setting; and (iii) HAWQV1 does not consider mixed-precision activation quantization. Here, we present HAWQV2 which addresses these shortcomings. For (i), we perform a theoretical analysis showing that a better sensitivity metric is to compute the average of all of the Hessian eigenvalues. For (ii), we develop a Pareto frontier based method for selecting the exact bit precision of different layers without any manual selection. For (iii), we extend the Hessian analysis to mixed-precision activation quantization. We have found this to be very beneficial for object detection. We show that HAWQV2 achieves new state-of-the-art results for a wide range of tasks.

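Research Motivation and Goals

Reduce memory footprint and computation through quantization while preserving generalization as much as possible.
Improve mixed-precision quantization by using the full Hessian spectrum rather than only the top eigenvalue.
Automatically select the exact bit precision of each layer without manual tuning.
Extend Hessian-based analysis to activation quantization.
Demonstrate state-of-the-art quantization performance on ImageNet and COCO.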

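Proposed Method

Uses the average Hessian trace (the mean of the Hessian eigenvalues), rather than the top eigenvalue, as the sensitivity metric that guides each layer's precision.
Applies Hutchinson's randomized algorithm to estimate the Hessian trace efficiently without forming the full Hessian.
Introduces a Pareto-frontier-based method that automatically selects the exact bit precision of each layer within a reduced search space.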
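Extends the framework to mixed-precision activation quantization by analyzing the weight Hessian and its influence on activations, using matrix-free trace estimation.
Quantizes Inception-V3, ResNet-50, and SqueezeNext and evaluates them on ImageNet; tests RetinaNet with a ResNet-50 backbone on COCO.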
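Hutchinson's estimator relies on the identity Tr(H) = E[zᵀHz] for random vectors z with independent ±1 (Rademacher) entries, so it only needs Hessian-vector products. The sketch below assumes a hypothetical callable `hvp(v)` that returns H·v; in a real network that product would come from automatic differentiation (differentiating the gradient a second time), and here an explicit toy matrix stands in for it. The function names and sample counts are illustrative, not the paper's code.

```python
import numpy as np


def hutchinson_trace(hvp, dim, num_samples=200, seed=0):
    """Estimate Tr(H) using only Hessian-vector products.

    hvp: hypothetical callable v -> H @ v (stand-in for an autograd HVP).
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_samples):
        # Rademacher probe: each entry is -1 or +1 with equal probability.
        z = rng.choice([-1.0, 1.0], size=dim)
        # One sample of z^T H z; its expectation equals Tr(H).
        estimates.append(z @ hvp(z))
    return float(np.mean(estimates))


def average_trace(hvp, dim, num_samples=200, seed=0):
    """Average Hessian trace: estimated Tr(H) divided by the parameter count."""
    return hutchinson_trace(hvp, dim, num_samples, seed) / dim


# Toy check: an explicit symmetric PSD matrix plays the role of a layer Hessian.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
H = A @ A.T / 50
est = hutchinson_trace(lambda v: H @ v, dim=50, num_samples=2000)
print(f"estimated trace {est:.2f} vs exact {np.trace(H):.2f}")
```

Each sample needs only one Hessian-vector product and O(n) memory, so the full n×n Hessian is never formed; more samples reduce the variance of the estimate.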

Experimental Results

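Research Questions

RQ1: Does the Hessian trace capture layer sensitivity better than the top Hessian eigenvalue for guiding quantization decisions?
RQ2: Does automatic, Pareto-frontier-based per-layer bit selection match or exceed manually chosen settings?
RQ3: Can Hessian traces for both weights and activations be computed efficiently enough to keep mixed-precision quantization practical?
RQ4: Does mixed-precision activation quantization improve performance, especially for object detection?
RQ5: How does HAWQ-V2 compare with prior quantization methods on standard benchmarks (ImageNet, COCO)?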

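Key Findings

The average Hessian trace is a better measure of a layer's quantization sensitivity than the top eigenvalue.
Hessian traces can be estimated efficiently with Hutchinson's algorithm (e.g., about 30 minutes for the 54 layers of ResNet-50 on 4 GPUs).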
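The Pareto frontier method selects the exact bit precision of each layer automatically, without manual tuning, and shrinks the otherwise exponential search space.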
HAWQ-V2 achieves state-of-the-art ImageNet results on Inception-V3 (75.68% Top-1, 7.57 MB), ResNet-50 (75.76% Top-1, 7.99 MB), and SqueezeNext (68.38% Top-1, 1.07 MB).
On COCO with RetinaNet (ResNet-50 backbone), HAWQ-V2 reaches 34.4 mAP with activation quantization and a 17.90 MB model, outperforming both direct quantization and FQN.
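The automatic bit selection can be illustrated with a toy sketch. Following our reading of the paper, each bit configuration is scored by Ω = Σᵢ avgTr(Hᵢ)·‖Q(Wᵢ) − Wᵢ‖², the frontier keeps the configurations that no smaller-or-equal model beats, and the chosen setting is the lowest-Ω frontier point under a size budget. Several pieces are assumptions: the symmetric uniform quantizer, the hypothetical helper names (`quantize`, `omega`, `size_bits`, `pareto_frontier`, `select_config`), the made-up traces, and the exhaustive enumeration, which is only feasible for a handful of layers and is not the paper's reduced search.

```python
import itertools

import numpy as np


def quantize(w, bits):
    # Symmetric uniform quantizer (simple stand-in, not the paper's exact quantizer).
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale


def omega(avg_traces, weights, config):
    # Omega = sum_i avgTr(H_i) * ||Q(W_i) - W_i||_2^2 for one per-layer bit assignment.
    return sum(t * float(np.sum((quantize(w, b) - w) ** 2))
               for t, w, b in zip(avg_traces, weights, config))


def size_bits(weights, config):
    # Model size in bits: layer i stores w.size values at b bits each.
    return sum(w.size * b for w, b in zip(weights, config))


def pareto_frontier(avg_traces, weights, bit_choices):
    # Exhaustive: len(bit_choices) ** num_layers configs, so toy-sized problems only.
    points = sorted(
        (size_bits(weights, c), omega(avg_traces, weights, c), c)
        for c in itertools.product(bit_choices, repeat=len(weights))
    )
    frontier, best = [], float("inf")
    for size, om, config in points:
        # Keep a point only if it beats every smaller-or-equal-size config on Omega.
        if om < best:
            frontier.append((size, om, config))
            best = om
    return frontier


def select_config(frontier, budget_bits):
    # Omega falls as size grows along the frontier, so take the largest point under budget.
    feasible = [p for p in frontier if p[0] <= budget_bits]
    return feasible[-1] if feasible else None


# Toy example (hypothetical numbers): layer 0 is far more sensitive than layers 1 and 2.
rng = np.random.default_rng(0)
weights = [rng.standard_normal(100) for _ in range(3)]
avg_traces = [10.0, 0.1, 0.1]
frontier = pareto_frontier(avg_traces, weights, bit_choices=(2, 4, 8))
budget = 100 * 4 * 3  # same size as uniform 4-bit quantization
size, om, config = select_config(frontier, budget)
print(config, size, round(om, 4))
```

With this budget, the high-trace layer never receives fewer bits than the low-trace layers, which is the behavior the sensitivity metric is meant to produce.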

This review was generated by AI and checked by human editors.