QUICK REVIEW

[Paper Review] Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

Naveen Mellempudi, Abhisek Kundu|arXiv (Cornell University)|Jan 31, 2017

Advanced Neural Network Applications | References: 12 | Citations: 23

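One-Sentence Summary

This paper proposes a cluster-based quantization method that converts pre-trained full-precision deep learning models to low-precision inference with minimal accuracy loss. By grouping filters into clusters that share a scaling factor, it enables an efficient 8-bit integer inference pipeline: on ImageNet with ResNet-101, ternary weights reach 71.8% Top-1 accuracy (within 6% of the full-precision model) while replacing about 85% of multiplications with 8-bit accumulations, 4-bit weights reach 76.3% Top-1 accuracy, and the fully 8-bit pipeline offers a potential 16x performance improvement.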

ABSTRACT

We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition, we also constrain the activations to 8-bits, thus enabling a sub 8-bit full integer inference pipeline. Our method uses smaller clusters of N filters with a common scaling factor to minimize the quantization loss, while also maximizing the number of ternary operations. We show that with a cluster size of N=4 on Resnet-101, we can achieve 71.8% TOP-1 accuracy, within 6% of the best full precision results, while replacing ~85% of all multiplications with 8-bit accumulations. Using the same method with 4-bit weights achieves 76.3% TOP-1 accuracy, which is within 2% of the full precision result. We also study the impact of the size of the cluster on both performance and accuracy: larger cluster sizes (N=64) can replace ~98% of the multiplications with ternary operations but introduce a significant drop in accuracy, which necessitates fine tuning the parameters by retraining the network at lower precision. To address this we have also trained low-precision Resnet-50 with 8-bit activations and ternary weights by pre-initializing the network with full precision weights and achieve 68.9% TOP-1 accuracy within 4 additional epochs. Our final quantized model can run on a full 8-bit compute pipeline, with a potential 16x improvement in performance compared to baseline full-precision models.

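Motivation and Objectives

Reduce the computational cost of deep learning inference without significant accuracy loss.
Enable a complete 8-bit integer inference pipeline through mixed low-precision computation.
Minimize quantization error in weight and activation representations through dynamic fixed-point clustering.
Explore the trade-off among cluster size, performance, and accuracy in low-precision inference.
Achieve high accuracy with little or no fine-tuning of the pre-trained model.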

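Proposed Method

Group filters into clusters of size N and apply a shared scaling factor to each cluster to reduce quantization error.
Quantize weights to ternary or 4-bit values using a dynamic fixed-point representation, while constraining activations to 8-bit fixed point.
Apply static clustering to filters that contribute to the same output feature map, which simplifies the convolution.
Within each cluster, replace 8-bit multiplications with 8-bit accumulations to lower computational complexity.
Fine-tune the pre-trained full-precision model with low-precision weights and 8-bit activations, using a learning rate of 1e-4.
Initialize the low-precision network with the full-precision weights and apply batch normalization to stabilize training.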
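
The clustering and quantization steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`quantize_dfp`, `ternarize_clusters`, `cluster_dot`) are hypothetical, the ternarization threshold of 0.7 times the mean absolute weight is a common heuristic swapped in for the paper's threshold selection, and the per-cluster scales are kept in floating point rather than quantized to 8 bits.

```python
import numpy as np


def quantize_dfp(x, bits=8):
    """Dynamic fixed point: signed integers plus one shared power-of-two exponent."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.abs(x).max())
    # Smallest exponent e such that max_abs / 2**e still fits in [-qmax, qmax].
    exponent = int(np.ceil(np.log2(max_abs / qmax))) if max_abs > 0 else 0
    # int8 storage assumes bits <= 8.
    q = np.clip(np.round(x / 2.0 ** exponent), -qmax, qmax).astype(np.int8)
    return q, exponent


def ternarize_clusters(weights, cluster_size=4):
    """Ternarize (out_ch, in_ch, kh, kw) weights with one scale per cluster.

    A cluster is N consecutive input-channel kernels feeding the same output map.
    """
    out_ch, in_ch, kh, kw = weights.shape
    assert in_ch % cluster_size == 0, "in_ch must be a multiple of cluster_size"
    w = weights.reshape(out_ch, in_ch // cluster_size, cluster_size * kh * kw)
    # ASSUMPTION: threshold of 0.7 * mean|w| per cluster, a common heuristic,
    # not the threshold selection used in the paper.
    threshold = 0.7 * np.abs(w).mean(axis=2, keepdims=True)
    mask = np.abs(w) > threshold
    ternary = (np.sign(w) * mask).astype(np.int8)
    # Least-squares scale for a fixed ternary pattern: mean |w| over kept weights.
    scales = (np.abs(w) * mask).sum(axis=2) / np.maximum(mask.sum(axis=2), 1)
    return ternary, scales


def cluster_dot(ternary, scales, patch_q, patch_exp):
    """One output value per output channel for a single quantized input patch."""
    n_clusters = ternary.shape[1]
    x = patch_q.astype(np.int32).reshape(n_clusters, -1)
    # Ternary weights reduce each cluster's dot product to adds and subtracts.
    partial = np.einsum("ocj,cj->oc", ternary.astype(np.int32), x)
    # One multiplication per cluster by its shared scale, then undo the exponent.
    return (partial * scales).sum(axis=1) * 2.0 ** patch_exp


# Usage: compare the low-precision result with the full-precision dot product.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.05, size=(2, 8, 3, 3))
patch = rng.normal(size=(8, 3, 3))
ternary, scales = ternarize_clusters(weights, cluster_size=4)
patch_q, patch_exp = quantize_dfp(patch)
print("low precision :", cluster_dot(ternary, scales, patch_q, patch_exp))
print("full precision:", np.tensordot(weights, patch, axes=3))
```

Grouping along the input-channel axis means every cluster accumulates into a single output feature map, so the shared scale can be applied once per cluster after the integer accumulation instead of once per weight.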

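Experimental Results

Research Questions

RQ1: Can pre-trained deep neural networks be quantized to sub-8-bit precision with minimal accuracy loss?
RQ2: How does cluster size affect the trade-off between computational efficiency and model accuracy?
RQ3: Can near state-of-the-art accuracy be achieved in low-precision inference without fine-tuning the network?
RQ4: How does dynamic fixed-point clustering affect quantization error in low-precision inference?
RQ5: Can a complete 8-bit integer inference pipeline be achieved with minimal performance loss?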

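Key Findings

On ResNet-101 with a cluster size of N=4, the method reaches 71.8% Top-1 accuracy with ternary weights, within 6% of the full-precision baseline, while replacing about 85% of multiplications with 8-bit accumulations.
With 4-bit weights under the same setup, the model reaches 76.3% Top-1 accuracy, within 2% of the full-precision result.
With N=64, about 98% of multiplications are replaced by ternary operations (8-bit accumulations), but accuracy drops significantly and fine-tuning becomes necessary.
A low-precision ResNet-50 (8-bit activations, ternary (2-bit) weights) initialized from full-precision weights reaches 68.9% Top-1 accuracy after only 4 additional epochs of fine-tuning.
The method enables a complete 8-bit integer inference pipeline, with a potential improvement of up to 16x in performance and energy efficiency.
Smaller clusters (N=4) maximize accuracy, while larger clusters (N=64) favor performance on general-purpose hardware at the cost of accuracy.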
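
A rough way to see why larger clusters replace more multiplications is to count operations per cluster. The sketch below assumes a simplified accounting that is not taken from the paper: a cluster of N kernels of size K x K needs N*K*K multiplications at full precision, while the ternary version needs only accumulations plus one multiplication by the shared scale. The function name `fraction_replaced` is hypothetical.

```python
def fraction_replaced(cluster_size, kernel_size):
    """Fraction of a cluster's multiplications replaced by accumulations.

    ASSUMPTION: one remaining multiplication per cluster (the shared scale);
    layers kept at higher precision are ignored.
    """
    baseline_multiplies = cluster_size * kernel_size * kernel_size
    remaining_multiplies = 1
    return 1.0 - remaining_multiplies / baseline_multiplies


for n in (4, 64):
    for k in (1, 3):
        print(f"N={n}, {k}x{k} kernels: {fraction_replaced(n, k):.1%}")
```

Under this accounting, N=64 with 1x1 kernels gives about 98.4%, in line with the reported figure of roughly 98%, while N=4 gives 75% for 1x1 kernels and about 97.2% for 3x3 kernels. A network-level figure depends on the mix of layer types, so this model is not expected to reproduce the reported 85% exactly.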

This review was generated by AI and reviewed by human editors.