QUICK REVIEW

[Paper Review] Post-training 4-bit quantization of convolution networks for rapid-deployment

Ron Banner, Yury Nahshan|arXiv (Cornell University)|Oct 2, 2018

Advanced Neural Network Applications | References: 16 | Cited by: 125

One-Sentence Summary

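Proposes a practical, retraining-free framework for 4-bit post-training quantization of CNNs that preserves accuracy using Analytical Clipping for Integer Quantization (ACIQ), per-channel bit allocation, and bias correction.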

ABSTRACT

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of intermediate results, but it often requires the full datasets and time-consuming fine-tuning to recover the accuracy lost after quantization. This paper introduces the first practical 4-bit post-training quantization approach: it does not involve training the quantized model (fine-tuning), nor does it require the availability of the full dataset. We target the quantization of both activations and weights and suggest three complementary methods for minimizing quantization error at the tensor level, two of which obtain a closed-form analytical solution. Combining these methods, our approach achieves accuracy that is just a few percent less than the state-of-the-art baseline across a wide range of convolutional models. The source code to replicate all experiments is available on GitHub: https://github.com/submission2019/cnn-quantization
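To illustrate the kind of closed-form result the abstract refers to, here is a sketch of per-channel bit allocation under a simplified model that is an assumption of this example, not necessarily the paper's exact formulation: channel i is rounded uniformly over [-α_i, α_i] with n_i = 2^{M_i} bins, so its noise is α_i²/(3·n_i²), and the budget fixes the total number of bins Σ n_i. Setting the derivative of the Lagrangian to zero gives n_i ∝ α_i^{2/3}. The function names and the rounding and clamping policy are hypothetical.

```python
import math


def allocate_bins(alphas, total_bins):
    """Continuous minimizer of sum_i alpha_i**2 / (3 * n_i**2) s.t. sum_i n_i = total_bins.

    Setting the Lagrangian derivative to zero gives n_i proportional to alpha_i**(2/3).
    """
    weights = [a ** (2.0 / 3.0) for a in alphas]
    total_weight = sum(weights)
    return [total_bins * w / total_weight for w in weights]


def allocate_bits(alphas, avg_bits=4, min_bits=1, max_bits=8):
    """Hypothetical helper: convert the bin allocation into integer bit widths."""
    # Budget: the same total number of bins as giving every channel avg_bits.
    total_bins = len(alphas) * 2 ** avg_bits
    bins = allocate_bins(alphas, total_bins)
    # Round log2(bins) to the nearest integer and clamp to a supported range.
    return [min(max_bits, max(min_bits, round(math.log2(n)))) for n in bins]


def noise_for_bins(alpha, n_bins):
    """Rounding MSE of a uniform quantizer over [-alpha, alpha] with n_bins bins."""
    return alpha ** 2 / (3.0 * n_bins ** 2)


if __name__ == "__main__":
    ranges = [1.0, 8.0]
    print(allocate_bins(ranges, 32))  # roughly [6.4, 25.6]
    print(allocate_bits(ranges))      # [3, 5]
```

With ranges 1.0 and 8.0 and an average of 4 bits per channel, the continuous optimum is about 6.4 and 25.6 bins, which round to 3 and 5 bits. Rounding means the realized bin total (8 + 32 = 40) can overshoot the budget of 32; a real allocator would have to repair that, which this sketch does not attempt.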

Motivation and Goals

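Enable rapid deployment of low-bit quantized CNNs without access to the full training data.
Minimize quantization error through analytical clipping and channel-dependent bit-width selection.
Quantize both activations and weights to 4-bit precision without retraining.
Provide bias correction to compensate for the bias that quantization introduces into the weights.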
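A minimal sketch of weight bias correction, assuming the correction makes each output channel's quantized weights match the mean and standard deviation of its float weights. The symmetric max-abs quantizer, the function names, and the exact form of the update are illustrative assumptions; the paper's procedure may differ in detail.

```python
import statistics


def quantize_symmetric(values, bits, alpha):
    """Illustrative symmetric uniform quantizer on [-alpha, alpha]."""
    levels = 2 ** (bits - 1) - 1  # 7 positive levels for 4 bits
    step = alpha / levels
    return [max(-levels, min(levels, round(v / step))) * step for v in values]


def bias_correct(w_float, w_quant):
    """Match the mean and std of one channel's quantized weights to the float ones."""
    mu_f, mu_q = statistics.fmean(w_float), statistics.fmean(w_quant)
    sd_f, sd_q = statistics.pstdev(w_float), statistics.pstdev(w_quant)
    scale = sd_f / sd_q if sd_q > 0 else 1.0
    # Center the quantized weights, rescale them, then restore the float mean.
    return [(w - mu_q) * scale + mu_f for w in w_quant]


def quantize_layer(weight_rows, bits=4):
    """Quantize each output channel (row) with its own range, then bias-correct it."""
    out = []
    for row in weight_rows:
        alpha = max(abs(v) for v in row) or 1.0  # per-channel max-abs range
        q = quantize_symmetric(row, bits, alpha)
        out.append(bias_correct(row, q))
    return out
```

The corrected weights are a per-channel affine map (scale and offset) of the quantized values rather than points on the 4-bit grid; this sketch does not cover how such a map is folded into an inference kernel.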

Proposed Method

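Introduce ACIQ: analytically determine the clipping threshold for activations so as to minimize the mean squared error.
Propose per-channel bit allocation, which assigns each channel its optimal bit width under a fixed average bit budget.
Apply bias correction to compensate for the quantization bias in the weights.
Quantize both weights and activations with the proposed methods in a single, joint deployment pipeline.
Adopt a per-channel quantization scheme with fused ReLU to reduce noise.
Show that combining these methods recovers most of the accuracy degradation without fine-tuning.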
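ACIQ's core idea can be sketched for a single tensor: model its values as Laplace(0, b), write the expected MSE as a clipping term plus a rounding term, and choose the clipping value α that minimizes it. Assumptions in this sketch: a Laplace prior, a uniform quantizer with 2^M bins over [-α, α], the approximation MSE(α) ≈ 2b²e^{-α/b} + α²/(3·4^M), and bisection as the solver. The function names are hypothetical.

```python
import math
import statistics


def laplace_mse(alpha, b, bits):
    """Approximate MSE for Laplace(0, b) data clipped to [-alpha, alpha]
    and rounded to 2**bits uniform bins."""
    clip = 2.0 * b * b * math.exp(-alpha / b)     # error from both clipped tails
    quant = alpha * alpha / (3.0 * 4 ** bits)     # rounding error inside the range
    return clip + quant


def aciq_alpha_laplace(b, bits, tol=1e-10):
    """Find the alpha where d(laplace_mse)/d(alpha) = 0 by bisection."""
    def grad(a):
        # The derivative is increasing in alpha, so its root is unique.
        return -2.0 * b * math.exp(-a / b) + 2.0 * a / (3.0 * 4 ** bits)

    lo, hi = 0.0, b
    while grad(hi) < 0:  # grow the bracket until the derivative turns positive
        hi *= 2.0
    while hi - lo > tol * b:
        mid = 0.5 * (lo + hi)
        if grad(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_laplace_b(samples):
    """Laplace scale estimate: mean absolute deviation from the median."""
    med = statistics.median(samples)
    return statistics.fmean([abs(x - med) for x in samples])


if __name__ == "__main__":
    for bits in (2, 3, 4):
        print(bits, aciq_alpha_laplace(1.0, bits))
```

For b = 1 and 4 bits the minimizer lands near α ≈ 5.03. Because the optimum scales linearly with b, only b has to be estimated per tensor; the sketch uses the mean absolute deviation from the median, which is the maximum-likelihood estimate of a Laplace scale.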

Experimental Results

Research Questions

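RQ1: Can a 4-bit post-training quantization method approach the floating-point accuracy of CNNs without using the full training data?
RQ2: How much do analytical clipping, per-channel bit allocation, and bias correction, individually and combined, affect the accuracy of 4-bit quantization?
RQ3: Is it feasible to quantize both the weights and the activations of common CNN architectures to 4 bits with an acceptable accuracy loss?
RQ4: What practical gains in deployment speed and memory do these post-training techniques deliver?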

Key Findings

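ACIQ and weight bias correction improve accuracy over the 4-bit baseline by about 3.2% and 6.0% on average, respectively.
Per-channel bit allocation improves accuracy by about 2.85% for activation quantization (about 6.3% for weight quantization).
Applying all three methods to both weights and activations recovers most of the lost accuracy without retraining.
Across six ImageNet models, 4-bit post-training quantization achieves accuracy close to the state-of-the-art baseline without retraining, enabling rapid deployment.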

This review was generated by AI and checked by human editors.