QUICK REVIEW

[Paper Review] Training with Quantization Noise for Extreme Model Compression

Angela Fan, Pierre Stock | arXiv (Cornell University) | Apr 15, 2020

Advanced Neural Network Applications | References: 72 | Cited by: 114

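One-Sentence Summary

This paper introduces Quant-Noise, a regularization technique that quantizes only a random subset of the weights during each training forward pass, producing models that are robust to extreme quantization (for example int4/int8 and Product Quantization). It achieves state-of-the-art accuracy at high compression rates on both NLP and image tasks.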

ABSTRACT

We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods where the approximations introduced by STE are severe, such as Product Quantization. Our proposal is to only quantize a different random subset of weights during each forward, allowing for unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result we establish new state-of-the-art compromises between accuracy and model size both in natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we can achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14MB and 80.0 top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3MB.

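Research Motivation and Goals

Push extreme model compression without incurring a large loss in accuracy.
Develop a training-time mechanism that makes networks robust to a variety of quantization schemes.
Support scalar quantization, Product Quantization (PQ/iPQ), and their combination with fixed-point arithmetic.
Investigate Quant-Noise as a post-processing step that improves quantized models without full retraining.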

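Proposed Method

On each forward pass, Quant-Noise selects a random subset of weight blocks and applies a distortion that mimics the target quantization.
The distortion functions cover fixed-point scalar quantization and Product Quantization (with a proxy noise for PQ/iPQ).
In the backward pass, gradients for the distorted blocks are computed with the Straight-Through Estimator (STE), while the undistorted blocks receive unbiased gradients.
Quant-Noise can be combined with pruning or LayerDrop to simulate pruning and structured sparsity during training.
With PQ, the noise can be implemented through a proxy that zeroes out the selected subvectors, which encourages useful correlations between subvectors.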
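The forward-pass noise described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`int_quantize`, `zero_proxy`, `quant_noise`), the flat-list weight layout, the min/max quantization range, and the noise rate `p` are all hypothetical choices made for the example. The backward pass is not shown; with the STE, the gradient computed for a distorted block would simply be passed through to the underlying full-precision weights.

```python
import random


def int_quantize(weights, bits=8):
    """Uniform min/max scalar quantization of a flat weight list to 2**bits levels."""
    lo, hi = min(weights), max(weights)
    if hi == lo:  # constant input: nothing to quantize
        return list(weights)
    scale = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / scale) * scale for w in weights]


def zero_proxy(weights):
    """Proxy distortion for PQ: selected subvectors are replaced by zeros."""
    return [0.0] * len(weights)


def quant_noise(weights, block_size, p, distort, rng):
    """Distort a random subset of blocks; return (noisy_weights, block_mask)."""
    if len(weights) % block_size != 0:
        raise ValueError("len(weights) must be a multiple of block_size")
    distorted = distort(weights)  # fully distorted copy, used only where selected
    noisy, mask = [], []
    for start in range(0, len(weights), block_size):
        selected = rng.random() < p  # a fresh draw on every call (forward pass)
        mask.append(selected)
        source = distorted if selected else weights
        noisy.extend(source[start:start + block_size])
    return noisy, mask


rng = random.Random(0)
w = [rng.uniform(-1.0, 1.0) for _ in range(16)]
# int4-style noise on roughly half of the 4-weight blocks
noisy_int4, mask_int4 = quant_noise(w, 4, 0.5, lambda x: int_quantize(x, bits=4), rng)
# PQ proxy noise: the selected subvectors are zeroed
noisy_pq, mask_pq = quant_noise(w, 4, 0.5, zero_proxy, rng)
```

Setting `p` to 1.0 distorts every block, which corresponds to the standard QAT setting mentioned in the abstract, while `p` of 0.0 leaves the weights untouched.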

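Experimental Results

Research Questions

RQ1: Can training with random quantization noise yield models that are robust to extreme quantization (int4/int8, PQ/iPQ) without a large loss in accuracy?
RQ2: Under extreme compression, does Quant-Noise improve accuracy over standard QAT?
RQ3: Can Quant-Noise effectively improve the post-training quantization of already-trained models?
RQ4: What is the best combination of PQ/iPQ with fixed-point quantization and pruning across NLP and vision tasks?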

Key Findings

Quant-Noise improves performance across several quantization schemes (int4, int8, PQ/iPQ) on both NLP (RoBERTa-based) and vision (EfficientNet-B3) tasks.
On MNLI, RoBERTa compressed to 14 MB reaches 82.5% accuracy with Quant-Noise, the figure given in the abstract. An accuracy of 83.6% is reported both for training with Quant-Noise and for post-training finetuning with Quant-Noise.
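On ImageNet, EfficientNet-B3 compressed to 3.3 MB reaches 80.0% top-1 accuracy with Quant-Noise, versus 78.5% for the compressed baseline.
iPQ with Quant-Noise reaches 80.0% top-1 on ImageNet at 3.3 MB, and 79.8% when combined with int8 and Quant-Noise; PQ-based methods achieve strong compression with only minimal accuracy loss.
Quant-Noise enables extreme compression ratios (for example ×94 in NLP with pruning and weight sharing) while keeping perplexity and accuracy competitive with the uncompressed model.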
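To give a feel for where compression ratios of this kind come from, the storage arithmetic for scalar quantization and PQ can be sketched as follows. This is a back-of-the-envelope sketch, not the paper's accounting: the function names, the 1024 x 1024 matrix, the block size of 8, and the 256 float32 centroids are assumptions chosen for illustration, and overheads such as scale factors, biases, and unquantized layers are ignored.

```python
def dense_bytes(n_weights, bits=32):
    """Bytes needed to store n_weights scalars at the given bit width."""
    return n_weights * bits // 8


def pq_bytes(n_weights, block_size, n_centroids=256, centroid_bits=32):
    """Bytes for PQ storage: one centroid index per block plus the codebook."""
    n_blocks = n_weights // block_size
    index_bits = (n_centroids - 1).bit_length()  # 8 bits for 256 centroids
    codebook_bits = n_centroids * block_size * centroid_bits
    return (n_blocks * index_bits + codebook_bits) // 8


n = 1024 * 1024  # hypothetical 1024 x 1024 weight matrix
fp32 = dense_bytes(n)
ratio_int8 = fp32 / dense_bytes(n, bits=8)   # 4.0
ratio_int4 = fp32 / dense_bytes(n, bits=4)   # 8.0
ratio_pq = fp32 / pq_bytes(n, block_size=8)  # about 30.1
```

Under these assumptions, PQ alone already compresses far more than int8 or int4, which is consistent with the findings above that the most extreme ratios come from PQ-based schemes combined with pruning and sharing.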


This review was generated by AI and reviewed by a human editor.