QUICK REVIEW

[Paper Review] Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Aojun Zhou, Anbang Yao|arXiv (Cornell University)|Feb 9, 2017

Advanced Neural Network Applications | Cited by 591

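One-Sentence Summary

This paper proposes incremental network quantization (INQ), which converts any pre-trained full-precision CNN into a low-precision model whose weights are powers of two or zero, by applying weight partition, group-wise quantization, and re-training iteratively with the aim of a lossless conversion. On ImageNet, it matches or improves accuracy across multiple architectures at 5-bit, 4-bit, and even 3-bit quantization.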

ABSTRACT

This paper presents incremental network quantization (INQ), a novel method, targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods which are struggled in noticeable accuracy loss, our INQ has the potential to resolve this issue, as benefiting from two innovations. On one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training. A well-proven measure is employed to divide the weights in each layer of a pre-trained CNN model into two disjoint groups. The weights in the first group are responsible to form a low-precision base, thus they are quantized by a variable-length encoding method. The weights in the other group are responsible to compensate for the accuracy loss from the quantization, thus they are the ones to be re-trained. On the other hand, these three operations are repeated on the latest re-trained group in an iterative manner until all the weights are converted into low-precision ones, acting as an incremental network quantization and accuracy enhancement procedure. Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures including AlexNet, VGG-16, GoogleNet and ResNets well testify the efficacy of the proposed method. Specifically, at 5-bit quantization, our models have improved accuracy than the 32-bit floating-point references. Taking ResNet-18 as an example, we further show that our quantized models with 4-bit, 3-bit and 2-bit ternary weights have improved or very similar accuracy against its 32-bit floating-point baseline. Besides, impressive results with the combination of network pruning and INQ are also reported. The code is available at https://github.com/Zhouaojun/Incremental-Network-Quantization.

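Motivation and Goals

Motivate and address the accuracy loss and long convergence times of low-precision CNN quantization.
Propose a lossless incremental quantization framework for converting full-precision CNNs to low-precision weights.
Demonstrate its effectiveness on mainstream architectures on ImageNet.
Explore the potential benefit of combining INQ with network pruning for compression.
Characterize INQ's practical bit-width limits and convergence behavior.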

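Proposed Method

Introduce weight partition, which splits the weights of each layer into a low-precision base group and a re-trainable compensation group.
Apply group-wise quantization with variable-length encoding, quantizing the base weights to powers of two or zero.
Re-train the compensation group to recover accuracy while keeping the base weights fixed.
Repeat the three operations (partition, quantization, re-training) until all weights are quantized.
Use constrained optimization: minimize L(W) + λR(W) subject to W(i,j) ∈ P_l for the quantized group, with SGD updates affecting only the weights that are not yet quantized.
Equations referenced include the quantization rule that maps weights to P_l (4), the determination of n1/n2 (2, 3), and the masked SGD update (8).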
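The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the formulas for n1/n2 and the rounding thresholds are our reading of the paper's Eqs. (2)-(4) and should be checked against the paper; the accumulated portions (0.5, 0.75, 0.875, 1.0), the magnitude-based partition, the clamp for out-of-range weights, and the names `exponent_bounds`, `quantize_pow2`, `inq`, and `grad_fn` are all hypothetical choices made for this example.

```python
import numpy as np


def exponent_bounds(w, bits):
    """Exponents (n1, n2) of the codebook P_l = {0, +-2^n2, ..., +-2^n1}."""
    if bits < 2:
        raise ValueError("bits must be >= 2")
    s = float(np.max(np.abs(w)))
    n1 = int(np.floor(np.log2(4.0 * s / 3.0)))   # assumed form of Eq. (2)
    n2 = n1 + 1 - (2 ** (bits - 1)) // 2         # assumed form of Eq. (3)
    return n1, n2


def quantize_pow2(w, n1, n2):
    """Map each weight to 0 or sign(w) * 2^k with n2 <= k <= n1."""
    # Assumption: weights that grew past the codebook range are clamped to 2^n1.
    mag = np.minimum(np.abs(w), 2.0 ** n1)
    out = np.zeros_like(w, dtype=np.float64)
    for k in range(n2, n1 + 1):
        beta = 2.0 ** k
        # Lower threshold is (alpha + beta) / 2, where alpha is the previous
        # codebook entry: 0 for the smallest power, beta / 2 otherwise.
        lower = 0.5 * beta if k == n2 else 0.75 * beta
        hit = (mag >= lower) & (mag < 1.5 * beta)
        out[hit] = np.sign(w[hit]) * beta
    return out


def inq(w, grad_fn, bits=5, portions=(0.5, 0.75, 0.875, 1.0), lr=0.1, steps=50):
    """Partition -> quantize -> re-train, repeated until every weight is frozen."""
    w = np.array(w, dtype=np.float64)            # private, contiguous copy
    n1, n2 = exponent_bounds(w, bits)
    frozen = np.zeros(w.shape, dtype=bool)       # True = quantized and fixed
    for p in portions:
        # Weight partition: freeze the largest-magnitude free weights until
        # a fraction p of the layer is quantized.
        need = int(round(p * w.size)) - int(frozen.sum())
        if need > 0:
            free_idx = np.flatnonzero(~frozen)
            order = np.argsort(-np.abs(w).ravel()[free_idx])
            pick = free_idx[order[:need]]
            w.flat[pick] = quantize_pow2(w.ravel()[pick], n1, n2)
            frozen.flat[pick] = True
        # Re-training: masked gradient steps touch only the free weights.
        mask = (~frozen).astype(np.float64)
        for _ in range(steps):
            w -= lr * grad_fn(w) * mask
    return w, (n1, n2)


rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.1, size=(8, 8))           # stand-in for pre-trained weights
wq, (n1, n2) = inq(w0, grad_fn=lambda w: w - w0)  # toy loss 0.5 * ||w - w0||^2
print(n1, n2, np.unique(np.abs(wq)))
```

The toy gradient stands in for a real training loss; in a framework with automatic differentiation the same masking would be applied to the gradient of each layer before the optimizer step.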

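Experimental Results

Research Questions

RQ1: Can INQ quantize full-precision CNNs to low-precision weights without losing accuracy?
RQ2: How does the weight partition strategy affect final accuracy and convergence?
RQ3: What bit-widths allow lossless or near-lossless quantization on large-scale datasets?
RQ4: How does INQ interact with pruning and other CNN compression techniques on ImageNet?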

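Key Findings

5-bit INQ on AlexNet, VGG-16, GoogleNet, ResNet-18, and ResNet-50 consistently improves top-1/top-5 accuracy over the corresponding full-precision baselines (top-1 gains of 0.13%–2.28%, top-5 gains of 0.23%–1.65%).
INQ converges easily: lossless 5-bit quantization typically needs fewer than 8 re-training epochs per iteration.
For ResNet-18, the 4-bit and 3-bit models are very close to the 32-bit baseline in accuracy; the 2-bit ternary model falls below the baseline but still outperforms some earlier binary/ternary models.
Pruning + INQ outperforms the deep compression method of Han et al. (2016) on AlexNet, achieving higher compression while maintaining or improving accuracy (e.g., 53x for 5-bit INQ+DNS versus 27x/35x for the prior work).
Compared with vector quantization alone, INQ preserves accuracy better at 5-bit/4-bit quantization and quantizes all layers, not only the fully connected ones.
INQ achieves substantial compression while maintaining or improving accuracy, which makes deployment on resource-constrained devices practical.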
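One reason power-of-two weights suit resource-constrained hardware is that multiplying by 2^k reduces to a bit shift. The sketch below illustrates that idea only; the review does not describe a deployment kernel, and the function `shift_dot`, its sign/exponent encoding, and the fixed-point rescaling by 2^n2 are assumptions made for this example.

```python
def shift_dot(activations, signs, exponents, n2):
    """Dot product of integer activations with weights sign * 2**exponent.

    signs[i] is -1, 0, or +1 (0 marks a zero weight); exponents[i] >= n2.
    Shifting by (exponent - n2) keeps every shift non-negative, and the
    single multiplication by 2**n2 at the end restores the true scale.
    """
    acc = 0
    for x, s, k in zip(activations, signs, exponents):
        if s == 0:
            continue                      # zero weights contribute nothing
        acc += s * (x << (k - n2))        # shift replaces the multiply
    return acc * 2.0 ** n2


# Weights 0.5, -0.25, 0 encoded as (sign, exponent) pairs with n2 = -2.
print(shift_dot([3, 5, 2], signs=[1, -1, 0], exponents=[-1, -2, 0], n2=-2))
```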


This review was generated by AI and reviewed by human editors.