QUICK REVIEW

[Paper Review] Bayesian Bits: Unifying Quantization and Pruning

Mart van Baalen, Christos Louizos|arXiv (Cornell University)|May 14, 2020

Advanced Neural Network Applications | References: 39 | Cited by: 77

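One-Sentence Summary

Bayesian Bits jointly learns mixed-precision quantization and pruning through gradient-based optimization, using a novel residual decomposition of quantization with learnable gates. It yields hardware-friendly bit widths and a better accuracy-efficiency trade-off than static bit-width baselines.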

ABSTRACT

We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers doubling the bit width. At each new bit width, the residual error between the full precision value and the previously rounded value is quantized. We then decide whether or not to add this quantized residual error for a higher effective bit width and lower quantization noise. By starting with a power-of-two bit width, this decomposition will always produce hardware-friendly configurations, and through an additional 0-bit option, serves as a unified view of pruning and quantization. Bayesian Bits then introduces learnable stochastic gates, which collectively control the bit width of the given tensor. As a result, we can obtain low bit solutions by performing approximate inference over the gates, with prior distributions that encourage most of them to be switched off. We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks that provide a better trade-off between accuracy and efficiency than their static bit width equivalents.
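
The decomposition described in the abstract can be written out compactly. The following is a sketch under assumed notation that does not appear in this review: $[\alpha, \beta]$ is the clipping range, $s_b$ is the step size at bit width $b$, $\lfloor \cdot \rceil$ denotes rounding to the nearest integer, and $z_b \in \{0, 1\}$ is the gate for bit width $b$.

```latex
\begin{aligned}
s_2 &= \frac{\beta - \alpha}{2^2 - 1}, \qquad
x_2 = s_2 \left\lfloor \frac{x}{s_2} \right\rceil, \\
s_b &= \frac{s_{b/2}}{2^{b/2} + 1}, \qquad
\epsilon_b = s_b \left\lfloor \frac{x - x_{b/2}}{s_b} \right\rceil, \qquad
x_b = x_{b/2} + \epsilon_b, \qquad b \in \{4, 8, 16, 32\}, \\
x_q &= z_2 \Bigl( x_2 + z_4 \bigl( \epsilon_4 + z_8 ( \epsilon_8 + z_{16} ( \epsilon_{16} + z_{32}\, \epsilon_{32} ) ) \bigr) \Bigr).
\end{aligned}
```

Because the gates are nested, setting $z_2 = 0$ zeroes the whole tensor (the 0-bit option, i.e. pruning), and setting any $z_b = 0$ discards all residuals at bit width $b$ and above.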

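Motivation and Objectives

Reduce inference cost by combining pruning with mixed-precision quantization.
Introduce a hardware-friendly decomposition of quantization that exposes power-of-two bit widths.
Develop Bayesian gates and a variational objective to learn bit widths and pruning jointly.
Provide a practical optimization scheme that combines STE-inspired gradient estimators with gate thresholding.
Demonstrate improved accuracy/efficiency trade-offs on benchmarks, with both end-to-end and post-training variants.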

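Proposed Method

Decompose quantization into successive residual quantization steps, doubling the bit width each time (2, 4, 8, 16, 32).
Quantize each residual and add it to the previous value to form a higher-precision quantized value.
Attach a binary gate z to each residual to control whether the higher-bit residual is added, which also enables pruning through a 0-bit option.
Frame gate learning as variational inference, with an autoregressive prior and posterior that favor low bit widths.
Derive a practical objective, similar to L0 regularization, that penalizes the inclusion of higher-bit residuals; use hard-concrete relaxations for gradient-based optimization.
Manage memory with gradient checkpointing, clip inputs with a PACT-based scheme, and backpropagate through rounding with the STE.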
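
The gated residual steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `gated_residual_quantize`, the dictionary of hard 0/1 gates, and the restriction to a non-negative clipping range `[0, beta]` are all assumptions made here, and the learned stochastic gates are replaced by fixed values.

```python
import numpy as np

def gated_residual_quantize(x, gates, beta=1.0):
    """Quantize x on [0, beta] by summing gated residuals at 2, 4, 8, 16, 32 bits.

    `gates` maps a bit width to 0 or 1 (hypothetical interface). A closed gate
    also discards every higher bit width, mirroring the nested gating.
    """
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, beta)
    if not gates.get(2, 0):
        return np.zeros_like(x)              # 0-bit option: the tensor is pruned
    step = beta / (2 ** 2 - 1)               # step size of the 2-bit grid
    x_q = step * np.round(x / step)          # base 2-bit quantization
    for b in (4, 8, 16, 32):
        if not gates.get(b, 0):
            break                            # stop at the first closed gate
        step = step / (2 ** (b // 2) + 1)    # step size after doubling the bit width
        x_q = x_q + step * np.round((x - x_q) / step)  # add the quantized residual
    return x_q

x = np.array([0.0, 0.2, 0.7, 1.0])
for open_gates in ({2: 1}, {2: 1, 4: 1}, {2: 1, 4: 1, 8: 1}):
    q = gated_residual_quantize(x, open_gates)
    print(sorted(open_gates), np.max(np.abs(x - q)))
```

Each additional open gate refines the previous grid, so the worst-case error shrinks to at most half the current step size. The step-size rule keeps every intermediate result on a uniform grid with 2^b levels, which is what makes the resulting configurations hardware-friendly.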

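Experimental Results

Research Questions

RQ1: Does the residual-based, hardware-friendly decomposition expose all the power-of-two bit widths needed for mixed-precision quantization?
RQ2: Can learnable gates on the residuals achieve an effective trade-off between accuracy and computation through joint pruning and quantization?
RQ3: Does the Bayesian objective with prior regularization provide a better accuracy-efficiency trade-off than static bit-width baselines across multiple tasks?
RQ4: Is the method feasible in both end-to-end and post-training settings on standard benchmarks?

Key Findings

Method	# Bits W/A	Accuracy (%)	Rel. GBOPs (%)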
FP32	32/32	99.36	100
TWN	2/32	99.35	5.74
LR-Net	1/32	99.47	2.99
RQ	8/8	-	6.25
RQ	4/4	-	1.56
RQ	2/8	99.37	0.52
WAGE	2/8	99.60	1.56
DQ*	Mixed	-	0.48
DQ - restricted*	Mixed	-	0.54
Bayesian Bits μ=0.01	Mixed	99.30 ±0.03	0.36 ±0.01	93.23 ±0.10	0.51 ±0.03
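
The relative GBOPs column can be sanity-checked for the uniform-precision rows. The sketch below assumes that the bit-operation count of a layer scales with the product of weight and activation bit widths, and that every layer uses the same W/A setting; the function name `relative_bops_percent` is hypothetical.

```python
def relative_bops_percent(w_bits, a_bits, ref_bits=32):
    """BOPs of a uniform W/A configuration relative to a 32/32 network, in percent.

    Assumes BOPs are proportional to w_bits * a_bits with an unchanged MAC count.
    """
    return 100.0 * (w_bits * a_bits) / (ref_bits * ref_bits)

for w, a in [(8, 8), (4, 4), (2, 8)]:
    print(f"{w}/{a}: {relative_bops_percent(w, a):.2f}%")
```

Under these assumptions, 8/8 gives 6.25% and both 4/4 and 2/8 give about 1.56%, which agrees with the RQ 8/8, RQ 4/4, and WAGE 2/8 rows. Rows that deviate from this formula (for example TWN at 5.74% or RQ 2/8 at 0.52%) presumably reflect layers kept at a different precision or additional pruning; the table itself does not say.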

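On MNIST and CIFAR-10, Bayesian Bits achieves a better trade-off between accuracy and computational cost (BOPs) than a range of baselines.
On ImageNet with ResNet18 and MobileNetV2, Bayesian Bits offers a more favorable accuracy-BOPs trade-off than fixed bit-width baselines and other quantization methods.
Varying the global regularization parameter mu controls sparsity and bit width, in some cases giving high compression while preserving accuracy.
The method supports both end-to-end fine-tuning and post-training mixed-precision quantization, with competitive performance.
The gates behave interpretably: they often prune layers or reduce them to low bit widths while keeping critical layers at higher precision (for example, the first and last layers).
The method unifies pruning and quantization in a single probabilistic framework with a practical optimization procedure.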
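
To illustrate how a regularization strength such as mu can push gates toward being switched off, here is a minimal sketch of hard-concrete gates with an L0-style penalty. It is not the paper's exact objective: the constants `GAMMA`, `ZETA`, and `TEMP` are values commonly used for hard-concrete gates and are assumed here, all bit widths are weighted equally, and every function name is hypothetical.

```python
import numpy as np

GAMMA, ZETA, TEMP = -0.1, 1.1, 2.0 / 3.0  # stretch limits and temperature (assumed)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sample_gate(log_alpha, rng):
    """Draw a relaxed gate in [0, 1] from a hard-concrete distribution."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / TEMP)
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)  # stretch, then clamp

def prob_active(log_alpha):
    """Probability that a hard-concrete gate is non-zero."""
    return sigmoid(log_alpha - TEMP * np.log(-GAMMA / ZETA))

def gate_penalty(log_alphas, mu):
    """L0-style penalty for one tensor with nested gates (2, 4, 8, 16, 32 bits).

    A bit width is only used if all lower gates are open, so its probability
    is the running product of the gate probabilities.
    """
    p = prob_active(np.asarray(log_alphas, dtype=np.float64))
    return mu * np.sum(np.cumprod(p))

rng = np.random.default_rng(0)
log_alphas = np.array([3.0, 2.0, 0.0, -2.0, -4.0])  # one parameter per bit width
print(sample_gate(log_alphas, rng))
print(gate_penalty(log_alphas, mu=0.01))
```

Adding `gate_penalty` to the task loss means that a larger mu makes it more expensive to keep high-bit residuals, which is one way to read the sparsity and bit-width control described above.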

This review was generated by AI and checked by a human editor.