QUICK REVIEW

[Paper Review] Training and Inference with Integers in Deep Neural Networks

Shuang Wu, Guoqi Li|arXiv (Cornell University)|Feb 13, 2018

Advanced Neural Network Applications | References: 24 | Cited by: 200

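One-sentence summary

WAGE discretizes both training and inference to low-bitwidth integers, giving DNNs a pure integer dataflow with competitive accuracy on multiple datasets.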

ABSTRACT

Researches on deep neural networks with discrete parameters and their deployment in embedded systems have been active and promising topics. Although previous works have successfully reduced precision in inference, transferring both training and inference processes to low-bitwidth integers has not been demonstrated simultaneously. In this work, we develop a new method termed as "WAGE" to discretize both training and inference, where weights (W), activations (A), gradients (G) and errors (E) among layers are shifted and linearly constrained to low-bitwidth integers. To perform pure discrete dataflow for fixed-point devices, we further replace batch normalization by a constant scaling layer and simplify other components that are arduous for integer implementation. Improved accuracies can be obtained on multiple datasets, which indicates that WAGE somehow acts as a type of regularization. Empirically, we demonstrate the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial to future AI applications in variable scenarios with transfer and continual learning demands.

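Motivation and objectives

Motivate and enable training on low-bitwidth integer hardware in embedded AI systems.
Develop a complete integer dataflow (W, A, G, E) that covers both forward and backward propagation.
Propose shift-based quantization and stochastic rounding to control bitwidth while preserving direction information.
Remove the dependence on floating-point batch normalization by introducing a layer-wise constant scaling factor.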

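Proposed method

Four quantization operators Q_W, Q_A, Q_G, and Q_E constrain weights, activations, gradients, and error signals to low-bitwidth integers.
Uniform quantization is implemented by a shifted linear mapping Q(x,k) with saturation.
A layer-wise, shift-based scaling factor alpha suppresses weight amplification and replaces batch normalization.
Stochastic rounding is applied to gradient updates to bound their bitwidth while preserving direction information.
Training uses mini-batch SGD without momentum or adaptive learning rates, to match the constraints of an integer dataflow.
Evaluation covers MNIST, SVHN, CIFAR-10, and ImageNet, with a default 2-8-8-8 bit configuration.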
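
The quantization primitives listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the step size `sigma(k) = 2^(1-k)`, the saturation range `[-1 + sigma, 1 - sigma]`, and the helper names `sigma`, `quantize`, `shift`, and `stochastic_round` are assumptions that this review does not spell out.

```python
import math
import random


def sigma(k: int) -> float:
    """Assumed step size of a k-bit grid: 2^(1-k)."""
    return 2.0 ** (1 - k)


def quantize(x: float, k: int) -> float:
    """Sketch of Q(x, k): snap x to a uniform grid, then saturate."""
    s = sigma(k)
    # Python's round() rounds exact halves to the nearest even integer.
    q = s * round(x / s)
    # Saturate to the assumed symmetric range [-1 + s, 1 - s].
    return max(-1.0 + s, min(1.0 - s, q))


def shift(x: float) -> float:
    """Nearest power of two (in the log domain) for x > 0,
    so that scaling by it can be done with a bit shift."""
    return 2.0 ** round(math.log2(x))


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round |x| down or up at random; P(up) equals the fractional part,
    so the result is unbiased and keeps the sign of x."""
    sign = -1 if x < 0 else 1
    mag = abs(x)
    low = math.floor(mag)
    return sign * (low + (1 if rng.random() < mag - low else 0))
```

With these definitions, `quantize(0.3, 2)` returns `0.5` and `quantize(0.9, 2)` saturates to `0.5`, while `stochastic_round(2.3, rng)` returns either `2` or `3` with an average close to 2.3 over many draws.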

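Experimental results

Research questions

RQ1: Can end-to-end training and inference be carried out with a pure low-bitwidth integer dataflow?
RQ2: What bitwidths for W, A, G, and E are needed to maintain accuracy on standard datasets?
RQ3: How does replacing batch normalization with a constant scaling layer affect training and accuracy?
RQ4: What regularization effects arise from integer-based training and quantized backpropagation?

Key findings

WAGE matches baselines that discretize only inference in accuracy, and it provides a regularization benefit.
The 2-8-8-8 configuration yields ternary weights at inference while keeping 8-bit representations for activations, gradients, and error signals during training.
On MNIST, SVHN, and CIFAR-10, WAGE reaches competitive error rates (for example, MNIST 0.40%, SVHN 1.92%, CIFAR-10 6.78%).
On ImageNet with AlexNet, WAGE reports top-1/top-5 error rates of about 51.6%/27.8% for the 2888 configuration, alongside related variants, showing that the approach scales to large datasets.
Bitwidth analysis indicates that an error bitwidth k_E of about 4 to 8 bits is sufficient for CIFAR-10, and that a gradient bitwidth k_G of about 6 to 8 bits balances convergence and accuracy.
With suitable hyperparameters, gradient quantization makes training communication-efficient and reduces memory footprint without sacrificing final performance.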
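
To make the "2-bit weights are ternary" finding concrete, the sketch below enumerates the values a k-bit grid can represent. It assumes a step size of `2^(1-k)` and a symmetric range that drops the most negative code; both are assumptions for illustration, and `grid_levels` is a hypothetical helper, not something named in the paper or this review.

```python
def grid_levels(k: int) -> list:
    """All values on the assumed k-bit grid with step 2^(1-k)."""
    step = 2.0 ** (1 - k)
    n = 2 ** (k - 1) - 1  # largest integer code; the range is symmetric
    return [i * step for i in range(-n, n + 1)]


# k = 2 (weights in the 2-8-8-8 setting): three levels, i.e. ternary.
print(grid_levels(2))        # [-0.5, 0.0, 0.5]

# k = 8 (activations, gradients, errors): 255 levels.
print(len(grid_levels(8)))   # 255
```

Under these assumptions, a 2-bit weight takes one of three values, while the 8-bit quantities used during training have 255 levels with a largest magnitude of 127/128.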


This review was generated by AI and checked by a human editor.