QUICK REVIEW

[Paper Review] WRPN: Wide Reduced-Precision Networks

Asit Mishra, Eriko Nurvitadhi|arXiv (Cornell University)|Sep 4, 2017

Advanced Neural Network Applications | References: 20 | Cited by: 181

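One-Sentence Summary

WRPN trains and runs deep networks with wider layers and reduced-precision activations and weights, using a hardware-friendly quantization scheme to match or exceed full-precision baseline accuracy on AlexNet, ResNet-34, and Inception-BN variants while substantially reducing memory footprint, bandwidth, and energy consumption.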

ABSTRACT

For computer vision applications, prior works have shown the efficacy of reducing numeric precision of model parameters (network weights) in deep neural networks. Activation maps, however, occupy a large memory footprint during both the training and inference step when using mini-batches of inputs. One way to reduce this large memory footprint is to reduce the precision of activations. However, past works have shown that reducing the precision of activations hurts model accuracy. We study schemes to train networks from scratch using reduced-precision activations without hurting accuracy. We reduce the precision of activation maps (along with model parameters) and increase the number of filter maps in a layer, and find that this scheme matches or surpasses the accuracy of the baseline full-precision network. As a result, one can significantly improve the execution efficiency (e.g. reduce dynamic memory footprint, memory bandwidth and computational energy) and speed up the training and inference process with appropriate hardware support. We call our scheme WRPN - wide reduced-precision networks. We report results and show that WRPN scheme is better than previously reported accuracies on ILSVRC-12 dataset while being computationally less expensive compared to previously reported reduced-precision networks.

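Motivation and Goals

Motivation: training and inference for vision tasks are constrained by memory and compute efficiency, with activation maps as the main contributor to memory footprint.
Propose WRPN: reduce the precision of activations and weights while widening layers to preserve accuracy.
Show that, on ImageNet, wide low-precision networks can match or exceed baseline accuracy across several architectures.
Assess the hardware implications and potential efficiency gains on GPUs, FPGAs, and ASICs.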

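Proposed Method

Quantize each layer's activations to 4 or 2 bits and its weights to 4, 2, or 1 bit using a simple clip-and-round scheme.
Offset the information lost to low precision by increasing the number of filter maps (width) in each layer, preserving or improving accuracy.
Train networks end to end from scratch with WRPN quantization and widening, and compare against full-precision baselines for AlexNet, ResNet-34, and Inception-BN variants.
Use the straight-through estimator (STE) to backpropagate through quantization nodes; apply simple min-max clipping and rounding for k-bit representations (k > 1), and a treatment similar to Binary/Weight Normalization for the binary case.
Estimate compute cost as the number of FMA operations multiplied by the sum of the activation and weight bit widths, and compare it with the FP32 baseline compute cost.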
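The clip-and-round quantization and the straight-through estimator described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the clipping ranges ([-1, 1] for weights, [0, 1] for activations), the reservation of one weight bit for the sign, and the function names `quantize_weight`, `quantize_activation`, and `ste_weight_grad` are all assumptions made for the example.

```python
def quantize_weight(w, k):
    """Clip w to [-1, 1], then round onto a symmetric k-bit grid (k > 1).

    Assumption: one bit holds the sign, leaving 2**(k-1) - 1 positive levels.
    """
    if k < 2:
        raise ValueError("this sketch covers k > 1 only")
    levels = 2 ** (k - 1) - 1
    clipped = max(-1.0, min(1.0, w))
    # Python's round() rounds exact halves to the nearest even integer.
    return round(clipped * levels) / levels


def quantize_activation(a, k):
    """Clip a to [0, 1] (assumed range), then round onto an unsigned k-bit grid."""
    levels = 2 ** k - 1
    clipped = max(0.0, min(1.0, a))
    return round(clipped * levels) / levels


def ste_weight_grad(w, upstream_grad):
    """Straight-through estimator for the weight quantizer.

    Rounding is treated as the identity in the backward pass, so the
    upstream gradient is passed through unchanged. Zeroing the gradient
    outside the clip range is an assumed (commonly used) variant.
    """
    return upstream_grad if -1.0 <= w <= 1.0 else 0.0


if __name__ == "__main__":
    # 2-bit weights under these assumptions are ternary: -1, 0, or +1.
    print(sorted({quantize_weight(i / 50 - 1, 2) for i in range(101)}))
    # 2-bit activations take one of four values in [0, 1].
    print(sorted({quantize_activation(i / 100, 2) for i in range(101)}))
```

Under these assumptions a 2-bit weight is ternary, which is the case the findings associate with multiplier-free hardware.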

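Experimental Results

Research Questions

RQ1: When the network is widened, can activation maps use less than full precision without hurting accuracy?
RQ2: Does widening the network while using low-precision operands compensate for the information loss and preserve or improve accuracy?
RQ3: What trade-off between accuracy and hardware efficiency does WRPN offer on standard ImageNet vision architectures?
RQ4: How do WRPN configurations perform on deeper networks such as ResNet-34 and Inception-BN compared with AlexNet?
RQ5: How do WRPN's low-precision settings affect performance on real hardware (GPU/FPGA/ASIC)?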

Key Findings

For AlexNet, doubling the number of filters while using 4-bit activations and 2-bit weights reaches parity with the full-precision baseline (the paper's results table reports parity at 4b A / 2b W with 2x width).
For ResNet-34, 4-bit activations with 2-bit weights reach baseline accuracy at 2x width, and variants with 2-bit activations and weights (binary/ternary) approach the baseline at larger widths.
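For Inception with batch normalization, 4-bit activations and 4-bit weights with twice as many filter banks come close to baseline accuracy (71.63 vs. 71.64).
WRPN widening generally increases the raw operation count, but the narrower operands yield substantial efficiency gains: FPGAs and ASICs see favorable gains over FP32 (6.5x to 100x), while GPU gains are more modest.
Binary/ternary configurations paired with wider networks reach near-baseline or state-of-the-art accuracy on ResNet-34 and AlexNet while substantially reducing compute cost.
Across networks, 4-bit activations with 2-bit weights emerge as a practical and strong operating point that balances accuracy with hardware simplicity (for example, ternary weights allow multiplier-free implementations).
WRPN is clearly hardware-friendly: FPGAs and ASICs show large efficiency gains, while GPU gains are comparatively limited because native low-precision support is limited.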
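The trade-off between a larger operation count and narrower operands can be made concrete with the review's cost metric (FMA count multiplied by the sum of activation and weight bit widths). The sketch below is a rough illustration under stated assumptions: the convolution layer dimensions are hypothetical, the widening factor is applied to both input and output channels of a single layer, and the helper names are invented for the example. It does not reproduce any number from the paper.

```python
def conv_fma_count(out_h, out_w, c_in, c_out, kernel):
    """FMA operations for one dense convolution layer (no groups, no bias)."""
    return out_h * out_w * c_in * c_out * kernel * kernel


def compute_cost(fma_count, act_bits, weight_bits):
    """Cost metric from the review: FMAs times (activation bits + weight bits)."""
    return fma_count * (act_bits + weight_bits)


def relative_cost(width_mult, act_bits, weight_bits,
                  out_h=13, out_w=13, c_in=256, c_out=384, kernel=3):
    """Cost of a widened low-precision layer relative to the FP32 1x layer.

    The default layer shape is hypothetical. Widening scales both c_in and
    c_out, so the FMA count grows with the square of width_mult.
    """
    base = compute_cost(
        conv_fma_count(out_h, out_w, c_in, c_out, kernel), 32, 32)
    wide = compute_cost(
        conv_fma_count(out_h, out_w, c_in * width_mult, c_out * width_mult,
                       kernel),
        act_bits, weight_bits)
    return wide / base


if __name__ == "__main__":
    # 2x width with 4-bit activations and 2-bit weights:
    # 4x the FMAs, but 6 operand bits instead of 64.
    print(relative_cost(2, 4, 2))
```

For this single hypothetical layer, doubling the width quadruples the FMA count, yet the 4b A / 2b W configuration still costs less than the FP32 layer under this metric because the operand bits drop from 64 to 6. A first layer whose input channels are fixed by the image would scale differently.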


This review was generated by AI and reviewed by a human editor.