QUICK REVIEW

[Paper Review] Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

Philipp Gysel|arXiv (Cornell University)|May 20, 2016

Advanced Neural Network Applications | References: 38 | Cited by: 101

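One-Sentence Summary

Ristretto is a fast, GPU-accelerated framework that simulates hardware arithmetic to quantize and compress CNNs: it reduces the bit-width of weights and activations to enable adder-only or low-bit-width implementations, and it fine-tunes the network to maintain accuracy.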

ABSTRACT

Convolutional neural networks (CNN) have achieved major breakthroughs in recent years. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. To enable embedded devices such as smartphones, Google glasses and monitoring cameras with the astonishing power of deep learning, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. A first important step of accelerator design is hardware-oriented approximation of deep networks, which enables energy-efficient inference. We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and outputs of resource-intense layers, which reduces the chip area for multiplication units significantly. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.

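Motivation and Objectives

Advance energy-efficient neural network inference on embedded devices by compressing CNNs without adding decompression complexity.
Introduce the Ristretto framework, which simulates hardware arithmetic and explores bit-width reduction for weights and activations.
Show that CNNs such as CaffeNet and SqueezeNet can be reduced to 8 bits with an accuracy loss within the 1% tolerance.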

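Proposed Method

Simulate the hardware arithmetic of a custom accelerator by quantizing layer inputs, weights, and outputs to lower precision.
Support both fixed-point and adder-only arithmetic by adjusting bit-widths and accumulating with adder trees.
Apply rounding schemes (round-nearest-even at inference; stochastic rounding during fine-tuning) to control quantization error.
Fine-tune the quantized network in the discrete parameter space, keeping full-precision shadow weights and applying stochastic rounding at each update.
Compress networks quickly with GPU-optimized routines, without changing the network architecture or introducing decompression.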
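
The quantization, rounding, and shadow-weight steps listed above can be illustrated with a minimal pure-Python sketch. This is not Ristretto's code (the framework itself runs optimized GPU routines); the function names `quantize_fixed_point` and `fine_tune_step`, the 8-bit width, the 4 fractional bits, and the toy loss are all assumptions chosen for illustration.

```python
import math
import random


def quantize_fixed_point(x, bit_width=8, frac_bits=4, stochastic=False, rng=random):
    """Quantize x to a signed fixed-point value (hypothetical helper).

    bit_width and frac_bits are illustrative defaults, not values from the paper.
    """
    scale = 2 ** frac_bits
    scaled = x * scale
    if stochastic:
        lower = math.floor(scaled)
        # Round up with probability equal to the distance from the lower grid point.
        q = lower + (1 if rng.random() < scaled - lower else 0)
    else:
        # Python's built-in round() sends ties to the nearest even integer.
        q = round(scaled)
    # Saturate to the representable signed range.
    q_min, q_max = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    q = max(q_min, min(q_max, q))
    return q / scale


def fine_tune_step(shadow_weights, grad_fn, lr=0.1, rng=random):
    """One update: gradients come from quantized weights, the update goes to shadow weights."""
    # Sample discrete weights from the full-precision shadow copy.
    quantized = [quantize_fixed_point(w, stochastic=True, rng=rng) for w in shadow_weights]
    # grad_fn is a caller-supplied stand-in for the forward/backward pass.
    grads = grad_fn(quantized)
    # Small updates accumulate in full precision instead of being rounded away.
    new_shadow = [w - lr * g for w, g in zip(shadow_weights, grads)]
    return new_shadow, quantized


if __name__ == "__main__":
    # 0.15625 lies halfway between grid points 0.125 and 0.1875; the tie goes to the even index.
    print(quantize_fixed_point(0.15625))
    # Toy loss (w - 0.3)^2 per weight, so the gradient is 2 * (w - 0.3).
    shadow = [0.0]
    for _ in range(20):
        shadow, q = fine_tune_step(shadow, lambda qs: [2 * (v - 0.3) for v in qs])
    print(shadow, q)
```

The shadow copy matters because an update smaller than one quantization step would vanish if it were applied directly to the discrete weights; stochastic rounding keeps the sampled weights unbiased with respect to the shadow values.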

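Experimental Results

Research Questions

RQ1: How does reducing the numerical precision of CNN parameters and activations affect classification accuracy under a given tolerance?
RQ2: Can CNNs such as CaffeNet and SqueezeNet be compressed to an 8-bit representation with no more than 1% accuracy loss?
RQ3: In a discrete parameter space, which rounding strategy best preserves accuracy at inference and during fine-tuning?
RQ4: What practical impact does hardware-oriented approximation have on memory footprint and multiplier usage in CNN accelerators?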

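Key Findings

Ristretto compresses CaffeNet and SqueezeNet to 8-bit representations within the 1% tolerance.
Quantization followed by fine-tuning in the discrete parameter space helps recover accuracy after a large bit-width reduction.
Round-nearest-even is used for deterministic inference-time quantization, while stochastic rounding helps fine-tuning in the discrete space.
By simulating hardware arithmetic, the framework reduces memory footprint and multiplier area without introducing decompression overhead.
Adder-only arithmetic can be achieved by adjusting bit-widths and accumulation precision in the hardware path.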
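
This summary does not spell out how the multipliers are removed. One common route, assumed here for illustration, is to restrict weights to signed powers of two so that each product becomes a bit shift and the layer needs only adders. The helpers `quantize_power_of_two` and `adder_only_dot`, and the exponent range of -8 to -1, are hypothetical choices for this sketch.

```python
import math


def quantize_power_of_two(w, min_exp=-8, max_exp=-1):
    """Approximate w as sign * 2**exp; returns (sign, exp) (hypothetical helper)."""
    if w == 0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    # Round in the log domain, then clamp to the assumed exponent range.
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp


def adder_only_dot(activations_int, weights, min_exp=-8, max_exp=-1):
    """Dot product using only shifts, additions, and subtractions.

    activations_int are integer (fixed-point) activations. The accumulator
    carries -min_exp extra fractional bits, so the real-valued result is
    the returned integer times 2**min_exp.
    """
    acc = 0
    for a, w in zip(activations_int, weights):
        sign, exp = quantize_power_of_two(w, min_exp, max_exp)
        # a * 2**exp == (a << (exp - min_exp)) * 2**min_exp, and the shift is never negative.
        term = a << (exp - min_exp)
        if sign > 0:
            acc += term
        elif sign < 0:
            acc -= term
    return acc


if __name__ == "__main__":
    acc = adder_only_dot([4, 8, 16], [0.5, -0.25, 0.125])
    print(acc, acc * 2.0 ** -8)
```

Widening the accumulator by `-min_exp` bits keeps every shift a left shift, so no precision is lost inside the sum; a real accelerator would pick that width as a trade-off against adder area.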


This review was generated by AI and reviewed by human editors.