QUICK REVIEW

[Paper Review] Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Tailin Liang, John Glossner | arXiv (Cornell University) | Jan 24, 2021
Advanced Neural Network Applications | References: 156 | Cited by: 50
One-Sentence Summary

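This survey reviews pruning and quantization techniques for accelerating deep neural networks, compares static and dynamic pruning, and presents the accuracy of compressed networks across different frameworks.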

ABSTRACT

Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs. These challenges can be overcome through optimizations such as network compression. Network compression can often be realized with little loss of accuracy. In some cases accuracy may even improve. This paper provides a survey on two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time. We compare pruning techniques and describe criteria used to remove redundant computations. We discuss trade-offs in element-wise, channel-wise, shape-wise, filter-wise, layer-wise and even network-wise pruning. Quantization reduces computations by reducing the precision of the datatype. Weights, biases, and activations may be quantized typically to 8-bit integers although lower bit width implementations are also discussed including binary neural networks. Both pruning and quantization can be used independently or combined. We compare current techniques, analyze their strengths and weaknesses, present compressed network accuracy results on a number of frameworks, and provide practical guidance for compressing networks.
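To make "reducing the precision of the datatype" concrete, here is a minimal sketch of symmetric, per-tensor linear quantization of a weight list to 8-bit integers. It is one common scheme among those the survey covers, not the paper's specific method; the function names `quantize_int8` and `dequantize` are hypothetical, and a real deployment would rely on a framework's own quantization tooling.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # Choose the scale so the largest-magnitude weight maps to +/-127;
    # fall back to 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Python's round() uses round-half-to-even; clamp to the int8 range we use.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.01, 0.9]
q, scale = quantize_int8(weights)
print(q)                      # integer codes stored instead of 32-bit floats
print(dequantize(q, scale))   # approximations of the original weights
```

Each reconstructed weight differs from the original by at most half a quantization step (`scale / 2`), which is why 8-bit quantization often costs little accuracy; lower bit widths enlarge that step.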

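Research Motivation and Goals

  • Advance network compression to enable real-time deployment and reduce energy consumption without significantly degrading accuracy.
  • Categorize and analyze pruning and quantization techniques and their trade-offs across network granularities and deployment scenarios.
  • Provide practical guidelines for applying pruning and quantization to convolutional neural networks.
  • Compare the performance of compression methods across frameworks and highlight their strengths and weaknesses.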

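Proposed Approach

  • Classify pruning as static (offline) or dynamic (run-time), and discuss the criteria, impact, and trade-offs of each.
  • Discuss magnitude-based and penalty-based pruning methods, including L1/L2 regularization and, where applicable, Hessian-based methods.
  • Describe shape-wise, filter-wise, channel-wise, and other pruning granularities and their effects on sparsity and accuracy.
  • Explain quantization schemes from 8 bits down to lower bit widths, including binary networks, and discuss comparisons across frameworks.
  • Summarize how pruning and quantization can be used independently or jointly, and provide practical compression guidance.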
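The contrast between fine-grained and structured pruning can be sketched with magnitude as the pruning criterion, one of the criteria the survey discusses. This is a minimal illustration, assuming weights stored as plain Python lists and a filter represented as a flat list of its weights; `prune_elementwise` and `prune_filters` are hypothetical helpers, not the paper's code.

```python
def prune_elementwise(weights, sparsity):
    """Unstructured pruning: zero the `sparsity` fraction of weights with smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by magnitude, smallest first; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def prune_filters(filters, n_keep):
    """Structured (filter-wise) pruning: keep the n_keep filters with the largest L1 norm."""
    l1 = [sum(abs(w) for w in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: l1[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original filter order
    return [filters[i] for i in keep], keep


layer = [0.8, -0.05, 0.3, -0.9, 0.02, 0.4]
print(prune_elementwise(layer, 0.5))

filters = [
    [0.1, -0.1, 0.05],
    [0.9, -0.7, 0.4],
    [0.3, 0.2, -0.1],
    [-0.02, 0.01, 0.0],
]
kept, kept_idx = prune_filters(filters, 2)
print(kept_idx, kept)
```

Element-wise pruning leaves an irregular sparsity pattern that needs sparse kernels or hardware support to pay off, whereas removing whole filters simply yields a smaller dense layer (the next layer's corresponding input channels must be removed as well).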

Experimental Results

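Research Questions

  • RQ1: What are the main pruning and quantization techniques for CNN acceleration, and how do they differ between offline and run-time deployment?
  • RQ2: How does the choice of granularity (element-wise, channel-wise, filter-wise, layer-wise) affect sparsity, accuracy, and hardware performance?
  • RQ3: What effect does applying pruning and/or quantization typically have on accuracy across common CNN benchmarks and frameworks?
  • RQ4: What practical guidelines can be offered for selecting and applying pruning and quantization in real-world deployment?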

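Key Findings

  • Pruning and quantization are complementary techniques that can be used independently or together to accelerate CNN inference.
  • Static (offline) and dynamic (run-time) pruning offer different trade-offs, with different effects on sparsity and accuracy.
  • Different pruning granularities (element-wise, channel-wise, filter-wise, layer-wise) produce different sparsity patterns and have different hardware implications.
  • Quantization typically reduces precision to 8-bit integers, but it can extend to lower bit widths and even binary networks, with varying effects on accuracy.
  • The survey compares current state-of-the-art methods and frameworks and provides guidance on practical compression strategies.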
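At the binary end of the quantization range, a common approach approximates each weight tensor as a scale factor times its signs. The sketch below assumes the XNOR-Net-style choice of scale (the mean absolute weight), which is one option among several binarization schemes and not necessarily the one the survey recommends; the helper names are hypothetical. It also shows why binary networks are fast: once ±1 vectors are packed into bits, a dot product reduces to an XOR and a popcount.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    # Map w >= 0 to +1 and w < 0 to -1, so each weight needs a single bit.
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha


def to_bits(signs):
    """Pack a +/-1 vector into an int: bit i is set where signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_signs, b_signs):
    """Dot product of two +/-1 vectors via XOR + popcount.

    Matching positions contribute +1 and mismatches -1, so
    dot = n - 2 * (number of mismatches) = n - 2 * popcount(a XOR b).
    """
    n = len(a_signs)
    mismatches = bin(to_bits(a_signs) ^ to_bits(b_signs)).count("1")
    return n - 2 * mismatches


weights = [0.5, -0.25, 0.75, -1.0]
signs, alpha = binarize(weights)
print(signs, alpha)
print([alpha * s for s in signs])            # binary approximation of the weights
print(binary_dot(signs, [1, 1, -1, -1]))     # same result as a float dot product of the signs
```

In hardware the XOR/popcount (equivalently XNOR counting matches) replaces multiply-accumulate operations, and the single scale `alpha` is applied once per output.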


This review was generated by AI and reviewed by human editors.