QUICK REVIEW

[Paper Review] Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Torsten Hoefler, Dan Alistarh | arXiv (Cornell University) | Jan 31, 2021

Machine Learning and ELM | Cited by 341

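ONE-SENTENCE SUMMARY

This survey reviews sparsification techniques for deep networks based on pruning and growth, covering methods, theory, and hardware considerations for efficient inference and training.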

ABSTRACT

The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.

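MOTIVATION AND OBJECTIVES

Survey and categorize sparsification methods for neural networks.
Explain the mathematical foundations of sparsity and practical training strategies.
Give practitioners guidance on applying sparsification today.
Discuss hardware implications and open problems to guide future research.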

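PROPOSED APPROACH

Classifies sparsification by what is pruned, when pruning happens, and how sparsity is achieved.
Describes deterministic and probabilistic formulations of training with sparsity, including SGD, Fisher/Hessian views, and Bayesian variational methods.
Explains pruning and growth during training, including mechanisms that re-add connections to maintain model capacity.
Outlines practical techniques for introducing sparsity into full convolutional and Transformer architectures.
Discusses software and hardware considerations for accelerating sparse models and proposes a baseline for evaluation.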
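
The pruning and regrowth steps listed above can be illustrated with a small NumPy sketch. It is not taken from the survey: the function names `magnitude_mask` and `prune_and_regrow`, the choice of weight magnitude as the pruning criterion, and uniform random regrowth are all assumptions made for illustration, loosely in the spirit of dynamic sparse training schemes.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Return a boolean mask that keeps the largest-magnitude entries.

    `sparsity` is the fraction of entries to remove (0.0 to 1.0).
    Hypothetical helper for illustration only.
    """
    n_prune = int(round(sparsity * weights.size))
    mask = np.ones(weights.size, dtype=bool)
    if n_prune > 0:
        # Flattened indices sorted by ascending magnitude.
        order = np.argsort(np.abs(weights), axis=None)
        mask[order[:n_prune]] = False
    return mask.reshape(weights.shape)

def prune_and_regrow(weights, mask, fraction, rng):
    """One prune-and-regrow step that keeps the active-weight count fixed.

    Drops the smallest-magnitude `fraction` of active weights, then
    re-activates the same number of previously inactive positions,
    chosen uniformly at random (an assumed growth rule).
    """
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask)
    k = min(int(round(fraction * active.size)), inactive.size)
    new_mask = mask.copy().ravel()
    if k > 0:
        magnitudes = np.abs(weights.ravel()[active])
        drop = active[np.argsort(magnitudes)[:k]]
        new_mask[drop] = False
        grow = rng.choice(inactive, size=k, replace=False)
        new_mask[grow] = True
    return new_mask.reshape(mask.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
mask = magnitude_mask(w, sparsity=0.75)   # keep 25% of the weights
w_sparse = w * mask                       # pruned weights are zeroed
mask = prune_and_regrow(w_sparse, mask, fraction=0.25, rng=rng)
```

In this sketch, regrown positions start at zero because `w_sparse` was already masked; a real training loop would then update them by gradient descent. Other pruning criteria or growth rules would only change how `drop` and `grow` are selected.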

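EXPERIMENTAL RESULTS

Research questions

RQ1: What are the main pruning and growth sparsification techniques for deep neural networks?
RQ2: How does sparsity interact with training dynamics and generalization?
RQ3: Which methods exploit sparsity effectively for inference and training on real hardware?
RQ4: Which metrics and benchmarks should be used to compare sparse networks?
RQ5: Which problems remain open in advancing sparse deep learning?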

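Key findings

Sparse methods can reduce model size by 10-100x with almost no loss of accuracy.
Sparsification offers potential savings in memory, compute, and energy, for both mobile devices and large-scale models.
Pruning techniques are tied to training dynamics and may bring regularization effects and improved robustness.
Variational and Bayesian perspectives provide principled ways to induce and measure sparsity.
Methods are advancing rapidly, and common benchmarks are needed for fair comparison.
Sparse training and growth strategies can maintain or even improve performance while reducing resource usage.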
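
The size-reduction figures above depend on how the surviving weights are stored. As a rough back-of-the-envelope sketch (not a result from the survey), the code below compares dense storage with the compressed sparse row (CSR) layout. The helper names, the 4-byte values, and the 4-byte indices are assumptions for illustration.

```python
def dense_bytes(rows, cols, value_bytes=4):
    """Bytes needed to store every entry of a dense matrix."""
    return rows * cols * value_bytes

def csr_bytes(rows, cols, sparsity, value_bytes=4, index_bytes=4):
    """Bytes for a CSR matrix: values, column indices, and row pointers.

    `sparsity` is the fraction of zero entries. Assumes one column index
    per nonzero and rows + 1 row pointers, as in the standard CSR layout.
    """
    nnz = round(rows * cols * (1.0 - sparsity))
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes

def compression_ratio(rows, cols, sparsity):
    """Dense size divided by CSR size under the assumptions above."""
    return dense_bytes(rows, cols) / csr_bytes(rows, cols, sparsity)

for s in (0.5, 0.9, 0.99):
    print(f"sparsity={s:.2f}  ratio={compression_ratio(1024, 1024, s):.2f}x")
```

Under these assumptions a 1024 x 1024 layer at 90% sparsity shrinks by roughly 5x rather than 10x, because each stored value also carries an index. Reaching the upper end of the reported range therefore calls for very high sparsity, narrower index types, or structured formats.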


This review was generated by AI and checked by a human editor.