QUICK REVIEW

[Paper Review] Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Gaurav Menghani|arXiv (Cornell University)|Jun 16, 2021

Advanced Neural Network Applications | Cited by 37

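One-Sentence Summary

A comprehensive survey of how modeling techniques, infrastructure, and hardware can make deep learning models smaller, faster, and better, covering pruning, quantization, and learning strategies.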

ABSTRACT

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, resources required to train, etc. have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey would provide the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.

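Motivation and Goals

Stresses that deep learning models should be judged not only on quality (accuracy) but also on footprint metrics: size, latency, and training cost.
Provides a comprehensive taxonomy of efficiency techniques spanning modeling methods, infrastructure, and hardware.
Highlights a practical, experiment-based guide with code for training and deploying Pareto-optimal models.
Bridges modeling techniques and deployment considerations to enable efficient AI applications in the real world.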

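Proposed Approach

Presents a mental model of efficiency techniques organized into five areas: compression, learning, automation, efficient architectures, and infrastructure.
Details pruning and saliency-based pruning strategies, including structured versus unstructured pruning and sparsity schedules.
Explains quantization and quantization-aware training, along with the algorithms for quantizing and dequantizing weights and activations.
Discusses other compression techniques such as low-rank factorization and weight sharing.
Describes learning techniques such as distillation and ensemble-based methods that let smaller models reach comparable quality.
Offers an empirical perspective on hardware acceleration for sparse and quantized models.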
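
To make the pruning item above concrete, here is a minimal NumPy sketch of unstructured pruning driven by a gradual sparsity schedule. It is an illustration under stated assumptions, not the survey's own code: weight magnitude stands in for the saliency score, the ramp is a cubic polynomial in the style of the gradual schedule associated with Zhu and Gupta, and the function names `polynomial_sparsity` and `magnitude_prune` are hypothetical.

```python
import numpy as np

def polynomial_sparsity(step, begin_step, end_step, final_sparsity,
                        initial_sparsity=0.0, power=3):
    """Target sparsity at a training step, ramping from initial to final."""
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    # Prune quickly at first, then more slowly as the target is approached.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest |w| (unstructured).

    Assumption: |w| is used as the saliency score.
    Returns the pruned weights and the boolean keep-mask.
    """
    k = int(round(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    flat = np.abs(weights).ravel()
    prune_idx = np.argpartition(flat, k - 1)[:k]  # indices of the k smallest magnitudes
    mask = np.ones(weights.size, dtype=bool)
    mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

# Usage: ramp a random 64x64 layer toward 75% sparsity over 1000 steps.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
for step in (0, 500, 1000):
    target = polynomial_sparsity(step, begin_step=0, end_step=1000, final_sparsity=0.75)
    pruned, mask = magnitude_prune(w, target)
    print(step, round(target, 3), 1.0 - mask.mean())
```

In a real training loop the mask would be reapplied after each optimizer update so pruned weights stay at zero. A structured variant would score and remove whole blocks, channels, or filters rather than individual weights.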

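Experimental Results

Research Questions

RQ1: Which techniques give deep learning models a Pareto-optimal trade-off between accuracy and footprint (size/latency)?
RQ2: How do compression and learning methods compare in effectiveness and practical deployability on real hardware?
RQ3: What infrastructure and tooling are needed to realize efficiency gains in training and deployment?
RQ4: How do structured and unstructured pruning and quantization methods translate into real-world latency and size improvements?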

Key Findings

Model architecture	Sparsity type	Sparsity %	FLOPs	Top-1 accuracy %	Source
MobileNet v2 - 1.0	Dense (Baseline)	0%	1x	72.0%	Sandler et al. (2018)
MobileNet v2 - 1.0	Unstructured	75%	0.27x	67.7%	Zhu and Gupta (2018)
MobileNet v2 - 1.0	Unstructured	75%	0.52x	71.9%	Evci et al. (2020)
MobileNet v2 - 1.0	Structured (block-wise)	85%	0.11x	69.7%	Elsen et al. (2020); Google Research (2021)
MobileNet v2 - 1.0	Unstructured	90%	0.12x	61.8%	Zhu and Gupta (2018)
MobileNet v2 - 1.0	Unstructured	90%	0.12x	69.7%	Evci et al. (2020)
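
One way to read the table above is to ask which rows are Pareto-optimal, meaning no other row has both lower (or equal) FLOPs and higher (or equal) accuracy. The sketch below is only an illustration: the rows are copied from the table, the helper `pareto_front` is a hypothetical name, and relative FLOPs is used as a proxy for footprint, which need not match measured latency on a given device.

```python
# Rows copied from the table above: (label, relative FLOPs, top-1 accuracy %).
rows = [
    ("Dense baseline", 1.00, 72.0),
    ("Unstructured 75% (Zhu and Gupta)", 0.27, 67.7),
    ("Unstructured 75% (Evci et al.)", 0.52, 71.9),
    ("Block-wise 85% (Elsen et al.)", 0.11, 69.7),
    ("Unstructured 90% (Zhu and Gupta)", 0.12, 61.8),
    ("Unstructured 90% (Evci et al.)", 0.12, 69.7),
]

def pareto_front(points):
    """Keep points that no other point dominates.

    A point is dominated when another point is at least as good on both
    axes (lower FLOPs, higher accuracy) and strictly better on one.
    """
    front = []
    for name, flops, acc in points:
        dominated = any(
            (f <= flops and a >= acc) and (f < flops or a > acc)
            for _, f, a in points
        )
        if not dominated:
            front.append((name, flops, acc))
    return sorted(front, key=lambda p: p[1])  # cheapest first

front = pareto_front(rows)
for name, flops, acc in front:
    print(f"{name}: {flops}x FLOPs, {acc}% top-1")
```

On these six rows the sketch keeps the block-wise 85% model, the Evci et al. 75% model, and the dense baseline. The block-wise row edges out the Evci et al. 90% row only through a 0.11x versus 0.12x FLOPs difference at equal accuracy, so the result should be read as indicative rather than decisive.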

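Pruning can substantially reduce parameter count, and structured pruning in particular yields meaningful latency and size reductions suited to fast inference.
Quantization to 8-bit weights compresses model size by roughly 4x relative to 32-bit floating-point weights.
Activation quantization and fixed-point execution can deliver significant speedups on CPUs with SIMD support, sometimes about 3x faster inference for a fixed-point graph.
Quantization-aware training usually preserves accuracy better than post-training quantization while still delivering a significant size reduction.
With compatible kernels, structured sparse representations can outperform dense models on some hardware while using fewer parameters.
The Lottery Ticket Hypothesis suggests that large networks contain compact subnetworks, although results vary across datasets and architectures.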
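
The roughly 4x size reduction from 8-bit weights can be checked with a small NumPy sketch of weight quantization and dequantization. This is a minimal illustration and not the survey's reference implementation: it assumes an asymmetric affine scheme whose range comes from the tensor's min and max, it quantizes after training (it does not simulate quantization-aware training), and the names `quantize` and `dequantize` are hypothetical.

```python
import numpy as np

def quantize(x):
    """Affine (asymmetric) quantization of a float array to uint8.

    Assumption: the range is taken from the tensor's own min/max and is
    widened to include 0 so that zero is exactly representable.
    """
    qmin, qmax = 0, 255
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate float32 values."""
    return scale * (q.astype(np.float32) - zero_point)

# Usage: quantize a random float32 weight matrix and compare storage and error.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale, zp = quantize(w)
w_hat = dequantize(q, scale, zp)

print("size ratio:", w.nbytes / q.nbytes)  # float32 uses 4 bytes per weight, uint8 uses 1
print("max abs error:", float(np.abs(w_hat - w).max()), "scale:", scale)
```

The per-weight reconstruction error is on the order of half a quantization step (`scale / 2`). The storage ratio ignores the small overhead of keeping `scale` and `zero_point` per tensor.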


This review was generated by AI and reviewed by a human editor.