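[Paper Summary] Efficient Processing of Deep Neural Networks: A Tutorial and Survey
This paper surveys the techniques, hardware platforms, and design trade-offs for efficient deep neural network processing, with emphasis on inference acceleration, near-data processing, and algorithm-hardware co-design.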
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
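Motivation and Objectives
- Give an overview of deep neural networks and why they matter for AI applications.
- Survey the hardware platforms and architectures that support DNN inference, and the efficiency gains they offer.
- Highlight techniques that reduce computation and energy consumption without sacrificing accuracy.
- Discuss the resources, benchmarking metrics, and design considerations for evaluating DNN hardware.
- Explain the potential benefits of joint algorithm and hardware optimization, and identify trends and opportunities.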
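Proposed Approach
- Introduce the background of DNNs and their role in AI and in deployed applications.
- Describe the components and models of DNNs, and the core computations in convolutional neural network (CNN) layers and fully connected (FC) layers.
- Survey the hardware platforms, memory technologies, and near-data processing approaches used for DNNs.
- Discuss mixed-signal and memory-centric strategies for reducing the cost of data movement.
- Outline joint algorithm-hardware optimization methods and their effect on throughput and energy efficiency.
- Propose benchmarking metrics and evaluation considerations for DNN hardware design.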
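The core computation of the CONV and FC layers mentioned above is the multiply-accumulate (MAC) operation. The following is a minimal pure-Python sketch of a convolutional layer written as nested MAC loops, not code from the paper. The function name `conv2d_naive`, the list-of-lists data layout, and the omission of batching, padding, and bias are assumptions made for illustration.

```python
def conv2d_naive(ifmap, filters, stride=1):
    """Naive convolution as nested multiply-accumulate (MAC) loops.

    ifmap:   input feature map, indexed [C][H][W]
    filters: weights, indexed [M][C][R][S]
    Returns (ofmap indexed [M][E][F], number of MACs performed).
    Illustrative only: no batching, padding, or bias.
    """
    C, H, W = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    M, R, S = len(filters), len(filters[0][0]), len(filters[0][0][0])
    E = (H - R) // stride + 1  # output height
    F = (W - S) // stride + 1  # output width

    ofmap = [[[0.0] * F for _ in range(E)] for _ in range(M)]
    macs = 0
    for m in range(M):              # each output channel (filter)
        for e in range(E):          # each output row
            for f in range(F):      # each output column
                acc = 0.0
                for c in range(C):          # each input channel
                    for r in range(R):      # filter rows
                        for s in range(S):  # filter columns
                            acc += (ifmap[c][e * stride + r][f * stride + s]
                                    * filters[m][c][r][s])
                            macs += 1
                ofmap[m][e][f] = acc
    return ofmap, macs


# Usage: one 3x3 input channel of ones, one 2x2 filter of ones.
ifmap = [[[1.0] * 3 for _ in range(3)]]
filters = [[[[1.0] * 2 for _ in range(2)]]]
ofmap, macs = conv2d_naive(ifmap, filters)
```

When the filter has the same height and width as the input (R = H and S = W), each filter produces a single output value, and the same loop nest computes a fully connected layer. The MAC count grows as M x E x F x C x R x S, which is why both the arithmetic and the data movement behind it dominate the cost of these layers.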
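Experimental Results
Research Questions
- RQ1: What are the key design considerations for efficient DNN hardware implementations?
- RQ2: How should DNN hardware be evaluated and benchmarked for throughput, energy efficiency, and accuracy preservation?
- RQ3: What are the trade-offs between different hardware architectures and platforms for DNN inference?
- RQ4: What roles do algorithmic techniques (such as pruning and quantization) and hardware design play in achieving efficiency?
- RQ5: What emerging opportunities do near-data processing and memory technologies offer for DNNs?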
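To make the algorithmic techniques named in RQ4 concrete, here is a minimal sketch of magnitude-based pruning and uniform symmetric quantization on a flat list of weights. These are simple textbook variants chosen for illustration, not the specific methods evaluated in the paper. The function names, the threshold value, and the 8-bit default are all assumptions.

```python
def prune_by_magnitude(weights, threshold):
    """Set weights whose absolute value is below `threshold` to zero."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize_uniform(weights, num_bits=8):
    """Uniform symmetric quantization to signed integers.

    Returns (integer codes, scale) such that weight ~= code * scale.
    """
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Usage with hypothetical weights.
weights = [0.5, -0.01, 0.02, -1.0]
pruned = prune_by_magnitude(weights, threshold=0.05)
codes, scale = quantize_uniform(pruned, num_bits=8)
approx = dequantize(codes, scale)
sparsity = pruned.count(0.0) / len(pruned)   # fraction of zeroed weights
```

Pruning creates zeros that hardware can skip or compress, and quantization shrinks each operand so that storage, data movement, and arithmetic all become cheaper. Whether the accuracy is preserved depends on the network and usually has to be checked empirically, often after fine-tuning.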
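Key Findings
- DNNs achieve high accuracy at the cost of heavy computation and data movement, which motivates specialized acceleration.
- Convolutional, fully connected, pooling, and normalization layers are the core building blocks of modern DNNs, and batch normalization (BN) has become standard practice.
- A range of hardware platforms and optimizations can improve throughput and energy efficiency without reducing accuracy.
- Near-data processing and mixed-signal/memory technologies are highlighted as ways to address the data movement bottleneck.
- Joint algorithm-hardware optimization can deliver throughput and energy gains while keeping accuracy loss under control.
- A set of benchmarking metrics and design considerations is proposed for evaluating the growing landscape of DNN accelerators.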
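As a rough illustration of how throughput and energy metrics relate to one another, the sketch below derives a few commonly reported figures from three measured quantities. The function name, the input values, and the assumption that inferences run one at a time (no batching or pipelining) are hypothetical, and counting one MAC as two operations is a common convention rather than a rule taken from the paper.

```python
def hardware_metrics(macs_per_inference, latency_s, avg_power_w):
    """Derive simple throughput and energy metrics for one accelerator run.

    Assumes inferences are processed one at a time, so throughput is the
    reciprocal of latency and energy per inference is power times latency.
    """
    ops = 2 * macs_per_inference              # 1 MAC = multiply + add
    gops = ops / latency_s / 1e9              # giga-operations per second
    return {
        "inferences_per_s": 1.0 / latency_s,
        "gops": gops,
        "energy_per_inference_j": avg_power_w * latency_s,
        "gops_per_w": gops / avg_power_w,
    }


# Usage with made-up numbers: 1e9 MACs, 10 ms latency, 2 W average power.
metrics = hardware_metrics(macs_per_inference=1e9,
                           latency_s=0.01,
                           avg_power_w=2.0)
```

Figures like these are only comparable across designs when the workload and the accuracy achieved are reported alongside them, since a design can look more efficient simply by running a smaller or less accurate model.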
This summary was generated by AI and reviewed by human editors.