QUICK REVIEW

[Paper Review] Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Vivienne Sze, Yu‐Hsin Chen | arXiv (Cornell University) | Mar 27, 2017

Advanced Neural Network Applications | 99 references | 50 citations

TL;DR

This paper surveys the techniques, hardware platforms, and design trade-offs for efficient deep neural network processing, emphasizing inference acceleration, near-data processing, and algorithm–hardware co-design.

ABSTRACT

Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

Motivation & Objective

Provide an overview of deep neural networks and their importance across AI applications.
Survey hardware platforms and architectures that support DNN inference, and the efficiency gains they offer.
Highlight techniques to reduce computation and energy without sacrificing accuracy.
Discuss resources, benchmarking metrics, and design considerations for evaluating DNN hardware.
Explain potential gains from joint algorithm and hardware optimizations and identify trends and opportunities.

Proposed method

Present background on DNNs and their role in AI and deployed applications.
Describe DNN components, models, and the core computations in CNNs and FC layers.
Survey hardware platforms, memory technologies, and near-data processing approaches for DNNs.
Discuss mixed-signal and memory-centric strategies to mitigate data movement costs.
Outline joint algorithm–hardware optimization approaches and their impact on throughput and energy efficiency.
Propose benchmarking metrics and evaluation considerations for DNN hardware designs.
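
The core CNN and FC computations named in the list above can be sketched as plain loop nests. This is a minimal illustration, not code from the paper: the function names (`conv2d`, `fully_connected`, `conv_macs`) and the dimension symbols (C, H, W, M, R, S, E, F) are assumptions chosen for this example, and the sketch assumes no padding, no bias, and a batch size of one.

```python
def conv2d(ifmap, filters, stride=1):
    """Direct convolution as a six-deep loop nest (no padding, no bias).

    ifmap:   nested list shaped [C][H][W]     (input feature map)
    filters: nested list shaped [M][C][R][S]  (M filters)
    returns: nested list shaped [M][E][F]     (output feature map)
    """
    C, H, W = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    M, R, S = len(filters), len(filters[0][0]), len(filters[0][0][0])
    E = (H - R) // stride + 1  # output height
    F = (W - S) // stride + 1  # output width
    ofmap = [[[0.0] * F for _ in range(E)] for _ in range(M)]
    for m in range(M):              # each filter / output channel
        for e in range(E):          # each output row
            for f in range(F):      # each output column
                acc = 0.0
                for c in range(C):          # each input channel
                    for r in range(R):      # filter row
                        for s in range(S):  # filter column
                            # One multiply-accumulate (MAC) operation.
                            acc += (ifmap[c][e * stride + r][f * stride + s]
                                    * filters[m][c][r][s])
                ofmap[m][e][f] = acc
    return ofmap


def fully_connected(x, weights):
    """FC layer as a matrix-vector product: one dot product per output."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def conv_macs(C, H, W, M, R, S, stride=1):
    """Number of MACs performed by conv2d for the given layer shape."""
    E = (H - R) // stride + 1
    F = (W - S) // stride + 1
    return M * E * F * C * R * S
```

Counting MACs this way gives a first-order estimate of a layer's compute cost; it says nothing about how often each operand must be fetched from memory, which depends on the hardware.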

Experimental results

Research questions

RQ1: What are the key design considerations for efficient DNN hardware implementations?
RQ2: How can DNN hardware be evaluated and benchmarked for throughput, energy efficiency, and accuracy preservation?
RQ3: What are the trade-offs between different hardware architectures and platforms for DNN inference?
RQ4: What roles do algorithmic techniques (e.g., pruning, quantization) and hardware design play in achieving efficiency?
RQ5: What emerging opportunities exist in near-data processing and memory technologies for DNNs?
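
The two algorithmic techniques named in the questions above, pruning and quantization, can be illustrated with a short sketch. It assumes the simplest common variants (magnitude-based pruning and symmetric uniform quantization), which are not necessarily the specific methods the paper surveys; all function names and the threshold value are hypothetical.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform quantization to signed integers.

    Returns (integer codes, scale) so that code * scale approximates
    the original weight.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0] * len(weights), 1.0      # all-zero input: any scale works
    scale = max_abs / qmax
    return [int(round(w / scale)) for w in weights], scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [q * scale for q in codes]


# Usage with made-up weights (illustrative values only).
weights = [0.5, -0.01, 0.2, 0.03]
pruned = prune_by_magnitude(weights, threshold=0.05)
codes, scale = quantize_uniform(pruned, num_bits=8)
approx = dequantize(codes, scale)
```

Pruning reduces the number of nonzero weights that must be stored and multiplied, while quantization shrinks each remaining weight. Whether either one translates into real throughput or energy savings depends on hardware support, which is why the paper treats them as co-design questions.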

Key findings

DNNs achieve high accuracy but incur high computational and data movement costs, motivating specialized acceleration.
Convolutional, fully-connected, pooling, and normalization layers form the core building blocks of modern DNNs, with batch normalization (BN) now standard practice.
Various hardware platforms and optimizations can improve throughput and energy efficiency without degrading accuracy.
Near-data processing and mixed-signal/memory technologies are highlighted as avenues to address data movement bottlenecks.
Joint algorithm–hardware optimization can yield throughput and energy benefits while managing accuracy loss.
A set of benchmarking metrics and design considerations is proposed for evaluating the growing landscape of DNN accelerators.
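
To make the data-movement finding concrete, a first-order energy estimate can separate compute from memory traffic. The sketch below is a toy accounting model under stated assumptions, not the paper's methodology: the function name `estimate_energy`, the memory-level names, and every cost value are hypothetical placeholders expressed relative to the energy of one MAC.

```python
def estimate_energy(num_macs, accesses, energy_per_access, energy_per_mac=1.0):
    """Split an energy estimate into compute and data-movement parts.

    num_macs:          number of multiply-accumulate operations
    accesses:          dict mapping memory level -> number of accesses
    energy_per_access: dict mapping memory level -> relative cost per access
    energy_per_mac:    relative cost of one MAC (the unit of measure)
    """
    compute = num_macs * energy_per_mac
    movement = sum(count * energy_per_access[level]
                   for level, count in accesses.items())
    return {"compute": compute,
            "data_movement": movement,
            "total": compute + movement}


# Placeholder numbers, chosen only to exercise the function.
result = estimate_energy(
    num_macs=1000,
    accesses={"dram": 100, "buffer": 900},
    energy_per_access={"dram": 100.0, "buffer": 5.0},
)
movement_share = result["data_movement"] / result["total"]
```

With real per-access costs substituted in, the same accounting shows how shifting accesses from expensive to cheaper memory levels changes the total, which is the motivation behind the near-data processing approaches listed above.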

This review was created by AI and reviewed by human editors.