QUICK REVIEW

[Paper Review] Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks

Yu‐Hsin Chen, Joel Emer|arXiv (Cornell University)|Jul 10, 2018

Advanced Neural Network Applications|References: 16|Citations: 65

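One-Line Summary

Eyeriss v2 is a flexible, high-performance DNN accelerator that introduces the Row-Stationary Plus (RS+) dataflow and a hierarchical mesh NoC to efficiently process diverse DNN workloads with differing degrees of data reuse and bandwidth requirements. Compared with Eyeriss, it delivers 10.4x–17.9x higher performance at 256 PEs and up to 1086.7x higher performance at 16384 PEs, showing scalability and adaptability across a variety of DNNs.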

ABSTRACT

The design of DNNs has increasingly focused on reducing the computational complexity in addition to improving accuracy. While emerging DNNs tend to have fewer weights and operations, they also reduce the amount of data reuse with more widely varying layer shapes and sizes. This leads to a diverse set of DNNs, ranging from large ones with high reuse (e.g., AlexNet) to compact ones with high bandwidth requirements (e.g., MobileNet). However, many existing DNN processors depend on certain DNN properties, e.g., a large number of channels, to achieve high performance and energy efficiency and do not have sufficient flexibility to efficiently process a diverse set of DNNs. In this work, we present Eyexam, a performance analysis framework that quantitatively identifies the sources of performance loss in DNN processors. It highlights two architectural bottlenecks in many existing designs. First, their dataflows are not flexible enough to adapt to the varying layer shapes and sizes of different DNNs. Second, their network-on-chip (NoC) can't adapt to support both high data reuse and high bandwidth scenarios. Based on this analysis, we present Eyeriss v2, a high-performance DNN accelerator that adapts to a wide range of DNNs. Eyeriss v2 has a new dataflow, called Row-Stationary Plus (RS+), that enables the spatial tiling of data from all dimensions to fully utilize the parallelism for high performance. To support RS+, it has a low-cost and scalable NoC design, called hierarchical mesh, that connects the high-bandwidth global buffer to the array of processing elements (PEs) in a two-level hierarchy. This enables high-bandwidth data delivery while still being able to harness any available data reuse. Compared with Eyeriss, Eyeriss v2 has a performance increase of 10.4x-17.9x for 256 PEs, 37.7x-71.5x for 1024 PEs, and 448.8x-1086.7x for 16384 PEs on DNNs with widely varying amounts of data reuse.
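The tension the abstract describes between high-reuse and high-bandwidth workloads can be illustrated with a roofline-style bound. The sketch below is not Eyexam itself. The function name, the assumption of one MAC per PE per cycle, and every number in it are hypothetical and chosen only to show the two regimes.

```python
def attainable_macs_per_cycle(num_pes, words_per_cycle, macs_per_word):
    """Roofline-style upper bound on throughput in MACs per cycle.

    num_pes         -- peak compute, assuming one MAC per PE per cycle
    words_per_cycle -- data words delivered to the PE array per cycle
    macs_per_word   -- data reuse: MACs performed per delivered word
    """
    compute_bound = num_pes
    bandwidth_bound = words_per_cycle * macs_per_word
    return min(compute_bound, bandwidth_bound)


# Hypothetical numbers, not taken from the paper.
high_reuse = attainable_macs_per_cycle(num_pes=256, words_per_cycle=4, macs_per_word=100)
low_reuse = attainable_macs_per_cycle(num_pes=256, words_per_cycle=4, macs_per_word=2)
more_bandwidth = attainable_macs_per_cycle(num_pes=256, words_per_cycle=64, macs_per_word=2)

print(high_reuse)      # 256: compute-bound, narrow delivery is enough
print(low_reuse)       # 8: bandwidth-bound, most PEs sit idle
print(more_bandwidth)  # 128: wider delivery recovers throughput
```

Under these assumptions, a layer with high reuse saturates the array even with narrow data delivery, while a low-reuse layer is limited by how many words reach the PEs per cycle. That is the case where a higher-bandwidth NoC helps.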

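Motivation and Objectives

Overcome the performance limitations of existing DNN accelerators that stem from the diverse layer shapes and varying data reuse patterns of recent DNNs.
Identify the architectural bottlenecks in current DNN processors, in particular those caused by inflexible dataflows and non-adaptive NoCs.
Design a new accelerator that efficiently supports both high-data-reuse and high-bandwidth workloads.
Achieve scalable, high-performance inference across a wide range of DNN models, from compact structures to large architectures.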

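Proposed Method

Propose Eyexam, a performance analysis framework that quantitatively identifies performance bottlenecks in DNN processors.
Introduce the Row-Stationary Plus (RS+) dataflow, which enables spatial tiling of data across all dimensions.
Design a hierarchical mesh network-on-chip (NoC) that connects the global buffer to the processing elements (PEs) in a two-level hierarchy, providing scalable, low-cost bandwidth delivery.
Use the two-level NoC architecture to serve both high-bandwidth and high-reuse scenarios without compromising either.
Align the RS+ dataflow with the hierarchical NoC to optimize data movement and minimize external memory accesses.
Combine flexible tiling with a scalable interconnect to enable dynamic adaptation to diverse DNN workloads.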
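Why spatial tiling across more than one dimension matters can be seen in a toy utilization model. This is a minimal sketch under stated assumptions, not the RS+ mapping itself. The 16x16 array, the layer shape, and the `utilization` helper are all hypothetical.

```python
import math


def utilization(work_items, lanes):
    """Fraction of `lanes` parallel PEs kept busy, averaged over all passes,
    when `work_items` independent items are spread across them."""
    passes = math.ceil(work_items / lanes)
    return work_items / (passes * lanes)


# Hypothetical 16x16 PE array and a hypothetical layer with few channels.
ROWS, COLS = 16, 16
channels, out_rows, out_cols = 4, 28, 28

# Fixed mapping: array rows are filled from channels only.
fixed = utilization(channels, ROWS) * utilization(out_cols, COLS)

# Flexible mapping: array rows are filled from channels and output rows together.
flexible = utilization(channels * out_rows, ROWS) * utilization(out_cols, COLS)

print(f"fixed mapping:    {fixed:.3f}")
print(f"flexible mapping: {flexible:.3f}")
```

In this toy model the fixed mapping leaves most rows idle because the layer has only 4 channels, while drawing parallel work from a second dimension fills the rows. A dataflow tied to a single property, such as a large channel count, loses utilization on layers that lack it.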

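Experimental Results

Research Questions

RQ1: What architectural bottlenecks limit the performance of existing DNN accelerators when they process diverse DNN workloads?
RQ2: How can a DNN accelerator efficiently support both high-data-reuse and high-bandwidth workloads?
RQ3: Can a flexible dataflow and a scalable NoC design deliver high performance across a wide range of DNN models?
RQ4: To what extent can a reconfigurable dataflow and a hierarchical NoC improve performance and energy efficiency?
RQ5: How does the performance of Eyeriss v2 scale with the number of PEs across different DNNs?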

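Key Findings

With 256 PEs, Eyeriss v2 achieved a 10.4x–17.9x performance increase over Eyeriss across diverse DNNs.
With 1024 PEs, it delivered 37.7x–71.5x higher performance than Eyeriss.
With 16384 PEs, it achieved a 448.8x–1086.7x performance increase over Eyeriss.
The Row-Stationary Plus (RS+) dataflow makes it possible to fully exploit parallelism across all data dimensions, improving resource utilization.
The hierarchical mesh NoC effectively supports both high-bandwidth and high-reuse workloads without sacrificing scalability.
The Eyexam analysis showed that inflexible dataflows and non-adaptive NoCs are the main performance bottlenecks in existing DNN accelerators.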
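The reported speedup ranges can be compared against the growth in PE count with simple arithmetic. The sketch below uses only the figures quoted in the abstract. The abstract does not state how the Eyeriss baseline is configured at each PE count, so the ratios describe how the reported numbers grow and are not a measured scaling efficiency.

```python
# Speedup ranges over Eyeriss as reported in the abstract: PEs -> (low, high).
reported = {256: (10.4, 17.9), 1024: (37.7, 71.5), 16384: (448.8, 1086.7)}

base_pes = 256
base_low, base_high = reported[base_pes]

growth = {}
for pes, (low, high) in reported.items():
    pe_factor = pes / base_pes
    # Growth of each end of the reported range relative to the 256-PE figures.
    growth[pes] = (pe_factor, low / base_low, high / base_high)
    print(f"{pes:>5} PEs: PE count x{pe_factor:.0f}, "
          f"reported speedup range grows x{growth[pes][1]:.1f} to x{growth[pes][2]:.1f}")
```

Going from 256 to 1024 PEs (4x), the reported range grows by roughly 3.6x to 4.0x. Going from 256 to 16384 PEs (64x), it grows by roughly 43x to 61x.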

This review was generated by AI and checked by a human editor.