QUICK REVIEW

[Paper Review] TBD: Benchmarking and Analyzing Deep Neural Network Training

Hongyu Zhu, Mohamed Akrout|arXiv (Cornell University)|Mar 16, 2018

Adversarial Robustness in Machine Learning | References: 59 | Citations: 55

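One-Line Summary

This study proposes TBD, a new benchmark suite for DNN training that spans multiple application domains and frameworks, pairs it with a memory profiling toolchain, and analyzes the performance of TensorFlow, MXNet, and CNTK across various hardware configurations.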

ABSTRACT

The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference -- i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (TBD is short for Training Benchmark for DNNs), that uses a representative set of DNN models that cover a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, reinforcement learning, and (ii) by performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine). TBD currently covers six major application domains and eight different state-of-the-art models. We present a new toolchain for performance analysis for these models that combines the targeted usage of existing performance analysis tools, careful selection of new and existing metrics and methodologies to analyze the results, and utilization of domain specific characteristics of DNN training. We also build a new set of tools for memory profiling in all three major frameworks; much needed tools that can finally shed some light on precisely how much memory is consumed by different data structures (weights, activations, gradients, workspace) in DNN training. By using our tools and methodologies, we make several important observations and recommendations on where the future research and optimization of DNN training should be focused.

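Motivation and Objectives

Make the case for a broad DNN training benchmark that goes beyond inference and image classification.
Define TBD as a representative suite covering multiple domains: image classification, machine translation, speech recognition, object detection, adversarial networks, and reinforcement learning.
Develop an end-to-end performance analysis toolchain for DNN training across major frameworks and hardware configurations.
Build memory profiling tools that quantify the memory used by weights, activations, gradients, and workspace in TensorFlow, MXNet, and CNTK.
Provide findings and recommendations to guide future research on and optimization of DNN training.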

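Proposed Method

Curate a broad benchmark suite of eight state-of-the-art models across six application domains, implemented in TensorFlow, MXNet, and CNTK.
Evaluate training performance on single-GPU, multi-GPU, and multi-machine setups.
Build an end-to-end analysis toolchain by combining existing profilers with domain-specific metrics.
Develop memory profilers for the three major frameworks that attribute memory usage to weights, activations, gradients, and workspace.
Normalize implementations across frameworks so that hyperparameters and network definitions are comparable.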
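The attribution step can be pictured as a simple tally over tagged allocations. The sketch below is illustrative only: the four category names come from this review, but the `(category, num_bytes)` record format and the `memory_breakdown` function are hypothetical and do not reflect the paper's actual profilers, which are built into each framework.

```python
from collections import defaultdict

# Categories named in the review; the record format below is hypothetical.
CATEGORIES = ("weights", "activations", "gradients", "workspace")


def memory_breakdown(allocations):
    """Sum bytes per category and return (totals, fractions).

    `allocations` is an iterable of (category, num_bytes) pairs, a made-up
    stand-in for whatever a framework-level profiler would actually log.
    """
    totals = defaultdict(int)
    for category, num_bytes in allocations:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        totals[category] += num_bytes
    grand_total = sum(totals.values())
    fractions = {
        c: (totals[c] / grand_total if grand_total else 0.0) for c in CATEGORIES
    }
    return {c: totals[c] for c in CATEGORIES}, fractions


# Illustrative numbers only, not measurements from the paper.
example = [
    ("weights", 100),
    ("gradients", 100),
    ("activations", 700),
    ("workspace", 100),
]
totals, fractions = memory_breakdown(example)
```

In a real profiler the hard part is tagging each allocation with the right category, which depends on framework internals; the tally itself is the easy part shown here.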

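Experimental Results

Research Questions

RQ1: What are the main bottlenecks in DNN training across different models, frameworks, and hardware configurations?
RQ2: How does memory usage during training differ across data structures (weights, activations, gradients, workspace) and across frameworks?
RQ3: How do throughput and GPU utilization differ among TensorFlow, MXNet, and CNTK across diverse training workloads?
RQ4: What practical recommendations follow for improving the performance and memory efficiency of DNN training?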
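RQ3 turns on two metrics, throughput and GPU utilization. This review does not give their formulas, so the sketch below assumes the common definitions: throughput as samples processed per second of wall-clock time, and utilization as the fraction of wall-clock time the GPU is busy. The function names and the sample numbers are hypothetical.

```python
def throughput(num_samples, elapsed_seconds):
    """Samples processed per second of wall-clock training time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_samples / elapsed_seconds


def gpu_utilization(busy_seconds, elapsed_seconds):
    """Fraction of wall-clock time during which the GPU was busy."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if not 0 <= busy_seconds <= elapsed_seconds:
        raise ValueError("busy_seconds must lie within [0, elapsed_seconds]")
    return busy_seconds / elapsed_seconds


# Hypothetical run: 100 iterations at batch size 32, taking 8 s in total,
# with the GPU busy for 6 s of that time.
samples = 100 * 32
print(throughput(samples, 8.0))   # 400.0
print(gpu_utilization(6.0, 8.0))  # 0.75
```

Reporting both numbers together matters because a run can have high throughput on a small model while leaving the GPU idle for much of the time.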

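Key Findings

RNN training achieves roughly 2–3x lower GPU utilization than image-classification models.
GPU memory is often underutilized. Simply filling memory with large mini-batches yields only limited benefit for many models.
During training, feature maps account for 70–90% of total memory, in contrast to inference, where weights dominate memory.
The new memory profiling tools reveal the precise allocation of memory among weights, gradients, feature maps, and workspace in each framework.
The TBD benchmark and tools point to directions for optimizing applications, libraries, and hardware for DNN training.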
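A back-of-the-envelope estimate shows why feature maps tend to dominate training memory: they are kept for the backward pass and grow with batch size, while weights and their gradients do not. The sketch below uses a toy stack of identical convolution layers; the layer shapes, the `conv_stack_memory` helper, and all of its defaults are assumptions made for illustration, and it is not meant to reproduce the 70–90% figure reported for the paper's models.

```python
BYTES_PER_FLOAT32 = 4


def conv_stack_memory(batch_size, num_layers=10, channels=64,
                      height=56, width=56, kernel=3):
    """Estimate training memory in bytes for a toy stack of identical conv layers.

    Assumes stride 1 with 'same' padding (output shape equals input shape),
    no biases, float32 everywhere, and that every layer's output is kept
    for the backward pass. Workspace memory is ignored.
    """
    weights_per_layer = kernel * kernel * channels * channels
    weights = num_layers * weights_per_layer * BYTES_PER_FLOAT32
    gradients = weights  # one gradient value per weight
    feature_maps = (
        num_layers * batch_size * channels * height * width * BYTES_PER_FLOAT32
    )
    return {"weights": weights, "gradients": gradients, "feature_maps": feature_maps}


def feature_map_share(batch_size):
    """Fraction of the estimated total taken by stored feature maps."""
    mem = conv_stack_memory(batch_size)
    return mem["feature_maps"] / sum(mem.values())


# Weights stay fixed while stored feature maps scale with the batch size.
for batch_size in (1, 8, 32):
    print(batch_size, round(feature_map_share(batch_size), 3))
```

The same reasoning suggests why inference looks different: without a backward pass, intermediate outputs can be freed as soon as the next layer has consumed them, so the fixed cost of the weights weighs more heavily.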


This review was generated by AI and checked by a human editor.