QUICK REVIEW

[Paper Review] TBD: Benchmarking and Analyzing Deep Neural Network Training

Hongyu Zhu, Mohamed Akrout|arXiv (Cornell University)|Mar 16, 2018

Adversarial Robustness in Machine Learning | References: 59 | Cited by: 55

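One-Sentence Summary

This paper proposes TBD, a new cross-domain, cross-framework benchmark suite for DNN training, together with a performance-analysis toolchain and memory-profiling tools, and uses them to analyze the training performance of TensorFlow, MXNet, and CNTK across several hardware configurations.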

ABSTRACT

The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference -- i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (TBD is short for Training Benchmark for DNNs), that uses a representative set of DNN models that cover a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, reinforcement learning, and (ii) by performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine). TBD currently covers six major application domains and eight different state-of-the-art models. We present a new toolchain for performance analysis for these models that combines the targeted usage of existing performance analysis tools, careful selection of new and existing metrics and methodologies to analyze the results, and utilization of domain specific characteristics of DNN training. We also build a new set of tools for memory profiling in all three major frameworks; much needed tools that can finally shed some light on precisely how much memory is consumed by different data structures (weights, activations, gradients, workspace) in DNN training. By using our tools and methodologies, we make several important observations and recommendations on where the future research and optimization of DNN training should be focused.

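Motivation and Goals

Motivate the need for a broad DNN training benchmark that goes beyond inference and image classification.
Define TBD as a representative suite spanning multiple domains: image classification, machine translation, speech recognition, object detection, adversarial networks, and reinforcement learning.
Develop an end-to-end performance-analysis toolchain for DNN training that covers the major frameworks and hardware configurations.
Build memory-profiling tools that quantify how much memory weights, activations, gradients, and workspace consume in TensorFlow, MXNet, and CNTK.
Report findings and recommendations to guide future research on, and optimization of, DNN training.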

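Proposed Approach

Curate a broad benchmark suite of eight state-of-the-art models spanning six application domains, implemented on TensorFlow, MXNet, and CNTK.
Evaluate training performance in single-GPU, multi-GPU, and multi-machine settings.
Build an end-to-end analysis toolchain that combines existing profiling tools with domain-specific metrics.
Develop memory profilers for all three frameworks that attribute memory usage to weights, activations, gradients, and workspace.
Standardize implementations across frameworks so that hyperparameters and network definitions are comparable.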
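The idea of attributing memory to data-structure categories can be illustrated with a minimal sketch. It assumes a hypothetical list of allocation records, each already tagged with one of the four categories named above (weights, activations, gradients, workspace); the `Allocation` record, `memory_breakdown` helper, buffer names, and byte counts are all illustrative inventions. How the paper's profilers capture and tag allocations inside each framework is not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four categories from the summary; the record format itself is hypothetical.
CATEGORIES = ("weights", "activations", "gradients", "workspace")

@dataclass
class Allocation:
    name: str       # hypothetical tensor or buffer name
    category: str   # must be one of CATEGORIES
    num_bytes: int

def memory_breakdown(allocs):
    """Sum bytes per category and return (totals, fraction of the grand total)."""
    totals = defaultdict(int)
    for a in allocs:
        if a.category not in CATEGORIES:
            raise ValueError(f"unknown category: {a.category}")
        totals[a.category] += a.num_bytes
    grand_total = sum(totals.values())
    fractions = {c: totals[c] / grand_total for c in CATEGORIES} if grand_total else {}
    return {c: totals[c] for c in CATEGORIES}, fractions

# Toy allocation log (made-up sizes, for illustration only).
allocs = [
    Allocation("conv1.weight", "weights", 4_000_000),
    Allocation("conv1.output", "activations", 30_000_000),
    Allocation("conv1.weight.grad", "gradients", 4_000_000),
    Allocation("conv_workspace", "workspace", 2_000_000),
]
totals, fractions = memory_breakdown(allocs)
for c in CATEGORIES:
    print(f"{c:12s} {totals[c]:>12,d} B  {fractions[c]:.1%}")
```

Keeping the category as an explicit field on each record makes the breakdown a simple group-by; the hard part in a real profiler is deciding the category at allocation time, which this sketch takes as given.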

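Experimental Results

Research Questions

RQ1: What are the main bottlenecks of DNN training across different models, frameworks, and hardware configurations?
RQ2: How does memory usage during training differ across data structures (weights, activations, gradients, workspace) and across frameworks?
RQ3: How do throughput and GPU utilization vary across frameworks (TensorFlow, MXNet, CNTK) for diverse training workloads?
RQ4: What actionable recommendations can improve DNN training performance and memory efficiency?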

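Key Findings

Training RNN-based models achieves roughly 2–3× lower GPU utilization than training image classification models.
GPU memory is often used inefficiently; simply raising the mini-batch size until memory is full yields limited gains for many models.
During training, feature maps consume 70–90% of total memory, in contrast to inference, where weights dominate memory usage.
The new memory-profiling tools reveal exactly how memory is split among weights, gradients, feature maps, and workspace in each framework.
The TBD benchmark and tools point to directions for optimizing applications, libraries, and hardware for DNN training.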
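A back-of-the-envelope sketch helps explain why feature maps can dominate training memory. Weight size depends only on layer shapes, whereas each layer's output feature map must be kept for the backward pass and therefore grows with batch size and spatial resolution. The layer stack, 3x3 kernels, fp32 storage, and the `memory_split` helper below are hypothetical choices for illustration, not the models benchmarked in the paper.

```python
BYTES_PER_FLOAT = 4  # assumption: fp32 tensors

# Hypothetical stack of 3x3 conv layers: (in_channels, out_channels, output height/width)
LAYERS = [(3, 64, 224), (64, 128, 112), (128, 256, 56), (256, 512, 28)]
KERNEL = 3

def memory_split(batch):
    """Return (weight_bytes, feature_map_bytes) for the toy stack at a given batch size."""
    # Weights: one k x k kernel per (input, output) channel pair plus a bias per output channel.
    weight_bytes = sum((ci * co * KERNEL * KERNEL + co) * BYTES_PER_FLOAT
                       for ci, co, _ in LAYERS)
    # Feature maps: every layer's output is stashed for backprop, so this term scales with batch.
    fmap_bytes = sum(batch * co * s * s * BYTES_PER_FLOAT for _, co, s in LAYERS)
    return weight_bytes, fmap_bytes

for batch in (1, 32):
    w, f = memory_split(batch)
    print(f"batch={batch}: weights {w / 2**20:.1f} MiB, "
          f"feature maps {f / 2**20:.1f} MiB ({f / (w + f):.0%} of the two)")
```

This toy ignores weight gradients, optimizer state, and workspace, so the share it prints is not comparable to the 70–90% figure above; it only shows that the feature-map term scales linearly with batch size while the weight term stays fixed.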


This review was generated by AI and checked by a human editor.