QUICK REVIEW

[Paper Review] Comparative Study of Deep Learning Software Frameworks

Soheil Bahrampour, Naveen Ramakrishnan|arXiv (Cornell University)|Nov 19, 2015

Advanced Neural Network Applications|References: 17|Cited by: 87

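One-Sentence Summary

This paper evaluates five deep learning frameworks (Caffe, Neon, TensorFlow, Theano, and Torch) on extensibility, hardware utilization, and speed in both CPU and GPU settings. The main findings: Theano and Torch are the most extensible; Torch performs best on CPU and for GPU deployment of large networks; and Theano achieves the best GPU performance for LSTM training and for small convolutional networks.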

ABSTRACT

Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of five deep learning frameworks, namely Caffe, Neon, TensorFlow, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is performed on several types of deep learning architectures and we evaluate the performance of the above frameworks when employed on a single machine for both (multi-threaded) CPU and GPU (Nvidia Titan X) settings. The speed performance metrics used here include the gradient computation time, which is important during the training phase of deep networks, and the forward time, which is important from the deployment perspective of trained networks. For convolutional networks, we also report how each of these frameworks support various convolutional algorithms and their corresponding performance. From our experiments, we observe that Theano and Torch are the most easily extensible frameworks. We observe that Torch is best suited for any deep architecture on CPU, followed by Theano. It also achieves the best performance on the GPU for large convolutional and fully connected networks, followed closely by Neon. Theano achieves the best performance on GPU for training and deployment of LSTM networks. Caffe is the easiest for evaluating the performance of standard deep architectures. Finally, TensorFlow is a very flexible framework, similar to Theano, but its performance is currently not competitive compared to the other studied frameworks.

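Research Motivation and Objectives

Assess how extensible mainstream deep learning frameworks are in supporting diverse network architectures and training procedures.
Evaluate hardware utilization in multi-threaded CPU and GPU (NVIDIA Titan X) configurations.
Benchmark several network types for speed in training (gradient computation) and deployment (forward pass).
Analyze framework support for advanced convolutional algorithms and for variable-length sequence handling in recurrent networks.
Give practitioners a comparative assessment of each framework's strengths and limitations in practical use.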

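Proposed Method

Benchmarks are run on a single machine using standard deep learning architectures (fully connected, convolutional, and recurrent networks).
Performance metrics are gradient computation time (training) and forward time (deployment), each measured on CPU (multi-threaded) and GPU (NVIDIA Titan X).
For convolutional networks, support for and performance of various convolutional algorithms (such as FFT-based ones) are evaluated.
LSTM training uses the IMDB sentiment analysis dataset, with masking and padding to handle variable-length sequences.
All frameworks are tested under the same conditions: a fixed batch size (16), no data shuffling, and a consistent data-loading procedure, to ensure a fair comparison.
The study focuses on frameworks that were still under active development with community support as of February 2016, and excludes frameworks lacking key features such as LSTM or cuDNN support.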
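Two pieces of the methodology above can be illustrated with a minimal Python sketch: timing a forward pass and a gradient pass separately, and padding variable-length sequences with a matching mask. This is not the paper's benchmarking code. The names `time_call`, `pad_and_mask`, `toy_forward`, and `toy_gradient` are hypothetical, the toy functions stand in for a framework's real forward and backward passes, and the warm-up count and use of the median are assumptions rather than details taken from the paper.

```python
import statistics
import time


def time_call(fn, *args, warmup=2, repeats=10):
    """Return the median wall-clock seconds of fn(*args) over `repeats` runs."""
    for _ in range(warmup):  # warm-up runs are executed but not timed
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def pad_and_mask(sequences, pad_value=0):
    """Pad sequences to the longest length and build a 1/0 mask of real entries."""
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


# Hypothetical stand-in for a forward pass: one weighted sum per sequence.
def toy_forward(batch, weight):
    return [sum(x * weight for x in row) for row in batch]


# Hypothetical stand-in for a gradient pass: derivative of each weighted sum
# with respect to `weight`, counting only positions where the mask is 1.
def toy_gradient(batch, mask):
    return [sum(x * m for x, m in zip(row, mask_row))
            for row, mask_row in zip(batch, mask)]


batch, mask = pad_and_mask([[3, 1, 4], [1, 5], [9]])
forward_seconds = time_call(toy_forward, batch, 0.5)   # "deployment" metric
gradient_seconds = time_call(toy_gradient, batch, mask)  # "training" metric
print(batch)
print(mask)
```

The mask lets padded positions be ignored in later computation, so sequences of different lengths can share one fixed-size batch. For real GPU measurements, kernel launches are typically asynchronous, so a harness along these lines would likely need to synchronize the device before reading the clock; that step is omitted here because it is framework-specific.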

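Experimental Results

Research Questions

RQ1: Which deep learning framework is the most extensible in supporting diverse network architectures and training procedures?
RQ2: How do the frameworks compare in CPU and GPU utilization across network types?
RQ3: How do the frameworks perform relative to one another in gradient computation (training) and forward pass (deployment)?
RQ4: How well do the frameworks support advanced features such as variable-length sequence handling in RNNs and optimized convolutional algorithms?
RQ5: What are each framework's current performance bottlenecks and limitations in GPU-accelerated settings?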

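Key Findings

Theano and Torch are the most extensible frameworks: Theano supports flexible architecture design through symbolic differentiation, while Torch benefits from strong community-driven extensions.
On CPU, Torch achieves the best performance on all tested architectures, followed by Theano; Neon performs worst.
On GPU, Torch is fastest for large convolutional and fully connected networks, followed by Theano; Neon is also highly competitive on large convolutional networks.
For GPU training of small convolutional and fully connected networks, Theano outperforms the other frameworks; for LSTM training, Theano achieves the best performance when using cuDNN v3.
Caffe is the easiest to use for evaluating standard deep architectures, since it is simple to configure and requires no hard-coding.
Although TensorFlow is highly flexible and supports heterogeneous devices, its single-GPU performance still trails Theano, Torch, and Neon.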

This review was generated by AI and checked by a human editor.