QUICK REVIEW

[Paper Review] NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks

Ermao Cai, Da-Cheng Juan|arXiv (Cornell University)|Oct 15, 2017

Advanced Neural Network Applications | References: 13 | Cited by: 77

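One-Sentence Summary

NeuralPower is a layer-wise polynomial regression framework that predicts the power, runtime, and energy of CNN inference on GPUs, enabling energy-aware architecture selection before training. It also introduces the Energy-Precision Ratio (EPR), a metric for balancing accuracy against energy efficiency.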

ABSTRACT

"How much energy is consumed for an inference made by a convolutional neural network (CNN)?" With the increased popularity of CNNs deployed on the wide-spectrum of platforms (from mobile devices to workstations), the answer to this question has drawn significant attention. From lengthening battery life of mobile devices to reducing the energy bill of a datacenter, it is important to understand the energy efficiency of CNNs during serving for making an inference, before actually training the model. In this work, we propose NeuralPower: a layer-wise predictive framework based on sparse polynomial regression, for predicting the serving energy consumption of a CNN deployed on any GPU platform. Given the architecture of a CNN, NeuralPower provides an accurate prediction and breakdown for power and runtime across all layers in the whole network, helping machine learners quickly identify the power, runtime, or energy bottlenecks. We also propose the "energy-precision ratio" (EPR) metric to guide machine learners in selecting an energy-efficient CNN architecture that better trades off the energy consumption and prediction accuracy. The experimental results show that the prediction accuracy of the proposed NeuralPower outperforms the best published model to date, yielding an improvement in accuracy of up to 68.5%. We also assess the accuracy of predictions at the network level, by predicting the runtime, power, and energy of state-of-the-art CNN architectures, achieving an average accuracy of 88.24% in runtime, 88.34% in power, and 97.21% in energy. We comprehensively corroborate the effectiveness of NeuralPower as a powerful framework for machine learners by testing it on different GPU platforms and Deep Learning software tools.

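Motivation and Objectives

Motivate the need to predict CNN inference energy consumption before deployment across a range of GPU platforms.
Develop a layer-wise predictive framework that estimates a CNN's power, runtime, and energy during serving without running the network.
Enable quick identification of runtime, power, or energy bottlenecks to guide energy-efficient architecture search.
Provide metrics and validation across multiple CNN architectures and GPU platforms to demonstrate accuracy and generalization.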

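Proposed Method

Propose NeuralPower, a layer-wise framework that uses sparse polynomial regression to model the power and runtime of convolutional, fully connected, and pooling layers.
Use a two-part model for each layer: (1) regular polynomial terms over the layer's configuration features; (2) special terms that capture operations such as memory accesses and FLOPs. Lasso with cross-validation selects the model terms.
Extend the layer-level models to network-level predictions: sum the per-layer runtimes, and compute energy from the per-layer power and runtime estimates.
Collect the training dataset by profiling a set of CNN architectures on an Nvidia Titan X, with the GPU held in a fixed state, using TensorFlow and nvidia-smi for measurement.
Compare network-level predictions with actual measurements on several CNNs (e.g., VGG, NIN, CIFAR nets) to quantify accuracy for runtime, power, and energy.
Introduce the Energy-Precision Ratio as a metric for trading off classification accuracy against energy consumption when selecting an architecture.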
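The network-level aggregation and the Energy-Precision Ratio described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `LayerEstimate`, `network_totals`, and `energy_precision_ratio` are hypothetical names, the per-layer numbers are made up for illustration, average power is assumed to be the runtime-weighted mean of per-layer power, and EPR is assumed to take the form error^alpha × energy per image. The summary does not state the EPR formula, so check the paper for the exact definition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LayerEstimate:
    """Predicted runtime and power for one layer (hypothetical container)."""
    name: str
    runtime_s: float  # predicted runtime in seconds
    power_w: float    # predicted average power in watts


def network_totals(layers: List[LayerEstimate]) -> Dict[str, float]:
    """Aggregate per-layer estimates into network-level runtime, energy, and power."""
    # Network runtime is the sum of the per-layer runtimes.
    total_runtime = sum(layer.runtime_s for layer in layers)
    # A layer's energy is power x time; network energy is the sum over layers.
    total_energy = sum(layer.power_w * layer.runtime_s for layer in layers)
    # Assumption: average power is the runtime-weighted mean of per-layer power.
    avg_power = total_energy / total_runtime if total_runtime > 0 else 0.0
    return {"runtime_s": total_runtime, "energy_j": total_energy, "power_w": avg_power}


def energy_precision_ratio(error: float, energy_per_image_j: float,
                           alpha: float = 1.0) -> float:
    """Assumed form: EPR = error**alpha * energy per image."""
    return (error ** alpha) * energy_per_image_j


# Illustrative per-layer predictions (made-up values, not from the paper).
layers = [
    LayerEstimate("conv1", runtime_s=0.004, power_w=200.0),
    LayerEstimate("pool1", runtime_s=0.001, power_w=100.0),
    LayerEstimate("fc1", runtime_s=0.005, power_w=150.0),
]
totals = network_totals(layers)
epr = energy_precision_ratio(error=0.25, energy_per_image_j=totals["energy_j"], alpha=2.0)
print(totals, epr)
```

Under this assumed form, a lower EPR is better, and a larger `alpha` makes classification error weigh more heavily relative to energy.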

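Experimental Results

Research Questions

RQ1: Can layer-wise polynomial regression accurately predict the per-layer power and runtime of CNNs across different architectures and frameworks?
RQ2: How accurate are NeuralPower's network-level predictions of runtime, power, and energy for diverse CNNs on GPU platforms?
RQ3: Can the Energy-Precision Ratio effectively guide the selection of lower-energy CNN architectures without significantly sacrificing accuracy?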

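Key Findings

On the CNNs tested, NeuralPower achieves an average network-level accuracy of 88.24% for runtime, 88.34% for power, and 97.21% for energy.
The layer-level models based on sparse polynomial regression outperform the earlier Paleo model on runtime prediction, with improvements of up to 68.5% in RMSE/RMSPE.
Layer-level power predictions show an RMSPE below 9% for convolutional, pooling, and fully connected layers.
Network-level energy predictions have an average RMSPE of about 2.79% relative to the actual measured values.
The framework provides a detailed layer-by-layer breakdown for identifying runtime, power, and energy bottlenecks within a network.
The Energy-Precision Ratio provides a tunable metric for balancing accuracy against energy, guiding the selection of energy-efficient CNNs for different application requirements.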
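Several of the findings above are reported as RMSPE. The sketch below shows the usual definition of root-mean-square percentage error. The function name `rmspe` and the sample values are illustrative assumptions, and the paper's own evaluation code is not shown in this summary.

```python
import math
from typing import Sequence


def rmspe(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square percentage error, returned in percent."""
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative error of each prediction with respect to the measured value.
    squared = [((p - m) / m) ** 2 for p, m in zip(predicted, measured)]
    return 100.0 * math.sqrt(sum(squared) / len(squared))


# Made-up example: two predictions that are each off by 10%.
example = rmspe(predicted=[110.0, 90.0], measured=[100.0, 100.0])
print(round(example, 6))
```

Because each error is divided by the measured value, layers or networks with very different absolute runtimes or power draws contribute on the same relative scale.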

This review was generated by AI and reviewed by human editors.