QUICK REVIEW

[Paper Review] ALERT: Accurate Anytime Learning for Energy and Timeliness

Chengcheng Wan, Muhammad Husni Santriaji|arXiv (Cornell University)|Oct 31, 2019

Advanced Neural Network Applications|Cited by 3

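One-Sentence Summary

ALERT is a runtime scheduler that jointly optimizes deep neural network (DNN) inference by using a probabilistic model to detect environmental volatility and by coordinating application-level DNN selection with system-level resource configuration; compared with approaches that adapt at only the application or system level, it reduces energy by more than 13% and error by 27%, and compared with an oracle that has perfect knowledge, it incurs only 3% more energy consumption and 2% higher error.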

ABSTRACT

An increasing number of software applications incorporate runtime Deep Neural Networks (DNNs) to process sensor data and return inference results to humans. Effective deployment of DNNs in these interactive scenarios requires meeting latency and accuracy constraints while minimizing energy, a problem exacerbated by common system dynamics. Prior approaches handle dynamics through either (1) system-oblivious DNN adaptation, which adjusts DNN latency/accuracy tradeoffs, or (2) application-oblivious system adaptation, which adjusts resources to change latency/energy tradeoffs. In contrast, this paper improves on the state-of-the-art by coordinating application- and system-level adaptation. ALERT, our runtime scheduler, uses a probabilistic model to detect environmental volatility and then simultaneously select both a DNN and a system resource configuration to meet latency, accuracy, and energy constraints. We evaluate ALERT on CPU and GPU platforms for image and speech tasks in dynamic environments. ALERT's holistic approach achieves more than 13% energy reduction, and 27% error reduction over prior approaches that adapt solely at the application or system level. Furthermore, ALERT incurs only 3% more energy consumption and 2% higher DNN-inference error than an oracle scheme with perfect application and system knowledge.

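Motivation and Goals

Address the challenge that interactive DNN applications must meet strict latency, accuracy, and energy constraints under dynamic system conditions.
Overcome the limitation of prior approaches, which adapt at only the application level or only the system level and therefore cannot coordinate tradeoffs effectively.
Design a holistic runtime scheduler that jointly selects a DNN model and a system resource configuration to satisfy multiple constraints.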

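Proposed Method

ALERT uses a probabilistic model to detect environmental volatility, enabling proactive adaptation to workload changes.
It jointly selects a DNN model and a system resource configuration to meet latency, accuracy, and energy constraints simultaneously.
The system uses runtime feedback to refine its probabilistic model, improving the quality of its adaptation decisions over time.
ALERT runs in real time and achieves anytime learning by dynamically adjusting the inference configuration to the current system and environment state.
It integrates application-level DNN adaptation with system-level resource management, avoiding the suboptimal tradeoffs that result from optimizing each level in isolation.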
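The coordination idea can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the names `Model`, `PowerCap`, `SlowdownEstimator`, and `choose` are hypothetical, the probabilistic model is replaced by a simple exponentially weighted mean and variance of the observed slowdown with a Gaussian assumption, a missed deadline is counted as a wrong answer, and energy is approximated as power times expected latency.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Model:
    name: str
    accuracy: float      # offline-profiled accuracy in [0, 1] (assumed known)
    base_latency: float  # seconds at full speed in a quiet environment


@dataclass(frozen=True)
class PowerCap:
    watts: float
    speed: float         # relative speed in (0, 1]; 1.0 means full speed


class SlowdownEstimator:
    """Tracks mean and variance of the observed/expected latency ratio."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.mean = 1.0   # 1.0 means the environment matches the profile
        self.var = 0.01

    def update(self, observed, expected):
        # Runtime feedback: fold the latest slowdown ratio into the estimate.
        delta = observed / expected - self.mean
        self.mean += self.alpha * delta
        new_var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.var = max(new_var, 1e-6)  # floor keeps the std strictly positive

    def prob_on_time(self, nominal_latency, deadline):
        # P(slowdown * nominal_latency <= deadline), Gaussian assumption.
        z = (deadline / nominal_latency - self.mean) / math.sqrt(self.var)
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def choose(models, caps, est, deadline, min_accuracy):
    """Return (energy, model, cap) with the lowest estimated energy among
    pairs whose expected accuracy meets the goal, or None if none does."""
    best = None
    for m, c in product(models, caps):
        nominal = m.base_latency / c.speed
        # Assumption: a result that misses the deadline is worth nothing.
        expected_accuracy = est.prob_on_time(nominal, deadline) * m.accuracy
        if expected_accuracy < min_accuracy:
            continue
        energy = c.watts * nominal * est.mean  # joules, rough estimate
        if best is None or energy < best[0]:
            best = (energy, m, c)
    return best


if __name__ == "__main__":
    models = [Model("small", 0.90, 0.02), Model("large", 0.95, 0.05)]
    caps = [PowerCap(10.0, 0.5), PowerCap(25.0, 1.0)]
    est = SlowdownEstimator()
    print(choose(models, caps, est, deadline=0.1, min_accuracy=0.85))
    for _ in range(30):                  # sustained 3x slowdown from contention
        est.update(observed=0.06, expected=0.02)
    print(choose(models, caps, est, deadline=0.1, min_accuracy=0.85))
```

With these made-up numbers, the sketch picks the small model at the low power cap in a quiet environment, then moves the same model to the high power cap once the estimator has learned the sustained slowdown. Searching over model and power cap as pairs, rather than tuning each independently, is what lets a scheduler of this kind trade one level against the other.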

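Experimental Results

Research Questions

RQ1: In dynamic DNN inference workloads, does joint application- and system-level adaptation outperform adaptation at only one level?
RQ2: How effectively can a probabilistic model detect environmental volatility to guide real-time DNN and system configuration decisions?
RQ3: How large is the performance gap between ALERT and an oracle with perfect knowledge of the best DNN and resource configuration?
RQ4: In real-world CPU and GPU deployments, how does ALERT keep inference error low while significantly reducing energy consumption?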

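Key Findings

Compared with prior approaches that adapt at only the application or system level, ALERT reduces energy consumption by more than 13%.
Compared with those uncoordinated adaptation strategies, it reduces DNN-inference error by 27%.
ALERT consumes only 3% more energy than an oracle with perfect knowledge of the best configuration.
ALERT's DNN-inference error is only 2% higher than the oracle's, indicating near-optimal performance.
ALERT improves consistently across workloads, covering image and speech tasks on CPU and GPU platforms.
Holistic coordination of DNN and system adaptation enables better tradeoff management under dynamic conditions.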


This review was generated by AI and checked by a human editor.