QUICK REVIEW

[Paper Digest] Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

En Li, Zhi Zhou|arXiv (Cornell University)|Jun 20, 2018

Topic: Age of Information Optimization|References: 9|Cited by: 52

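One-Sentence Summary

Edgent is a co-inference framework that jointly optimizes DNN partitioning between a mobile device and an edge server and DNN right-sizing via early exit, maximizing accuracy while meeting a latency deadline.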

ABSTRACT

As the backbone technology of machine learning, deep neural networks (DNNs) have quickly ascended to the spotlight. Running DNNs on resource-constrained mobile devices is, however, by no means trivial, since it incurs high performance and energy overhead. While offloading DNNs to the cloud for execution suffers unpredictable performance, due to the uncontrolled long wide-area network latency. To address these challenges, in this paper, we propose Edgent, a collaborative and on-demand DNN co-inference framework with device-edge synergy. Edgent pursues two design knobs: (1) DNN partitioning that adaptively partitions DNN computation between device and edge, in order to leverage hybrid computation resources in proximity for real-time DNN inference. (2) DNN right-sizing that accelerates DNN inference through early-exit at a proper intermediate DNN layer to further reduce the computation latency. The prototype implementation and extensive evaluations based on Raspberry Pi demonstrate Edgent's effectiveness in enabling on-demand low-latency edge intelligence.

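Motivation and Objectives

Enable low-latency DNN inference for mobile applications under device and network constraints.
Propose a co-inference framework that executes DNNs through device-edge synergy.
Introduce adaptive DNN partitioning and early-exit right-sizing to meet a predefined latency deadline.
Provide an offline-online workflow that predicts per-layer latency and optimizes the partition and exit points.
Demonstrate feasibility with a Raspberry Pi-based prototype and an empirical evaluation.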

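Proposed Method

Partitioning: adaptively split DNN computation between the device and the edge to minimize latency under the available bandwidth.
Right-sizing: enable early exit within the DNN to reduce computation, trading accuracy for latency.
Offline profiling: build regression-based models that predict per-layer latency on the device and on the edge, and train a branchy network with multiple exit points.
Online optimization: given the measured bandwidth and the latency requirement, jointly select the exit point and the partition point that maximize accuracy within the latency deadline.
Co-inference: following the selected plan, the edge executes the layers before the partition point and the device executes the remaining layers.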
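
The online optimization step can be sketched as a small exhaustive search. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Branch` container, the function names, the latency numbers, and the rule that a partitioned plan uploads the raw input and downloads one intermediate result are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Branch:
    """One exit branch of a multi-exit network (hypothetical container)."""
    accuracy: float          # accuracy obtained when exiting at this branch
    edge_ms: List[float]     # predicted per-layer latency on the edge server
    device_ms: List[float]   # predicted per-layer latency on the device
    out_kb: List[float]      # output size of each layer, in kilobytes


def plan_latency(b: Branch, p: int, input_kb: float, kb_per_ms: float) -> float:
    """Latency when the edge runs layers [0, p) and the device runs layers [p, n)."""
    edge = sum(b.edge_ms[:p])
    device = sum(b.device_ms[p:])
    if p == 0:
        transfer = 0.0  # device-only plan: nothing crosses the network
    else:
        # Assumed transfer model: upload the input, download layer p-1's output.
        transfer = (input_kb + b.out_kb[p - 1]) / kb_per_ms
    return edge + device + transfer


def choose_plan(branches: List[Branch], input_kb: float, kb_per_ms: float,
                deadline_ms: float) -> Optional[Tuple[int, int, float]]:
    """Return (exit index, partition point, latency), or None if no plan fits."""
    # Try the most accurate exits first; the first feasible one maximizes accuracy.
    order = sorted(range(len(branches)),
                   key=lambda i: branches[i].accuracy, reverse=True)
    for i in order:
        b = branches[i]
        n = len(b.edge_ms)
        # For a fixed exit, pick the partition point with the lowest latency.
        best_p = min(range(n + 1),
                     key=lambda p: plan_latency(b, p, input_kb, kb_per_ms))
        latency = plan_latency(b, best_p, input_kb, kb_per_ms)
        if latency <= deadline_ms:
            return i, best_p, latency
    return None


# Synthetic example: a 2-layer early exit and the full 4-layer network.
branches = [
    Branch(0.78, [1.0, 1.0], [10.0, 10.0], [50.0, 0.1]),
    Branch(0.90, [1.0, 1.0, 2.0, 2.0], [10.0, 10.0, 20.0, 20.0],
           [50.0, 30.0, 5.0, 0.1]),
]
fast_link = choose_plan(branches, input_kb=100.0, kb_per_ms=10.0, deadline_ms=30.0)
slow_link = choose_plan(branches, input_kb=100.0, kb_per_ms=1.0, deadline_ms=30.0)
print(fast_link, slow_link)
```

With these synthetic numbers, the fast link selects the full network run entirely on the edge, while the slow link falls back to the shorter exit run entirely on the device. The search space is only the number of exits times the number of layers, which is why a search of this kind can be cheap at run time.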

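Experimental Results

Research Questions

RQ1: How does partitioning a DNN between the device and the edge reduce end-to-end latency as bandwidth varies?
RQ2: Can DNN right-sizing via early exit reduce latency enough to meet the deadline while preserving accuracy?
RQ3: Which combination of partition point and exit point achieves the highest accuracy under a given latency constraint?
RQ4: How well does regression-based latency prediction, across different layer types, guide online optimization?
RQ5: Is the proposed offline-online Edgent workflow feasible on commodity hardware for real-time edge intelligence?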
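
RQ3 can be read as a small constrained optimization problem. The notation below is assumed for illustration and is not taken from the paper: $M$ exits, exit $i$ with accuracy $A_i$ and $N_i$ layers, partition point $p$ (the edge runs layers $1..p$, the device runs the rest), predicted per-layer latencies $t^{\text{edge}}_{i,j}$ and $t^{\text{dev}}_{i,j}$, input size $S_{\text{in}}$, intermediate output size $S_{i,p}$, bandwidth $B$, and deadline $T_{\text{req}}$.

```latex
\begin{aligned}
\max_{i,\,p} \quad & A_i \\
\text{s.t.} \quad & T(i,p) \le T_{\text{req}}, \qquad
  i \in \{1,\dots,M\}, \quad p \in \{0,\dots,N_i\}, \\
& T(i,p) = \sum_{j=1}^{p} t^{\text{edge}}_{i,j}
  + \sum_{j=p+1}^{N_i} t^{\text{dev}}_{i,j}
  + \mathbb{1}[p > 0]\,\frac{S_{\text{in}} + S_{i,p}}{B}
\end{aligned}
```

Under this reading, $p = 0$ is device-only execution and $p = N_i$ is edge-only execution, so both baselines are special cases of the joint search.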

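Key Findings

Jointly optimizing the partition point and the exit point reduces latency compared with device-only or edge-only execution.
The regression-based per-layer latency models make online optimization fast (at most 1 ms in the experiments).
Higher bandwidth allows an exit with higher accuracy to be selected, improving end-to-end accuracy within the latency constraint.
As bandwidth increases or the latency requirement is relaxed, the optimal exit point generally moves to a later (deeper) layer.
Prototype experiments show that Edgent meets strict latency targets under varying bandwidth and outperforms the baseline methods.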
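
The per-layer latency models mentioned above can be illustrated with ordinary least squares. This is a sketch under assumptions: the digest does not say which features the paper uses for each layer type, so a single hypothetical "workload" feature per layer and synthetic profiling samples stand in for them, and all function names are invented for the example.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic offline profiling samples: layer type -> (workload feature, measured ms).
# The feature is a placeholder, for example millions of multiply-accumulates.
samples: Dict[str, List[Tuple[float, float]]] = {
    "conv": [(1.0, 2.1), (2.0, 4.1), (4.0, 8.1)],
    "fc": [(1.0, 0.6), (3.0, 1.6), (5.0, 2.6)],
}

# One regression model per layer type, fitted once offline.
models: Dict[str, Tuple[float, float]] = {
    layer_type: fit_line([x for x, _ in pts], [y for _, y in pts])
    for layer_type, pts in samples.items()
}


def predict_layer_ms(models: Dict[str, Tuple[float, float]],
                     layer_type: str, feature: float) -> float:
    """Predicted latency of a single layer from its type and workload feature."""
    slope, intercept = models[layer_type]
    return slope * feature + intercept


def predict_network_ms(models: Dict[str, Tuple[float, float]],
                       layers: List[Tuple[str, float]]) -> float:
    """Predicted latency of a layer sequence: the sum of per-layer predictions."""
    return sum(predict_layer_ms(models, t, f) for t, f in layers)


print(predict_network_ms(models, [("conv", 2.0), ("fc", 4.0)]))
```

Because a prediction is only a multiply and an add per layer, evaluating every candidate partition point online costs very little, which is consistent with the fast online optimization reported above. In practice one set of models would be fitted for the device and another for the edge server.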


This digest was generated by AI and reviewed by a human editor.