QUICK REVIEW

[Paper Review] Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning

Xianfu Chen, Honggang Zhang|arXiv (Cornell University)|Mar 25, 2018

IoT and Edge/Fog Computing | References: 10 | Cited by: 21

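One-Sentence Summary

This paper proposes an online computation offloading strategy based on a deep Q-network (DQN) for mobile-edge computing (MEC) in ultra-dense networks, making adaptive offloading decisions from the time-varying channel qualities, the energy queue state, and the task queue state. The method learns the optimal policy without prior statistical knowledge and reduces the long-term cost by up to 56% compared with baseline methods.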

ABSTRACT

To improve the quality of computation experience for mobile devices, mobile-edge computing (MEC) is emerging as a promising paradigm by providing computing capabilities within radio access networks in close proximity. Nevertheless, the design of computation offloading policies for a MEC system remains challenging. Specifically, whether to execute an arriving computation task at local mobile device or to offload a task for cloud execution should adapt to the environmental dynamics in a smarter manner. In this paper, we consider MEC for a representative mobile user in an ultra dense network, where one of multiple base stations (BSs) can be selected for computation offloading. The problem of solving an optimal computation offloading policy is modelled as a Markov decision process, where our objective is to minimize the long-term cost and an offloading decision is made based on the channel qualities between the mobile user and the BSs, the energy queue state as well as the task queue state. To break the curse of high dimensionality in state space, we propose a deep $Q$-network-based strategic computation offloading algorithm to learn the optimal policy without having a priori knowledge of the dynamic statistics. Numerical experiments provided in this paper show that our proposed algorithm achieves a significant improvement in average cost compared with baseline policies.

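Motivation and Objectives

Address the challenge of designing adaptive computation offloading policies for mobile-edge computing (MEC) systems whose environment varies over time.
Overcome the curse of dimensionality in the state space, which arises from the multiple base stations and the dynamic system states of an ultra-dense network.
Develop an online, learning-based offloading policy that requires no prior knowledge of channel statistics or the task arrival distribution.
Optimize the trade-off among execution delay, handover cost, and task drops by minimizing the long-term cost within a Markov decision process (MDP) framework.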
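
The trade-off named in the last objective can be illustrated with a minimal sketch. It assumes the per-step cost is a weighted sum of execution delay, a handover indicator, and the number of dropped tasks, and that the long-term cost is a discounted sum of those per-step costs. The weights, the discount factor, and the names `CostWeights`, `step_cost`, and `discounted_cost` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CostWeights:
    """Hypothetical weights; the source does not state the actual values."""
    delay: float = 1.0
    handover: float = 0.5
    drop: float = 2.0


def step_cost(delay: float, handover: bool, dropped_tasks: int, w: CostWeights) -> float:
    """Weighted per-step cost: execution delay + handover + task drops."""
    return w.delay * delay + w.handover * float(handover) + w.drop * dropped_tasks


def discounted_cost(costs, gamma: float = 0.9) -> float:
    """Discounted sum of per-step costs, a common long-term MDP objective."""
    total = 0.0
    for t, c in enumerate(costs):
        total += (gamma ** t) * c
    return total
```

Under these assumed weights, a step with a delay of 2.0, one handover, and one dropped task costs 4.5. Raising the drop weight pushes a learned policy toward executing or offloading tasks even when delay is high.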

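Proposed Method

Model the computation offloading problem as a Markov decision process (MDP) whose state consists of the channel qualities, the energy queue state, and the task queue state.
Approximate the Q-function with a deep Q-network (DQN) built from a fully connected neural network, so that the high-dimensional state space can be handled.
Use experience replay and a target network to stabilize DQN training and improve convergence.
Design a reward function that accounts for execution delay, handover cost, and task-drop penalties to guide policy learning.
Train the DQN agent online through real-time interaction with the environment, so that it adapts to dynamic network conditions without a prior statistical model.
Configure the DQN with a single hidden layer of 512 neurons, which gave the best performance; deeper networks reduced learning efficiency.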
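
The ingredients listed above (a fully connected Q-network with one hidden layer of 512 neurons, experience replay, and a target network) can be sketched in plain NumPy. This is a generic DQN update written for illustration, not the authors' implementation: the ReLU activation, the weight initialization, the learning rate, the discount factor, the replay capacity, and every class and function name are assumptions. The reward is taken to be the negative of the per-step cost.

```python
import random
from collections import deque

import numpy as np


class QNetwork:
    """Fully connected Q-network with a single ReLU hidden layer."""

    def __init__(self, state_dim, n_actions, hidden=512, seed=0):
        rng = np.random.default_rng(seed)
        self.params = {
            "W1": rng.normal(0.0, np.sqrt(2.0 / state_dim), (state_dim, hidden)),
            "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_actions)),
            "b2": np.zeros(n_actions),
        }

    def forward(self, states):
        """Return Q-values and hidden activations for a batch of states."""
        p = self.params
        h = np.maximum(0.0, states @ p["W1"] + p["b1"])
        return h @ p["W2"] + p["b2"], h

    def q_values(self, states):
        return self.forward(states)[0]

    def copy_from(self, other):
        """Synchronize this (target) network with another (online) network."""
        self.params = {k: v.copy() for k, v in other.params.items()}


def select_action(net, state, epsilon, n_actions):
    """Epsilon-greedy choice, e.g. local execution or offloading to one BS."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(net.q_values(state[None, :])[0]))


def sample_batch(replay, batch_size):
    """Draw a uniform random minibatch from the experience replay memory."""
    return random.sample(list(replay), batch_size)


def train_step(online, target, batch, gamma=0.9, lr=1e-3):
    """One gradient step on the squared TD error over a replay minibatch."""
    s, a, r, s2 = (np.array(x) for x in zip(*batch))
    q, h = online.forward(s)
    # The bootstrap target uses the frozen target network.
    td_target = r + gamma * target.q_values(s2).max(axis=1)
    idx = np.arange(len(a))
    td_error = q[idx, a] - td_target
    # Gradient of 0.5 * mean(td_error^2) with respect to the Q outputs.
    grad_q = np.zeros_like(q)
    grad_q[idx, a] = td_error / len(a)
    grad_h = (grad_q @ online.params["W2"].T) * (h > 0)
    grads = {
        "W2": h.T @ grad_q, "b2": grad_q.sum(axis=0),
        "W1": s.T @ grad_h, "b1": grad_h.sum(axis=0),
    }
    for k in grads:
        online.params[k] -= lr * grads[k]
    return float(0.5 * np.mean(td_error ** 2))


# Experience replay memory holding (state, action, reward, next_state) tuples.
replay = deque(maxlen=10_000)
```

In an online loop, each interaction would append a transition to `replay`, call `train_step` on a batch from `sample_batch`, and periodically call `target.copy_from(online)`. Keeping the target network fixed between synchronizations is what stops the bootstrap target from shifting with every update.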

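Experimental Results

Research Questions

RQ1: How can a computation offloading policy be designed to adapt to time-varying channel conditions, energy availability, and task arrivals in an ultra-dense MEC network?
RQ2: How much improvement in minimizing the long-term system cost can a deep reinforcement learning method such as DQN deliver over conventional one-shot optimization or greedy policies?
RQ3: How does the DQN architecture (depth and width) affect the performance of the offloading policy, measured by cost minimization?
RQ4: How does the arrival rate of harvested energy affect the trade-off among task execution delay, handover frequency, and task drops in an MEC system?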

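Key Findings

The proposed DQN-based offloading policy reduces the average long-term cost by up to 56% relative to the baseline policies.
The algorithm converges stably during training, as shown by a loss that decreases over time; the reported data were collected after 900,000 training rounds.
A wider DQN (more neurons per layer) outperforms a deeper one, which suggests that in this setting adding width improves function approximation more effectively than adding depth.
A higher energy arrival rate reduces task drops and lowers the average cost, although execution delay and handover frequency do not always decrease, because of better channel selection opportunities.
By learning to offload tasks to the best available base station under real-time conditions, the policy effectively balances execution delay, handover cost, and task drops.
The method needs no prior knowledge of channel statistics or the task arrival distribution, which makes it suitable for dynamic real-world MEC deployments.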

This review was generated by AI and checked by human editors.