QUICK REVIEW

[Paper Review] Experience-driven Networking: A Deep Reinforcement Learning based Approach

Zhiyuan Xu, Jian Tang|arXiv (Cornell University)|Jan 17, 2018

Software-Defined Networks and 5G | References: 17 | Cited by: 50

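One-Sentence Summary

DRL-TE is a model-free traffic engineering (TE) framework based on deep reinforcement learning. It uses TE-aware exploration and prioritized experience replay to optimize end-to-end utility, delay, and throughput in dynamic networks, and it outperforms baseline methods and DDPG in ns-3 simulations.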

ABSTRACT

Modern communication networks have become very complicated and highly dynamic, which makes them hard to model, predict and control. In this paper, we develop a novel experience-driven approach that can learn to well control a communication network from its own experience rather than an accurate mathematical model, just as a human learns a new skill (such as driving, swimming, etc). Specifically, we, for the first time, propose to leverage emerging Deep Reinforcement Learning (DRL) for enabling model-free control in communication networks; and present a novel and highly effective DRL-based control framework, DRL-TE, for a fundamental networking problem: Traffic Engineering (TE). The proposed framework maximizes a widely-used utility function by jointly learning network environment and its dynamics, and making decisions under the guidance of powerful Deep Neural Networks (DNNs). We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to optimize the general DRL framework particularly for TE. To validate and evaluate the proposed framework, we implemented it in ns-3, and tested it comprehensively with both representative and randomly generated network topologies. Extensive packet-level simulation results show that 1) compared to several widely-used baseline methods, DRL-TE significantly reduces end-to-end delay and consistently improves the network utility, while offering better or comparable throughput; 2) DRL-TE is robust to network changes; and 3) DRL-TE consistently outperforms a state-of-the-art DRL method (for continuous control), Deep Deterministic Policy Gradient (DDPG), which, however, does not offer satisfying performance.

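Research Motivation and Objectives

Advance model-free, experience-driven traffic engineering for dynamic networks.
Develop a DRL-based control framework that learns network dynamics and a control policy without an explicit model.
Propose TE-aware exploration and actor-critic-based prioritized experience replay to tailor DRL to TE.
Validate the approach with packet-level ns-3 simulations on multiple topologies and compare it against baselines.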

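Proposed Method

Formulate TE as a continuous-control DRL problem in which the state records, for each session k, the pair s = {x_k, z_k} (the session's throughput x_k and end-to-end delay z_k).
Define the action as the split ratios over the candidate paths of all sessions.
Use a DDPG-based actor-critic DRL method with TE-aware exploration guided by a baseline TE solution.
Introduce prioritized experience replay for actor-critic training with a two-part priority metric (TD error and Q gradient).
Implement DRL-TE in ns-3, with two-layer neural networks for the actor and critic and target networks for stability.
Evaluate on NSFNET, ARPANET, and BRITE-generated topologies against shortest path (SP), load balancing (LB), NUM-TE, and DDPG.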
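
The action definition and TE-aware exploration listed above can be illustrated with a minimal sketch. It assumes that, with probability epsilon, the agent perturbs the baseline TE solution and otherwise perturbs the actor's output, and that the result is then renormalized into valid per-session split ratios. The function names, the Gaussian noise, the clipping-based normalization, and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import random


def normalize_split_ratios(raw, paths_per_session):
    """Map raw scores to per-session split ratios (non-negative, summing to 1)."""
    ratios, start = [], 0
    for n_paths in paths_per_session:
        # Clip negative scores so every ratio stays non-negative.
        chunk = [max(v, 0.0) for v in raw[start:start + n_paths]]
        total = sum(chunk)
        if total > 0:
            ratios.extend(v / total for v in chunk)
        else:
            # Every score was clipped to zero: fall back to an even split.
            ratios.extend([1.0 / n_paths] * n_paths)
        start += n_paths
    return ratios


def te_aware_explore(actor_action, baseline_action, paths_per_session,
                     epsilon, noise_scale, rng):
    """Hypothetical TE-aware exploration step.

    With probability epsilon the noise is added to the baseline TE action,
    otherwise to the actor's action (an assumed rule for illustration).
    """
    anchor = baseline_action if rng.random() < epsilon else actor_action
    noisy = [v + noise_scale * rng.gauss(0.0, 1.0) for v in anchor]
    return normalize_split_ratios(noisy, paths_per_session)


rng = random.Random(0)
paths_per_session = [3, 2]                    # session 0: 3 candidate paths, session 1: 2
actor_action = [0.6, 0.3, 0.1, 0.5, 0.5]      # split ratios proposed by the actor
baseline_action = [1.0, 0.0, 0.0, 1.0, 0.0]   # e.g. all traffic on one path per session
action = te_aware_explore(actor_action, baseline_action, paths_per_session,
                          epsilon=0.3, noise_scale=0.05, rng=rng)
```

Anchoring part of the exploration on a baseline solution keeps early actions close to a known-reasonable routing configuration, which is the intuition behind guiding exploration with a baseline TE method.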

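Experimental Results

Research Questions

RQ1: Can a model-free DRL approach learn an effective TE policy without an accurate network model?
RQ2: Do TE-aware exploration and prioritized experience replay improve DRL performance on continuous TE problems?
RQ3: How does DRL-TE compare with traditional methods and DDPG in end-to-end delay, throughput, and overall network utility?
RQ4: Is the DRL-TE framework robust to changing network conditions and topologies?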

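Key Findings

Compared with SP, LB, NUM, and DDPG, DRL-TE significantly reduces end-to-end delay on the NSFNET, ARPANET, and BRITE topologies (for example, by up to 74.6% on NSF).
On the NSF topology, DRL-TE reduces average delay by 55.4%, 47.1%, 70.5%, and 44.2% relative to SP, LB, NUM, and DDPG, respectively.
DRL-TE consistently outperforms the state-of-the-art DRL method for continuous control (DDPG) in TE scenarios.
DRL-TE is robust to network changes and improves network utility while offering better or comparable throughput.
The framework includes novel TE-aware exploration (exploration guided by a baseline TE solution) and actor-critic-based prioritized experience replay (a two-part priority based on TD error and Q gradient).
The NSFNET and ARPANET results show significant improvements in end-to-end utility and delay under varying traffic demands.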
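
The two-part priority mentioned in the findings can be sketched as follows. This is a minimal illustration that assumes the priority is a weighted sum of the absolute TD error and the magnitude of the Q gradient with respect to the action, and that sampling probabilities and importance-sampling weights follow the standard prioritized-replay form. The weight phi, the offset xi, the exponents beta0 and beta1, and all function names are hypothetical choices made for the example.

```python
import random


def priority(td_error, q_grad_mag, phi=0.6, xi=0.01):
    """Assumed two-part priority: phi weights the TD-error term,
    (1 - phi) weights the Q-gradient term; xi keeps priorities positive."""
    return phi * (abs(td_error) + xi) + (1.0 - phi) * abs(q_grad_mag)


def sampling_probs(priorities, beta0=0.6):
    """Sampling probability proportional to priority raised to beta0."""
    scaled = [p ** beta0 for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta1=0.4):
    """Importance-sampling weights, normalized so the largest weight is 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta1) for p in probs]
    largest = max(raw)
    return [w / largest for w in raw]


def sample_indices(probs, batch_size, rng):
    """Draw transition indices (with replacement) according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Toy replay buffer: (TD error, Q-gradient magnitude) per stored transition.
stats = [(0.1, 0.2), (2.0, 0.1), (0.5, 1.5)]
priorities = [priority(td, grad) for td, grad in stats]
probs = sampling_probs(priorities)
weights = importance_weights(probs)
batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
```

Under these assumptions, the TD-error term favors transitions the critic predicts poorly, while the Q-gradient term favors transitions that would move the actor the most, so a single replay buffer can serve both networks.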

This review was generated by AI and reviewed by a human editor.