QUICK REVIEW

[Paper Review] Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case

Paul Almasan, José Suárez‐Varela|arXiv (Cornell University)|Oct 16, 2019

Digital Transformation in Industry | Cited by 46

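One-Sentence Summary

This paper proposes a deep reinforcement learning (DRL) agent built on a graph neural network (GNN) that generalizes routing optimization to network topologies unseen during training, where it outperforms state-of-the-art DRL.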

ABSTRACT

Deep Reinforcement Learning (DRL) has shown a dramatic improvement in decision-making and automated control problems. Consequently, DRL represents a promising technique to efficiently solve many relevant optimization problems (e.g., routing) in self-driving networks. However, existing DRL-based solutions applied to networking fail to generalize, which means that they are not able to operate properly when applied to network topologies not observed during training. This lack of generalization capability significantly hinders the deployment of DRL technologies in production networks. This is because state-of-the-art DRL-based networking solutions use standard neural networks (e.g., fully connected, convolutional), which are not suited to learn from information structured as graphs. In this paper, we integrate Graph Neural Networks (GNN) into DRL agents and we design a problem specific action space to enable generalization. GNNs are Deep Learning models inherently designed to generalize over graphs of different sizes and structures. This allows the proposed GNN-based DRL agent to learn and generalize over arbitrary network topologies. We test our DRL+GNN agent in a routing optimization use case in optical networks and evaluate it on 180 and 232 unseen synthetic and real-world network topologies respectively. The results show that the DRL+GNN agent is able to outperform state-of-the-art solutions in topologies never seen during training.

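Research Motivation and Objectives

Establish the need for DRL in network optimization that generalizes well to topologies not seen during training.
Introduce a GNN-augmented DRL agent that learns routing policies able to generalize over graph-structured networks.
Show that the DRL+GNN agent outperforms state-of-the-art (SoA) DRL on both synthetic and real-world topologies.
Assess deployability in production networks, including inference overhead and scalability.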

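Proposed Method

Combine graph neural networks with deep reinforcement learning so the agent operates directly on graph-structured network topologies.
Represent each routing action as a choice among the k=4 shortest paths of a source-destination pair, embedded in the graph representation.
Use a DQN-style objective in which Q-values are estimated by a GNN with a readout network.
Model the environment with link-level features (capacity, betweenness) and the bandwidth allocated by the action, encoded as a one-hot input.
Apply a message-passing scheme (MPNN) with an RNN that evolves link states over T iterations to produce Q-values.
Train with experience replay and epsilon-greedy exploration; optimize with SGD and regularization techniques.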
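
The method steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the candidate paths come from brute-force enumeration rather than a proper k-shortest-paths algorithm, the action's bandwidth is a plain scalar instead of a one-hot encoding, and a fixed `tanh` update stands in for the learned message function and RNN. Training (experience replay, SGD) is left out.

```python
import math
import random

def k_shortest_paths(adj, src, dst, k=4):
    """Brute-force k shortest simple paths by hop count (tiny graphs only)."""
    paths = []

    def dfs(node, path):
        if node == dst:
            paths.append(list(path))
            return
        for nxt in adj[node]:
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(src, [src])
    paths.sort(key=lambda p: (len(p), p))
    return paths[:k]

def path_links(path):
    """Undirected links traversed by a node path, as sorted tuples."""
    return [tuple(sorted(e)) for e in zip(path, path[1:])]

def q_value(adj, capacity, path, demand, weights, T=3):
    """Score one candidate path with T rounds of link-to-link message passing."""
    links = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    on_path = set(path_links(path))
    # Hidden state per link: [capacity, bandwidth this action would allocate].
    h = {l: [capacity[l], demand if l in on_path else 0.0] for l in links}
    w_msg, w_out = weights
    for _ in range(T):
        new_h = {}
        for l in links:
            # Neighbours are the links that share an endpoint with l.
            nbrs = [m for m in links if m != l and set(m) & set(l)]
            agg = [sum(h[m][i] for m in nbrs) for i in range(2)]
            # Assumption: fixed tanh update replaces the learned message + RNN.
            new_h[l] = [math.tanh(h[l][i] + w_msg[i] * agg[i]) for i in range(2)]
        h = new_h
    # Readout: sum all link states, then a linear map to a scalar Q-value.
    pooled = [sum(h[l][i] for l in links) for i in range(2)]
    return sum(w * p for w, p in zip(w_out, pooled))

def select_action(adj, capacity, src, dst, demand, weights, epsilon=0.0, rng=random):
    """Epsilon-greedy choice among the k candidate paths."""
    candidates = k_shortest_paths(adj, src, dst)
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda p: q_value(adj, capacity, p, demand, weights))
```

Because the Q-value is computed per link and pooled over whatever links exist, nothing in the sketch depends on a fixed number of nodes or links, which is the property that lets the same weights be applied to a different topology.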

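Experimental Results

Research Questions

RQ1: Can a DRL agent with a GNN generalize routing decisions to network topologies unseen during training?
RQ2: How does the DRL+GNN approach compare with state-of-the-art DRL routing solutions on synthetic and real-world topologies?
RQ3: How do topology size, link features, and action-space design affect learning a generalizable routing policy?
RQ4: What are the inference overhead and scalability of the DRL+GNN agent when deployed in a production environment?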

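Key Findings

The DRL+GNN agent generalizes to unseen topologies and outperforms SoA DRL on the Nsfnet and Geant2 topologies.
In evaluations on 180 unseen synthetic topologies and 232 unseen real-world topologies, the DRL+GNN approach allocates more bandwidth than SoA DRL.
An agent trained on one topology (Nsfnet) outperforms SoA DRL on another (Geant2), indicating robust generalization.
The model makes decisions in milliseconds, and its cost grows linearly with network size, which supports deployment in production-like environments.
Using link betweenness as a feature speeds up convergence and improves policy learning.
The approach yields a single general model that does not need retraining for new topologies.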
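
The link betweenness feature mentioned in the findings can be illustrated with a short sketch. The definition used here is an assumption, since this summary does not spell one out: for every node pair, take up to k shortest candidate paths, count how many of those paths cross each link, and divide by the total number of paths. The function names are hypothetical and the path enumeration is brute force, suitable only for small graphs.

```python
from collections import Counter
from itertools import combinations

def simple_paths(adj, src, dst, path=None):
    """Yield every simple path from src to dst as a list of nodes."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in adj[src]:
        if nxt not in path:
            yield from simple_paths(adj, nxt, dst, path + [nxt])

def link_betweenness(adj, k=4):
    """Fraction of candidate paths crossing each link (assumed definition)."""
    counts = Counter()
    total = 0
    for src, dst in combinations(sorted(adj), 2):
        # Keep the k shortest candidates by hop count, ties broken by node order.
        paths = sorted(simple_paths(adj, src, dst), key=lambda p: (len(p), p))[:k]
        for path in paths:
            total += 1
            for u, v in zip(path, path[1:]):
                counts[tuple(sorted((u, v)))] += 1
    return {link: c / total for link, c in counts.items()}
```

On a three-node chain 0-1-2 there are three candidate paths in total, and each of the two links carries two of them, so both links get a betweenness of 2/3 under this definition.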

This review was generated by AI and reviewed by human editors.