QUICK REVIEW

[Paper Review] Deep Reinforcement Learning with Graph-based State Representations.

Vikram Waradpande, Daniel Kudenko | arXiv (Cornell University) | Apr 29, 2020

Reinforcement Learning in Robotics | References: 4 | Cited by: 8

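ONE-SENTENCE SUMMARY

This paper proposes graph-based state representations for deep reinforcement learning to improve sample efficiency and performance on grid-world navigation tasks. By applying node representation learning methods, in particular random-walk-based ones, to the graph that underlies the MDP, the authors show that these embeddings consistently outperform the standard matrix representation, and that simpler methods often beat more complex graph convolutional networks.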
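
The core step, treating the grid world's MDP as a graph and sampling random walks over it, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the maze layout, the 4-neighbour move set, and the helper names grid_to_graph, node2vec_walk, and build_corpus are hypothetical. The walk bias follows node2vec's return and in-out parameters p and q; with p = q = 1 it reduces to a uniform, DeepWalk-style walk.

```python
import random

# Hypothetical 5x5 layout: '#' is a wall, '.' is a free cell (one MDP state).
GRID = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-neighbour action set


def grid_to_graph(grid):
    """Free cells become nodes; an edge links two free cells one move apart."""
    rows, cols = len(grid), len(grid[0])
    adj = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != ".":
                continue
            nbrs = adj.setdefault((r, c), [])
            for dr, dc in MOVES:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == ".":
                    nbrs.append((nr, nc))
    return adj


def node2vec_walk(adj, start, length, p, q, rng):
    """Second-order biased walk: weight 1/p to return to the previous node,
    1 to a node adjacent to the previous node, 1/q to move further away."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # the first step is uniform
            continue
        prev = walk[-2]
        weights = [1.0 / p if x == prev else 1.0 if x in adj[prev] else 1.0 / q
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def build_corpus(adj, walks_per_node=10, length=20, p=1.0, q=1.0, seed=0):
    """Start `walks_per_node` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adj)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        corpus.extend(node2vec_walk(adj, n, length, p, q, rng) for n in nodes)
    return corpus


adj = grid_to_graph(GRID)
walks = build_corpus(adj, walks_per_node=2, length=6, p=0.5, q=2.0)
print(len(adj), len(walks), walks[0])
```

In a typical node2vec pipeline these walks are then treated as sentences for a skip-gram (word2vec) model, whose learned vectors become the state embeddings; that training step is omitted here. A 4-neighbour grid contains no triangles, so the middle weight case never applies on this graph: each step only trades off returning (weight 1/p) against moving outward (weight 1/q).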

ABSTRACT

Deep RL approaches build much of their success on the ability of the deep neural network to generate useful internal representations. Nevertheless, they suffer from a high sample-complexity and starting with a good input representation can have a significant impact on the performance. In this paper, we exploit the fact that the underlying Markov decision process (MDP) represents a graph, which enables us to incorporate the topological information for effective state representation learning. Motivated by the recent success of node representations for several graph analytical tasks, we specifically investigate the capability of node representation learning methods to effectively encode the topology of the underlying MDP in Deep RL. To this end, we perform a comparative analysis of several models chosen from 4 different classes of representation learning algorithms for policy learning in grid-world navigation tasks, which are representative of a large class of RL problems. We find that all embedding methods outperform the commonly used matrix representation of grid-world environments in all of the studied cases. Moreover, graph convolution based methods are outperformed by simpler random walk based methods and graph linear autoencoders.
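
The abstract singles out graph linear autoencoders. Below is a minimal sketch of the forward pass of such a model, assuming the common linear graph autoencoder formulation: encoder Z = Â X W with the symmetrically normalised adjacency Â = D^-1/2 (A + I) D^-1/2, featureless input X = I, and an inner-product decoder σ(Z Zᵀ). The paper's exact variant and hyperparameters are not stated here, and all function names and the example graph are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python product of an (n x k) and a (k x m) list-of-lists matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear_gae_forward(adj, weight):
    """One linear propagation step (no hidden layer, no activation).
    With X = I the encoder Z = Â X W simplifies to Â W; weight is (n x d)."""
    z = matmul(normalized_adjacency(adj), weight)
    n = len(z)
    recon = [[sigmoid(sum(zi * zj for zi, zj in zip(z[i], z[j]))) for j in range(n)]
             for i in range(n)]
    return z, recon


def reconstruction_loss(adj, recon, eps=1e-9):
    """Mean binary cross-entropy between A and the reconstructed edge probabilities."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = min(max(recon[i][j], eps), 1.0 - eps)
            total -= adj[i][j] * math.log(p) + (1 - adj[i][j]) * math.log(1.0 - p)
    return total / (n * n)


# Usage: a hypothetical 1x4 corridor (path graph) and a fixed 4x2 weight matrix.
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
W = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2], [0.05, 0.1]]
Z, R = linear_gae_forward(A, W)
print(len(Z), len(Z[0]), round(reconstruction_loss(A, R), 4))
```

Training would adjust W by gradient descent on this loss, after which the rows of Z serve as node (state) embeddings. Compared with a GCN-based encoder, the linear variant drops the hidden layer and nonlinearity, which makes it one of the simpler models the abstract reports as outperforming graph convolutions.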

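MOTIVATION AND OBJECTIVES

Investigate whether graph-based state representations improve sample efficiency and learning performance in deep reinforcement learning.
Evaluate how effectively different node representation learning methods encode the topology of the underlying Markov decision process (MDP).
Compare graph convolutional networks, random-walk-based methods, and graph autoencoders in grid-world navigation environments.
Determine whether incorporating the MDP's topological information yields better policy learning than the standard matrix representation.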

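PROPOSED METHOD

The authors model the grid-world environment as a graph in which states are nodes and transitions are edges, forming the underlying MDP graph.
Four classes of node representation learning algorithms are applied: graph convolutional networks (GCNs), random-walk-based methods (e.g., node2vec), graph autoencoders, and linear autoencoders.
The learned state representations are used as input to a deep Q-network (DQN) for policy learning.
Each representation method is evaluated on standard grid-world navigation tasks under identical training conditions.
The learned embeddings are compared against the standard one-hot or dense matrix representation of the grid-world state space.
Experiments follow a standard deep RL training protocol to ensure a fair comparison across representation methods.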
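
The method feeds a fixed vector per state into a value-based learner. A minimal sketch of that interface follows, with two substitutions labelled plainly: a linear Q-function trained by semi-gradient Q-learning stands in for the paper's DQN, and the environment is a hypothetical 1x5 corridor rather than one of the paper's grid worlds. The features mapping is the only thing that changes between representations: the one-hot vectors shown here mirror the one-hot baseline encoding, and a learned node embedding table (for example from node2vec) could be passed in its place. All names are hypothetical.

```python
import random

N_STATES = 5          # hypothetical 1x5 corridor; state 4 is the goal
ACTIONS = (-1, +1)    # move left, move right


def step(state, action):
    """Deterministic move clipped at the walls; reward 1 on reaching the goal."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def one_hot_features(n):
    """Baseline state encoding; swap for {state: embedding_vector} to use graph embeddings."""
    return {s: [1.0 if i == s else 0.0 for i in range(n)] for s in range(n)}


def q_value(w, phi, a):
    """Linear Q-function: Q(s, a) = w_a . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w[a], phi))


def train_linear_q(features, episodes=300, alpha=0.5, gamma=0.9, eps=0.1,
                   max_steps=100, seed=0):
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    w = [[0.0] * dim for _ in ACTIONS]

    def greedy(phi):
        qs = [q_value(w, phi, a) for a in range(len(ACTIONS))]
        best = max(qs)
        return rng.choice([a for a, q in enumerate(qs) if q == best])  # random tie-break

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            phi = features[s]
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(phi)
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(
                q_value(w, features[s2], b) for b in range(len(ACTIONS)))
            td_error = target - q_value(w, phi, a)
            # Semi-gradient update: the gradient of w_a . phi(s) w.r.t. w_a is phi(s).
            w[a] = [wi + alpha * td_error * xi for wi, xi in zip(w[a], phi)]
            s = s2
            if done:
                break
    return w


features = one_hot_features(N_STATES)
w = train_linear_q(features)
print([max(range(len(ACTIONS)), key=lambda a: q_value(w, features[s], a))
       for s in range(N_STATES - 1)])
```

With one-hot features the update reduces exactly to tabular Q-learning, which makes the sketch easy to check. Low-dimensional graph embeddings instead let states that are close in the MDP graph share weights, one plausible reason topology-aware inputs can help sample efficiency.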

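EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can graph-based state representations improve sample efficiency and performance in deep reinforcement learning?
RQ2: When encoding MDP topology for reinforcement learning, do graph convolutional networks outperform simpler random-walk-based methods?
RQ3: How do graph autoencoders compare with other representation learning methods on grid-world navigation tasks?
RQ4: Does using topological information from the MDP graph yield significant performance gains over the standard matrix representation?
RQ5: Which class of representation learning methods provides the most robust and efficient state representations for policy learning in structured environments?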

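KEY FINDINGS

All graph-based node representation learning methods outperform the standard matrix representation in every grid-world navigation task evaluated.
Random-walk-based methods (e.g., node2vec) outperform graph convolutional networks.
Graph linear autoencoders are competitive, but generally perform below random-walk-based methods.
Incorporating the MDP's topological structure substantially improves learning efficiency and final policy performance.
In this setting, simpler representation learning techniques are more effective than complex graph neural networks, challenging the common assumption that GCNs are superior.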

This review was generated by AI and reviewed by human editors.