QUICK REVIEW

[Paper Review] Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

Georgios Papoudakis, Filippos Christianos|arXiv (Cornell University)|Jun 11, 2019

Topic: Reinforcement Learning in Robotics | References: 36 | Cited by: 124

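One-Sentence Summary

This paper surveys how non-stationarity arises in multi-agent deep reinforcement learning and categorizes the approaches that mitigate it, including centralized critics, decentralized learning, opponent modeling, meta-learning, and communication. It closes with open problems and future directions.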

ABSTRACT

Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.

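Research Motivation and Objectives

Motivate and define non-stationarity in multi-agent deep reinforcement learning and its effect on learning stability.
Survey and categorize recent methods that address non-stationarity, organized by training architecture and information assumptions.
Identify promising directions and open problems for future research on non-stationarity in multi-agent settings.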
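As a minimal illustration of what non-stationarity means here, the sketch below computes the expected reward of one agent's actions in a small matrix game before and after the other agent changes its policy. The payoff values, the policy probabilities, and the names `PAYOFF` and `expected_rewards` are hypothetical and are not taken from the paper. The point is only that the first agent's learning target moves even though the game itself is unchanged.

```python
import numpy as np

# Hypothetical payoff matrix for agent 1 in a 2x2 coordination game:
# rows are agent 1's actions, columns are agent 2's actions.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

def expected_rewards(opponent_policy):
    """Expected reward of each of agent 1's actions under the opponent's mixed policy."""
    return PAYOFF @ np.asarray(opponent_policy, dtype=float)

# Early in training, the opponent mostly plays action 0 (assumed probabilities).
early = expected_rewards([0.9, 0.1])
# Later, after its own learning updates, it mostly plays action 1.
late = expected_rewards([0.2, 0.8])

print("early:", early)
print("late: ", late)
print("best action changed:", int(np.argmax(early)) != int(np.argmax(late)))
```

From agent 1's point of view the environment appears to have changed, because the value of each action depends on a policy that is itself being updated.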

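Proposed Approach

Review and categorize existing methods for handling non-stationarity in multi-agent deep reinforcement learning.
Provide a taxonomy that details each method's training/execution architecture, modeling, opponent information, and algorithm.
Summarize representative algorithms and their experimental settings in a single consolidated table.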

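Experimental Results

Research Questions

RQ1: What methods have been proposed to address non-stationarity in multi-agent deep reinforcement learning?
RQ2: How do centralized versus decentralized training, opponent modeling, meta-learning, learned representations, and communication help stabilize learning under non-stationarity?
RQ3: What open problems and future research directions remain in this area?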

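Key Findings

Centralized critics paired with decentralized actors stabilize training by conditioning the policy gradient on joint observations and actions.
Opponent modeling and learned representations can mitigate non-stationarity and improve generalization to different opponents.
Meta-learning methods (for example, those inspired by MAML) enable rapid adaptation to non-stationary dynamics.
Self-play and stabilized experience replay are effective decentralized strategies under non-stationarity.
Communication between agents emerges as a useful mechanism for coordinating policies and stabilizing learning in multi-agent settings.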
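The first finding can be sketched structurally as follows. This is a rough illustration of the general centralized-critic, decentralized-actor layout, not the implementation of any specific algorithm in the survey. Linear maps stand in for neural networks, no training step is shown, and all sizes and names (`N_AGENTS`, `act`, `joint_q`, and so on) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, N_ACTIONS = 2, 4, 3  # hypothetical sizes

# Decentralized actors: one linear policy per agent, each seeing only its own observation.
actor_weights = [rng.normal(size=(OBS_DIM, N_ACTIONS)) for _ in range(N_AGENTS)]

def act(agent, obs):
    """Return action probabilities for one agent from its local observation."""
    logits = obs @ actor_weights[agent]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Centralized critic: a linear score over the joint observations and actions of all agents.
JOINT_DIM = N_AGENTS * (OBS_DIM + N_ACTIONS)
critic_weights = rng.normal(size=JOINT_DIM)

def joint_q(all_obs, all_actions):
    """Score a joint (observations, actions) tuple; needed during training only."""
    one_hot = np.eye(N_ACTIONS)[all_actions]  # shape (N_AGENTS, N_ACTIONS)
    joint = np.concatenate([all_obs.ravel(), one_hot.ravel()])
    return float(joint @ critic_weights)

all_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
probs = [act(i, all_obs[i]) for i in range(N_AGENTS)]
actions = np.array([int(np.argmax(p)) for p in probs])
q_value = joint_q(all_obs, actions)
print("joint Q estimate:", q_value)
```

Because the critic sees every agent's action, a change in another agent's behavior shows up as a change in the critic's input rather than as an unexplained shift in the environment. At execution time only `act` is needed, so each agent still acts on local information.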

This review was generated by AI and reviewed by a human editor.