QUICK REVIEW

[Paper Review] Game-Theoretic Multiagent Reinforcement Learning

Yaodong Yang, Chengdong Ma | arXiv (Cornell University) | Nov 1, 2020

Reinforcement Learning in Robotics | References: 398 | Cited by: 146

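One-Sentence Summary

This paper provides a self-contained overview of multiagent reinforcement learning (MARL) from a game-theoretic perspective, introducing the fundamental concepts (stochastic games and extensive-form games) in detail and surveying recent algorithmic advances across a range of MARL settings.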

ABSTRACT

Tremendous advances have been made in multiagent reinforcement learning (MARL). MARL corresponds to the learning problem in a multiagent system in which multiple agents learn simultaneously. It is an interdisciplinary field of study with a long history that includes game theory, machine learning, stochastic control, psychology, and optimization. Despite great successes in MARL, there is a lack of a self-contained overview of the literature that covers game-theoretic foundations of modern MARL methods and summarizes the recent advances. The majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments on the research frontier. The goal of this monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game-theoretic perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing field and experts in the field who want to obtain a panoramic view and identify new directions based on recent advances.

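Research Motivation and Objectives

Introduce the problem formulations of MARL through stochastic games and extensive-form games.
Explain solution concepts in MARL, such as the Nash equilibrium, along with policy-based and value-based methods.
Survey recent MARL algorithmic developments and organize them into a coherent taxonomy.
Discuss the major challenges of MARL, including complexity, non-stationarity, and scalability.
Highlight modern topics such as mean-field MARL and the contrast between general-sum and zero-sum settings.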

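Proposed Methods

Presents two representative MARL frameworks: stochastic games and extensive-form games.
Describes value-based and policy-based MARL methods in multiagent environments.
Discusses the Nash equilibrium as a solution concept for MARL.
Introduces special types of stochastic games (SGs), such as single-controller and SR-SIT games, with notes on their tractability.
Reviews recent MARL surveys to build a taxonomy of methods.
Covers modern topics such as Q-function factorization, multiagent soft learning, mean-field MARL, and online MDPs.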
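
To make the Nash equilibrium solution concept mentioned above concrete, the following is a minimal sketch for the simplest case: a two-player matrix game, which can be viewed as a stochastic game with a single state. The function names (`expected_payoff`, `is_nash_equilibrium`) and the tolerance parameter are illustrative assumptions and do not come from the paper. The check relies on the standard fact that a strategy profile is a Nash equilibrium exactly when no player can gain by switching to any pure strategy.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def expected_payoff(payoff: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Expected payoff x^T M y when the row player mixes with x and the column player with y."""
    return sum(
        x[i] * payoff[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def is_nash_equilibrium(
    A: Matrix,
    B: Matrix,
    x: Sequence[float],
    y: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check whether (x, y) is a Nash equilibrium of the bimatrix game (A, B).

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when the row player picks action i and the column player picks action j.
    """
    row_value = expected_payoff(A, x, y)
    col_value = expected_payoff(B, x, y)

    # Best payoff the row player could get from any pure deviation against y.
    best_row = max(
        sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))
    )
    # Best payoff the column player could get from any pure deviation against x.
    best_col = max(
        sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y))
    )

    return best_row <= row_value + tol and best_col <= col_value + tol


if __name__ == "__main__":
    # Matching pennies: a zero-sum game, so B = -A.
    A = [[1, -1], [-1, 1]]
    B = [[-1, 1], [1, -1]]
    print(is_nash_equilibrium(A, B, [0.5, 0.5], [0.5, 0.5]))  # True
    print(is_nash_equilibrium(A, B, [1.0, 0.0], [1.0, 0.0]))  # False
```

In matching pennies, the uniform mixed profile passes the check, while the pure profile fails because the column player can profitably deviate. Extending this idea to full stochastic games requires comparing value functions at every state, which this sketch does not attempt.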

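Experimental Results

Research Questions

RQ1: What are the fundamental game-theoretic formulations used to model MARL?
RQ2: Under the stochastic game and extensive-form game frameworks, what are the main families of algorithms for solving MARL problems?
RQ3: How do recent advances address challenges in MARL such as non-stationarity, scalability, and multi-objective learning?
RQ4: How can MARL be categorized into zero-sum, general-sum, and mean-field settings, and which methods apply to each category?
RQ5: Based on current MARL surveys, what open directions and future research opportunities remain?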

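Key Findings

Starting from game theory, the paper provides a structured, self-contained treatment of MARL that connects fundamental concepts with modern methods.
It takes stochastic games and extensive-form games as the core formulations of MARL and discusses solution concepts such as the Nash equilibrium.
It identifies and explains key challenges of MARL, such as combinatorial complexity, non-stationarity, and scalability as the number of agents grows.
It surveys a broad range of algorithms for multiagent settings, including value-based, policy-based, and actor-critic methods.
It introduces advanced topics (mean-field MARL, stochastic potential games, online MDPs) and discusses their implications for future research.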
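
Of the advanced topics listed above, mean-field MARL connects most directly to the scalability challenge. As commonly formulated, each agent conditions on a summary of its neighbors' behavior, the mean action, and does not track every other agent individually. The sketch below only illustrates that summary step, assuming discrete actions encoded as one-hot vectors. The helper names `one_hot` and `mean_action` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def one_hot(action: int, num_actions: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector of length num_actions."""
    if not 0 <= action < num_actions:
        raise ValueError(f"action {action} is outside [0, {num_actions})")
    return [1.0 if a == action else 0.0 for a in range(num_actions)]


def mean_action(neighbor_actions: Sequence[int], num_actions: int) -> List[float]:
    """Average the one-hot encodings of the neighbors' actions.

    The result is the empirical distribution of neighbor actions. Its size
    depends only on num_actions, not on the number of neighbors.
    """
    if not neighbor_actions:
        raise ValueError("at least one neighbor action is required")
    totals = [0.0] * num_actions
    for action in neighbor_actions:
        for index, value in enumerate(one_hot(action, num_actions)):
            totals[index] += value
    count = len(neighbor_actions)
    return [total / count for total in totals]


if __name__ == "__main__":
    # Four neighbors choosing among three actions.
    print(mean_action([0, 1, 1, 2], num_actions=3))  # [0.25, 0.5, 0.25]
```

Because the summary vector has a fixed length regardless of how many neighbors there are, a value function that takes the agent's own action and this vector as input does not grow with the number of agents. That is the intuition behind the scalability benefit, although the sketch does not implement any learning rule.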


This review was generated by AI and reviewed by a human editor.