QUICK REVIEW

[Paper Review] Multi-agent Reinforcement Learning: A Comprehensive Survey

Dom Huh, Prasant Mohapatra|arXiv (Cornell University)|Dec 15, 2023

Reinforcement Learning in Robotics|Cited by 8

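One-Sentence Summary

A comprehensive survey of multi-agent reinforcement learning (MARL) covering the foundations of multi-agent systems (MAS), game-theoretic concepts, deep learning integration, learning dynamics, and open challenges.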

ABSTRACT

Multi-agent systems (MAS) are widely prevalent and crucially important in numerous real-world applications, where multiple agents must make decisions to achieve their objectives in a shared environment. Despite their ubiquity, the development of intelligent decision-making agents in MAS poses several open challenges to their effective implementation. This survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML) and connecting them to recent advancements in multi-agent reinforcement learning (MARL), i.e. the research of data-driven decision-making within MAS. Therefore, the objective of this survey is to provide a comprehensive perspective along the various dimensions of MARL, shedding light on the unique opportunities that are presented in MARL applications while highlighting the inherent challenges that accompany this potential. Therefore, we hope that our work will not only contribute to the field by analyzing the current landscape of MARL but also motivate future directions with insights for deeper integration of concepts from related domains of GT and ML. With this in mind, this work delves into a detailed exploration of recent and past efforts of MARL and its related fields and describes prior solutions that were proposed and their limitations, as well as their applications.

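Motivation and Objectives

Define multi-agent systems (MAS) and MARL, and motivate learning-based control in shared environments.
Present a unified view by integrating game-theoretic and machine learning perspectives into MARL.
Survey the foundational models of agents in MAS (e.g., stochastic games and partially observable stochastic games, POSGs) and their learning objectives.
Discuss how the foundations of deep learning and RL apply to MARL, including value-based, policy-based, and model-based methods.
Highlight the challenges, paradigms, and future directions of MARL research.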

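Proposed Approach

Formulate MARL within stochastic games and POSGs to capture incomplete information and joint policies.
Survey game-theoretic solution concepts (Nash equilibrium, correlated equilibrium (CE), coarse correlated equilibrium (CCE), Pareto efficiency) and their implications for MARL.
Explain learning dynamics (best response, no-regret) and their relevance to MARL convergence.
Review the integration of deep learning into MARL, including value function approximation, policy gradients, actor-critic, and model-based methods.
Discuss reinforcement learning fundamentals (Q-learning, policy gradients, actor-critic, model-based planning) and their adaptation to MAS.
Outline MARL-specific topics such as simulation, communication, ad hoc teamwork, knowledge transfer, and agent modeling.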
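
As an illustration of the no-regret learning dynamics listed above, the following is a minimal sketch of regret matching in self-play on a two-player zero-sum matrix game. It is not taken from the survey: the payoff matrix `PAYOFF`, the function names, and the choice of an expected-payoff (sampling-free) update are all assumptions made for this example. In a zero-sum game the time-averaged strategies of no-regret learners approach a Nash equilibrium, which for this particular matrix is (0.4, 0.6) for both players.

```python
import numpy as np


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No positive regret yet: fall back to the uniform strategy.
    return np.full(cum_regret.shape, 1.0 / cum_regret.size)


def self_play(payoff, iterations=20000):
    """Run regret matching for both players and return their average strategies.

    `payoff[a, b]` is the row player's payoff; the column player receives
    the negative of it (zero-sum assumption).
    """
    n_row, n_col = payoff.shape
    regret_row, regret_col = np.zeros(n_row), np.zeros(n_col)
    avg_row, avg_col = np.zeros(n_row), np.zeros(n_col)
    for _ in range(iterations):
        p = regret_matching_strategy(regret_row)
        q = regret_matching_strategy(regret_col)
        avg_row += p
        avg_col += q
        # Expected payoff of every pure action against the opponent's mix.
        u_row = payoff @ q
        u_col = -(p @ payoff)
        # Regret of each action relative to the strategy actually played.
        regret_row += u_row - p @ u_row
        regret_col += u_col - q @ u_col
    return avg_row / iterations, avg_col / iterations


# Hypothetical 2x2 zero-sum game with a unique mixed Nash equilibrium.
PAYOFF = np.array([[2.0, -1.0],
                   [-1.0, 1.0]])

avg_row, avg_col = self_play(PAYOFF)
print("average row strategy:   ", avg_row)
print("average column strategy:", avg_col)
```

Note that the per-iteration strategies themselves keep cycling; it is the running average that approaches the equilibrium, which is one reason convergence claims in MARL need to specify what exactly converges.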

Figure 1: A visualization of a multi-agent control system, inspired by [Albrecht et al., 2024].

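Results

Research Questions

RQ1: What is the basic problem formulation for learning control in multi-agent environments?
RQ2: How do game theory and deep learning help in understanding and solving MARL problems?
RQ3: What are the main learning paradigms and algorithms applicable to MARL (value-based, policy-based, actor-critic, model-based)?
RQ4: What challenges arise in MARL (non-stationarity, partial observability, communication, team formation, knowledge transfer), and how can they be addressed?
RQ5: What are the open problems and future directions in MARL research?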

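Key Findings

MARL builds on stochastic games and POSGs to model multi-agent interaction under uncertainty and partial observability.
Equilibrium concepts (NE, CE/CCE, Pareto efficiency) and their learning dynamics provide a theoretical lens on the stability and convergence of MARL.
Deep learning enables scalable end-to-end MARL solutions but requires large amounts of data and compute.
RL fundamentals (value-based, policy-based, actor-critic, model-based) are adapted to MARL through specialized techniques (experience replay, target networks, exploration, offline learning).
Model-based MARL introduces planning with learned transition dynamics and uncertainty estimates to complement model-free methods.
The survey identifies MARL-specific challenges (communication, ad hoc teamwork, knowledge transfer, agent modeling) and outlines future research directions.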
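
To make the value-based family concrete, here is a small sketch of independent Q-learning, where each agent runs ordinary tabular Q-learning and treats the other agent as part of the environment. It is an illustrative assumption rather than an algorithm specified in this review: the stateless cooperative game `PAYOFF`, the hyperparameters, and the function name are hypothetical. Because each agent's reward depends on the other agent's changing behavior, the learning target is non-stationary from either agent's point of view.

```python
import numpy as np

# Hypothetical cooperative matrix game: both agents receive the same reward.
# Coordinating on action 0 pays 10, on action 1 pays 5, miscoordination pays 0.
PAYOFF = np.array([[10.0, 0.0],
                   [0.0, 5.0]])


def independent_q_learning(payoff, steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Two epsilon-greedy Q-learners on a repeated, stateless matrix game."""
    rng = np.random.default_rng(seed)
    n_actions = payoff.shape[0]
    # One Q-table per agent; there is no state, so each table is a vector.
    q_values = [np.zeros(n_actions), np.zeros(n_actions)]
    for _ in range(steps):
        actions = []
        for q in q_values:
            if rng.random() < epsilon:
                actions.append(int(rng.integers(n_actions)))  # explore
            else:
                actions.append(int(np.argmax(q)))             # exploit
        reward = payoff[actions[0], actions[1]]
        # Each agent updates only its own action value and ignores the
        # other agent's choice, which is what makes the learners independent.
        for q, a in zip(q_values, actions):
            q[a] += alpha * (reward - q[a])
    return q_values


q_a, q_b = independent_q_learning(PAYOFF)
greedy_joint = (int(np.argmax(q_a)), int(np.argmax(q_b)))
print("Q-values agent A:", q_a)
print("Q-values agent B:", q_b)
print("greedy joint action:", greedy_joint)
```

In this easy game the learners tend to settle on a coordinated joint action, but independent learning carries no general convergence guarantee, which is part of the motivation for the centralized-training and agent-modeling ideas covered in the survey.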

Figure 2: Models of Games: The overview of different models of multi-agent interactions is illustrated, from Markov Decision Processes (MDP) to variations of stochastic games. The following figure was adapted and updated from [Albrecht et al., 2024].
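
For readers who want the formal object behind these models, a stochastic game can be written in the standard textbook form below. The notation is a common convention and is an assumption here; it is not copied from the survey.

```latex
% Stochastic (Markov) game with agent set N = {1, ..., n}
\[
\mathcal{G} = \bigl\langle \mathcal{N},\ \mathcal{S},\ \{\mathcal{A}^i\}_{i \in \mathcal{N}},\ P,\ \{R^i\}_{i \in \mathcal{N}},\ \gamma \bigr\rangle,
\qquad
\mathcal{A} = \prod_{i \in \mathcal{N}} \mathcal{A}^i,
\quad
P : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S}),
\quad
R^i : \mathcal{S} \times \mathcal{A} \to \mathbb{R}
\]

% Expected discounted return of agent i under the joint policy (pi^i, pi^{-i})
\[
J^i(\pi^i, \pi^{-i}) =
\mathbb{E}\Bigl[\, \sum_{t=0}^{\infty} \gamma^t R^i(s_t, a_t)
\Bigm| a_t^j \sim \pi^j(\cdot \mid s_t),\ s_{t+1} \sim P(\cdot \mid s_t, a_t) \Bigr]
\]

% Nash equilibrium: no agent gains by deviating unilaterally
\[
J^i(\pi^{i,*}, \pi^{-i,*}) \ \ge\ J^i(\pi^i, \pi^{-i,*})
\qquad \forall\, \pi^i,\ \forall\, i \in \mathcal{N}
\]
```

With a single agent this reduces to an MDP. A POSG additionally gives each agent an observation set and an observation function, so policies condition on observation histories rather than on the state itself.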

This review was generated by AI and reviewed by a human editor.