QUICK REVIEW

[Paper Review] A Survey of Progress on Cooperative Multi-agent Reinforcement Learning in Open Environment

Lei Yuan, Ziqian Zhang | arXiv (Cornell University) | Dec 2, 2023

Reinforcement Learning in Robotics | Cited by 9

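One-Sentence Summary

This paper surveys recent progress in cooperative multi-agent reinforcement learning (MARL), with emphasis on open-environment settings, key methods, test environments, and future research directions.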

ABSTRACT

Multi-agent Reinforcement Learning (MARL) has gained wide attention in recent years and has made progress in various fields. Specifically, cooperative MARL focuses on training a team of agents to cooperatively achieve tasks that are difficult for a single agent to handle. It has shown great potential in applications such as path planning, autonomous driving, active voltage control, and dynamic algorithm configuration. One of the research focuses in the field of cooperative MARL is how to improve the coordination efficiency of the system, while research work has mainly been conducted in simple, static, and closed environment settings. To promote the application of artificial intelligence in real-world, some research has begun to explore multi-agent coordination in open environments. These works have made progress in exploring and researching the environments where important factors might change. However, the mainstream work still lacks a comprehensive review of the research direction. In this paper, starting from the concept of reinforcement learning, we subsequently introduce multi-agent systems (MAS), cooperative MARL, typical methods, and test environments. Then, we summarize the research work of cooperative MARL from closed to open environments, extract multiple research directions, and introduce typical works. Finally, we summarize the strengths and weaknesses of the current research, and look forward to the future development direction and research problems in cooperative MARL in open environments.

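Research Motivation and Objectives

Introduce the foundational concepts of reinforcement learning and multi-agent systems that are relevant to cooperative MARL.
Assess cooperative MARL methods and their performance in classic closed environments.
Outline emerging research on cooperative MARL in open environments, and identify research directions and challenges.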

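Proposed Approach

Review and categorize the core MARL methods (policy gradient, value-based, and actor-critic) and their extensions.
Explain the challenges in open environments, including non-stationarity, credit assignment, and the scalability of MAS.
Summarize the testbeds, environments, and application domains used for cooperative MARL.
Sketch future directions and potential research problems in open environments.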

Results

Research Questions

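RQ1: What are the mainstream cooperative MARL methods, and how do they address coordination and credit assignment?
RQ2: How do open-environment factors affect MARL, and how does existing work respond to these challenges?
RQ3: What are the common test environments and application domains for cooperative MARL in open environments?
RQ4: What gaps and future directions remain before cooperative MARL can be deployed in real-world open scenarios?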

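Key Findings

The survey identifies commonly used MARL methods (such as MADDPG, MAPPO, VDN, and QMIX) and their roles in cooperative settings.
Research on MARL in open environments remains limited; robustness, continual learning, generalization, and sim-to-real transfer are still being explored.
Research focuses on communication, offline policy deployment, world models, and training paradigms to improve coordination.
The paper discusses open-environment concepts such as Ad-Hoc Teamwork, Zero-Shot Coordination, and Few-Shot Teamwork as settings of interest.
It outlines the standard background in reinforcement learning, MAS, game theory, and stochastic games that underpins the development of MARL.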
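
As a rough illustration of how value-decomposition methods such as VDN approach credit assignment, the sketch below sums per-agent utilities into a joint value and checks that each agent's independent greedy choice matches the centrally searched best joint action. This is a minimal sketch, not code from the survey: the function names and the utility numbers are hypothetical, and real implementations learn the per-agent utilities with neural networks rather than fixed tables.

```python
from itertools import product


def vdn_q_tot(per_agent_q, joint_action):
    """VDN-style joint value: the sum of each agent's utility for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    """Each agent picks its own argmax independently (decentralized execution)."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def brute_force_best(per_agent_q):
    """Centralized search over all joint actions, used here only for comparison."""
    spaces = [range(len(q)) for q in per_agent_q]
    return max(product(*spaces), key=lambda ja: vdn_q_tot(per_agent_q, ja))


# Hypothetical utilities for 2 agents with 3 actions each (illustrative numbers only).
example_q = [[0.1, 0.7, 0.2], [0.5, 0.4, 0.9]]

if __name__ == "__main__":
    print("decentralized greedy:", greedy_joint_action(example_q))
    print("centralized search:  ", brute_force_best(example_q))
```

Because the joint value is a plain sum, maximizing each term separately also maximizes the total, which is why the two selections agree. QMIX relaxes the sum to a mixing function that is monotonic in each agent's utility, which preserves the same property.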

Figure 2: Illustration of reinforcement learning.

This review was generated by AI and reviewed by human editors.