QUICK REVIEW

[Paper Review] LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

Chuanneng Sun, Songjun Huang|arXiv (Cornell University)|May 17, 2024

Reinforcement Learning in Robotics | Cited by 10

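One-Sentence Summary

This paper surveys LLM-based single-agent and multi-agent reinforcement learning frameworks, analyzes how language supports collaboration and communication among multiple agents, and outlines future research directions, including personality, human-in/on-the-loop frameworks, co-design, and safety.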

ABSTRACT

In recent years, Large Language Models (LLMs) have shown great abilities in various tasks, including question answering, arithmetic problem solving, and poem writing, among others. Although research on LLM-as-an-agent has shown that LLM can be applied to Reinforcement Learning (RL) and achieve decent results, the extension of LLM-based RL to Multi-Agent System (MAS) is not trivial, as many aspects, such as coordination and communication between agents, are not considered in the RL frameworks of a single agent. To inspire more research on LLM-based MARL, in this letter, we survey the existing LLM-based single-agent and multi-agent RL frameworks and provide potential research directions for future research. In particular, we focus on the cooperative tasks of multiple agents with a common goal and communication among them. We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.

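Motivation and Objectives

Survey the current state of LLM-based MARL and traditional MARL frameworks to identify their strengths and gaps.
Highlight how language conditioning supports inter-agent communication and coordination in cooperative tasks.
Discuss embodied applications and human-in-the-loop scenarios that LLMs enable in MARL.
Outline open research problems and potential directions for advancing language-conditioned multi-agent systems.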

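Proposed Approach

Review traditional non-LLM MARL methods (learning to cooperate and learning to communicate) and their coordination/communication paradigms.
Summarize LLM-based single-agent RL and open-loop/closed-loop frameworks (e.g., ReAct, Reflexion, ADaPT, Refiner, Retroformer, REX).
Catalog existing LLM-based MARL frameworks (Table I), focusing on their roles in cooperation, planning, and communication.
Discuss four future research directions: personality-enabled cooperation, language-enabled human-in/on-the-loop frameworks, co-design of traditional MARL and LLMs, and safety/security of MAS.
Provide a structured overview of the problems, opportunities, and practical considerations involved in deploying language-conditioned MARL.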
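The open-loop/closed-loop distinction can be illustrated with a minimal Python sketch. It assumes that a closed-loop agent feeds environment feedback back to its LLM planner before re-planning, whereas an open-loop agent plans once and never sees the outcome. `toy_planner` and `toy_env` are hypothetical stand-ins for an LLM and an environment; none of the names come from the paper or from the listed frameworks.

```python
from typing import Callable, List

Plan = List[str]
Planner = Callable[[str, List[str]], Plan]  # (task, feedback so far) -> plan
Executor = Callable[[Plan], str]            # plan -> outcome message


def open_loop(task: str, planner: Planner, execute: Executor) -> str:
    """Plan once and execute; environment feedback never reaches the planner."""
    return execute(planner(task, []))


def closed_loop(task: str, planner: Planner, execute: Executor, max_rounds: int = 3) -> str:
    """Re-plan with accumulated feedback until success or the round budget runs out."""
    feedback: List[str] = []
    outcome = ""
    for _ in range(max_rounds):
        outcome = execute(planner(task, feedback))
        if outcome == "success":
            break
        feedback.append(outcome)  # the failure message conditions the next plan
    return outcome


# Hypothetical toy stand-ins for an LLM planner and an environment.
def toy_planner(task: str, feedback: List[str]) -> Plan:
    # Naive first guess; adds the missing step only after being told about it.
    if any("pick" in msg for msg in feedback):
        return ["pick(cup)", "place(cup, shelf)"]
    return ["place(cup, shelf)"]


def toy_env(plan: Plan) -> str:
    # The task succeeds only if the cup is picked up before it is placed.
    if plan and plan[0].startswith("pick"):
        return "success"
    return "failure: must pick before place"


print(open_loop("put the cup on the shelf", toy_planner, toy_env))
print(closed_loop("put the cup on the shelf", toy_planner, toy_env))
```

Running it prints the open-loop failure message followed by `success`: only the closed-loop agent receives the failure message and adds the missing `pick` step. Frameworks such as Reflexion feed back richer signals (e.g., verbal self-reflections kept in memory) than the one-line string used here.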

Results

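Research Questions

RQ1: What LLM-based MARL frameworks currently exist, and how do they address coordination and communication among agents?
RQ2: What key challenges and open problems exist in language-conditioned MARL, and which directions are most promising (personality, human-in/on-the-loop, co-design, safety)?
RQ3: How can language models be integrated with traditional MARL in a resource-efficient way that supports on-board execution (e.g., via co-design and distillation)?
RQ4: What safety and security considerations are unique to incorporating LLMs into MAS, and how can they be mitigated?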

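Key Findings

LLMs make language-conditioned MARL possible, with potential improvements in coordination and interpretability.
Several frameworks (e.g., DyLAN, FAMA, CoELA, SMART-LLM, RoCo, Co-NavGPT) demonstrate the varied roles LLMs can play in decision-making, planning, and communication.
The field is still in its infancy and has substantial future potential in both embodied and non-embodied tasks.
Four open directions are identified: personality-enabled cooperation, human-in/on-the-loop frameworks, co-design of traditional MARL and LLMs, and safety/security of MAS.
The authors stress the need for new metrics and architectures that use language effectively for multi-agent cooperation.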
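To make the idea of coordination through language concrete, here is a toy Python sketch. It is not the protocol of any framework named above. It assumes agents speak in a fixed order and broadcast their choice as a short natural-language message, and it uses a fixed sentence template plus a regular expression as a stand-in for an LLM reading those messages. The robot and target names are hypothetical.

```python
import re
from typing import Dict, List


def choose_target(prefs: List[str], inbox: List[str]) -> str:
    """Pick the most-preferred target not already claimed in received messages.

    A real LLM agent would interpret free-form messages; here a fixed sentence
    template ("... will take <target>") and a regex stand in for that step.
    """
    claimed = set()
    for msg in inbox:
        match = re.search(r"will take (\w+)", msg)
        if match:
            claimed.add(match.group(1))
    for target in prefs:
        if target not in claimed:
            return target
    return "idle"  # every preferred target is already taken


def communication_round(preferences: Dict[str, List[str]]) -> Dict[str, str]:
    """Agents speak in turn; each message is visible to every later speaker."""
    inbox: List[str] = []
    assignment: Dict[str, str] = {}
    for agent, prefs in preferences.items():
        target = choose_target(prefs, inbox)
        assignment[agent] = target
        inbox.append(f"{agent}: I will take {target}.")
    return assignment


prefs = {
    "robot_a": ["red_box", "blue_box"],
    "robot_b": ["red_box", "green_box"],
    "robot_c": ["red_box", "blue_box", "green_box"],
}
print(communication_round(prefs))
```

With these preferences, robot_a takes red_box, robot_b takes green_box, and robot_c takes blue_box; without the shared messages, all three would go for red_box. The fixed speaking order is a simplification chosen for the sketch, and the messages double as a human-readable log of why each agent acted as it did.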


This review was generated by AI and checked by human editors.