QUICK REVIEW

[Paper Review] Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Jakob Foerster, Yannis Assael | arXiv (Cornell University) | May 21, 2016

Adversarial Robustness in Machine Learning | References: 24 | Cited by: 867

One-Sentence Summary

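This paper proposes two methods, RIAL and DIAL, that use centralised learning and deep neural networks to let cooperative agents with partial observability learn communication protocols.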

ABSTRACT

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.
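The abstract's key contrast is where the sender's learning signal comes from: in RIAL a message is a discrete action trained only through reward, whereas in DIAL the receiver's error is differentiated through the channel back into the sender's parameters. The following is a minimal sketch of that DIAL idea, assuming a hypothetical one-step toy game with scalar weights `w_speaker` and `w_listener` (not the paper's architecture or tasks), and using a squared-error loss in place of DIAL's TD error. The speaker sees a bit `x` and sends a real-valued message through a noisy logistic channel; the listener, which sees only the message, must predict the bit.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_speaker, w_listener, x, noise, target):
    """Hypothetical toy game: the speaker sees x, the listener sees only the message."""
    m = w_speaker * x             # speaker's real-valued message
    m_hat = sigmoid(m + noise)    # noisy channel, squashed to between 0 and 1
    y = w_listener * m_hat        # listener's prediction from the received message
    return m_hat, y, (y - target) ** 2

def gradients(w_speaker, w_listener, x, noise, target):
    """Chain rule through listener -> channel -> speaker (the noise sample is held fixed)."""
    m_hat, y, _ = forward(w_speaker, w_listener, x, noise, target)
    dloss_dy = 2.0 * (y - target)
    g_listener = dloss_dy * m_hat
    g_speaker = dloss_dy * w_listener * m_hat * (1.0 - m_hat) * x
    return g_speaker, g_listener

def train(steps=2000, lr=0.5, sigma=0.5, seed=0):
    """Jointly update both agents by gradient descent; no reward is ever given to the speaker."""
    rng = random.Random(seed)
    w_s, w_l = 0.0, 1.0
    for _ in range(steps):
        x = rng.choice([-1.0, 1.0])
        target = 1.0 if x > 0 else 0.0
        noise = rng.gauss(0.0, sigma)
        g_s, g_l = gradients(w_s, w_l, x, noise, target)
        w_s -= lr * g_s
        w_l -= lr * g_l
    return w_s, w_l

def noiseless_loss(w_s, w_l):
    """Average loss over both inputs with the channel noise switched off."""
    return sum(forward(w_s, w_l, x, 0.0, 1.0 if x > 0 else 0.0)[2] for x in (-1.0, 1.0)) / 2

if __name__ == "__main__":
    # Finite-difference check of the speaker gradient at an arbitrary point.
    h = 1e-6
    numeric = (forward(0.3 + h, 1.2, 1.0, 0.1, 1.0)[2] - forward(0.3 - h, 1.2, 1.0, 0.1, 1.0)[2]) / (2 * h)
    print("analytic:", gradients(0.3, 1.2, 1.0, 0.1, 1.0)[0], "numeric:", numeric)
    w_s, w_l = train()
    print("learned weights:", w_s, w_l, "noiseless loss:", noiseless_loss(w_s, w_l))
```

The speaker's weight moves only because the listener's error is propagated back through `sigmoid(m + noise)`; with a discrete, non-differentiable channel this path would not exist, which is the gap RIAL fills with Q-learning.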

Motivation and Objectives

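Study how multiple cooperative agents can learn to communicate under partial observability in order to maximise a shared reward.
Develop end-to-end learning methods that allow communication protocols to emerge within deep neural networks.
Assess the benefits of centralised learning with decentralised execution for training policies that can communicate.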

Proposed Methods

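Proposes Reinforced Inter-Agent Learning (RIAL), which uses deep Q-learning with a recurrent network to select both environment actions and discrete communication actions.
Proposes Differentiable Inter-Agent Learning (DIAL), which, during centralised learning, lets agents pass real-valued messages to one another and backpropagates gradients through the communication channel.
Uses parameter sharing across agents during centralised learning, while each agent still executes its policy in a decentralised way.
During decentralised execution, discretises the real-valued messages to satisfy the task's communication constraints.
Handles partial observability with recurrent networks and episode-based training.
Evaluates both methods on two families of multi-agent benchmark tasks: the Switch Riddle and MNIST-based games.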
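A simplified sketch of RIAL's per-step action selection with parameter sharing: every agent queries one shared Q-function, passing its own index so agents can still specialise, and picks an environment action and a discrete message with separate ε-greedy choices over the two sets of Q-values (Q_u and Q_m). The names `rial_step`, `epsilon_greedy` and `toy_shared_q` are hypothetical; `toy_shared_q` is a hand-written stand-in for the recurrent Q-network, and recurrent state, the previous action as input, and the training loop itself are omitted. The standard one-step Q-learning target is included for completeness.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (observation, incoming message, agent index) -> (Q over env actions, Q over messages)
QFunction = Callable[[Sequence[float], int, int], Tuple[List[float], List[float]]]

def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Random index with probability epsilon, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

def rial_step(shared_q: QFunction, observations, incoming_messages, epsilon, rng):
    """All agents call the SAME Q-function (parameter sharing), each with its own index."""
    actions, messages = [], []
    for agent_id, (obs, msg_in) in enumerate(zip(observations, incoming_messages)):
        q_u, q_m = shared_q(obs, msg_in, agent_id)
        actions.append(epsilon_greedy(q_u, epsilon, rng))   # environment action u
        messages.append(epsilon_greedy(q_m, epsilon, rng))  # discrete message, read by others next step
    return actions, messages

def td_target(reward: float, next_q_values: Sequence[float], gamma: float, terminal: bool) -> float:
    """One-step Q-learning target: r at episode end, else r + gamma * max_a Q(next, a)."""
    return reward if terminal else reward + gamma * max(next_q_values)

def toy_shared_q(obs, msg_in, agent_id):
    """Hypothetical stand-in for the recurrent Q-network: 2 env actions, 1-bit messages."""
    s = sum(obs) + msg_in + 0.1 * agent_id
    return [s, -s], [float(msg_in), 1.0 - msg_in]

if __name__ == "__main__":
    rng = random.Random(0)
    print(rial_step(toy_shared_q, [[1.0], [-2.0]], [0, 1], epsilon=0.0, rng=rng))
```

Because the message is just another discrete action, RIAL needs no gradient through the channel; the sender learns only from the TD error of its own message choice.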
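DIAL's discretise/regularise unit (DRU) is what lets the same network train with real-valued messages and execute with discrete ones. In the paper, the DRU adds Gaussian noise to the message and applies a logistic function during centralised learning, and thresholds the message at 0 during decentralised execution. A minimal sketch over a list of message units follows; the function names and the noise level in the demo are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dru(message: Sequence[float], sigma: float, training: bool, rng: random.Random) -> List[float]:
    """Discretise/regularise unit applied element-wise to a real-valued message."""
    if training:
        # Centralised learning: noisy and differentiable, values between 0 and 1.
        return [logistic(m + rng.gauss(0.0, sigma)) for m in message]
    # Decentralised execution: hard threshold to one bit per unit.
    return [1.0 if m > 0 else 0.0 for m in message]

if __name__ == "__main__":
    rng = random.Random(0)
    msg = [2.5, -0.3, 6.0]
    print("training :", dru(msg, sigma=2.0, training=True, rng=rng))
    print("execution:", dru(msg, sigma=2.0, training=False, rng=rng))
```

The execution branch ignores `sigma` and `rng`, so the trained weights can be deployed over a hard 1-bit channel; how well that works depends on training having pushed messages away from 0.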

Experimental Results

Research Questions

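RQ1: Can agents learn effective communication protocols under partial observability in order to solve cooperative tasks?
RQ2: In multi-agent settings, does differentiable inter-agent communication (DIAL) offer learning advantages over independent or non-differentiable approaches (RIAL)?
RQ3: How do centralised learning, parameter sharing, and channel discretisation affect the emergence of communication?
RQ4: What communication protocols emerge in complex tasks, and how interpretable are they?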

Key Findings

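Both RIAL and DIAL can solve the proposed benchmark tasks under centralised learning with decentralised execution.
On some tasks, DIAL with parameter sharing outperforms the alternatives and learns protocols faster than RIAL.
Differentiable communication provides richer feedback, which makes message design and coordination more effective than with non-differentiable methods.
Parameter sharing is crucial for learning communication in multi-agent settings.
With DIAL, interpretable, nearly discrete communication schemes emerge from continuous protocols during learning.
Channel noise, together with the regularisation it provides through the DRU (discretise/regularise unit), shapes both the learned communication strategies and the training dynamics.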
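One intuition for why noise in the DRU favours discrete-like messages: the sign of a real-valued message m survives Gaussian noise of scale σ only if |m| is large relative to σ, and that sign is exactly what the execution-time threshold keeps. The sketch below computes the probability that the noisy training-time output lands on the wrong side of 0.5, Φ(−|m|/σ); the function name and the σ value in the demo are illustrative assumptions, not settings from the paper.

```python
import math

def flip_probability(message: float, sigma: float) -> float:
    """P[sign(m + eps) != sign(m)] for eps ~ N(0, sigma^2), i.e. Phi(-|m| / sigma).

    Equivalently: the chance that logistic(m + eps) falls on the other side of 0.5
    from the bit that the execution-time threshold (m > 0) would produce.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.erfc(abs(message) / (sigma * math.sqrt(2.0)))

if __name__ == "__main__":
    sigma = 2.0  # illustrative noise level, not a value taken from the paper
    for m in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"|m| = {m:3.1f}  flip probability = {flip_probability(m, sigma):.4f}")
```

Since the flip probability falls steadily as |m| grows, a noisy channel gives gradient descent a reason to push messages far from 0, where the logistic output is close to 0 or 1. This is one plausible reading of how near-discrete protocols can emerge from continuous ones, offered as an illustration rather than a reproduction of the paper's experiments.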


This review was generated by AI and checked by human editors.