QUICK REVIEW

[Paper Review] Learning Efficient Multi-agent Communication: An Information Bottleneck Approach

Rundong Wang, Xu He|arXiv (Cornell University)|Nov 16, 2019

Reinforcement Learning in Robotics|References 21|Cited by 38

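One-sentence summary

IMAC uses the information bottleneck to learn informative, low-entropy communication protocols and a weight-based scheduler for bandwidth-limited multi-agent reinforcement learning, converging faster and communicating more efficiently than baseline methods.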

ABSTRACT

We consider the problem of the limited-bandwidth communication for multi-agent reinforcement learning, where agents cooperate with the assistance of a communication protocol and a scheduler. The protocol and scheduler jointly determine which agent is communicating what message and to whom. Under the limited bandwidth constraint, a communication protocol is required to generate informative messages. Meanwhile, an unnecessary communication connection should not be established because it occupies limited resources in vain. In this paper, we develop an Informative Multi-Agent Communication (IMAC) method to learn efficient communication protocols as well as scheduling. First, from the perspective of communication theory, we prove that the limited bandwidth constraint requires low-entropy messages throughout the transmission. Then inspired by the information bottleneck principle, we learn a valuable and compact communication protocol and a weight-based scheduler. To demonstrate the efficiency of our method, we conduct extensive experiments in various cooperative and competitive multi-agent tasks with different numbers of agents and different bandwidths. We show that IMAC converges faster and leads to efficient communication among agents under the limited bandwidth as compared to many baseline methods.

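Motivation and goals

Motivate and formalize the limited-bandwidth problem in cooperative MARL.
Develop a method for learning informative, low-entropy communication protocols.
Introduce a weight-based scheduler learned under information-theoretic regularization.
Demonstrate faster convergence and higher efficiency on cooperative and competitive tasks.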

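Proposed method

Treat messages as continuous random vectors, and link bandwidth to message entropy through source coding and the Nyquist criterion.
Apply the variational information bottleneck to regularize the mutual information between the input and the message, I(H_i; M_i) ≤ I_c, while maximizing the Q-function under this compression constraint.
Implement the IB regularizer as a KL-divergence upper bound with a Gaussian prior z(m_i), which yields a tractable optimization objective.
Treat the scheduler as a virtual agent and apply the same IB regularization to learn a weight-based scheduling mechanism.
At execution time, use a batch-normalization-like layer to enforce low-entropy messages and simulate the bandwidth constraint.
Train the communication protocol, agent policies, and scheduler jointly under the centralized training, decentralized execution framework.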
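
The KL-based regularizer in the steps above can be illustrated with a minimal sketch. It assumes the message encoder outputs a diagonal Gaussian (a mean and a log-variance per message dimension) and that the prior z(m_i) is an isotropic Gaussian. The function names, the `beta` default, and the exact loss shape are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def gaussian_kl_to_prior(mu, log_var, prior_mu=0.0, prior_var=1.0):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mu, prior_var * I) ).

    mu and log_var are equal-length sequences, one entry per message
    dimension. The result is summed over dimensions.
    """
    kl = 0.0
    for m, lv in zip(mu, log_var):
        var = math.exp(lv)
        kl += 0.5 * (
            math.log(prior_var) - lv
            + (var + (m - prior_mu) ** 2) / prior_var
            - 1.0
        )
    return kl

def ib_regularized_loss(q_value, mu, log_var, beta=0.01):
    """Hypothetical loss: maximize Q while penalizing message information.

    beta plays the role of the compression strength: a larger beta pulls
    the message distribution closer to the prior.
    """
    return -q_value + beta * gaussian_kl_to_prior(mu, log_var)

# A message distribution equal to the prior carries no penalty.
penalty_at_prior = gaussian_kl_to_prior([0.0, 0.0], [0.0, 0.0])
# Shifting one mean by 1 (unit variance) costs 0.5 nats.
penalty_shifted = gaussian_kl_to_prior([1.0, 0.0], [0.0, 0.0])
```

Because the KL term upper-bounds the mutual information between input and message in the variational formulation, minimizing it alongside the negative Q-value trades task performance against message compactness.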

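Experimental results

Research questions

RQ1: How does limited bandwidth constrain the entropy of messages transmitted in MARL?
RQ2: Can information bottleneck regularization produce informative, low-entropy communication protocols that improve learning under bandwidth constraints?
RQ3: Can scheduling be unified with protocol learning under the same information-theoretic principle?
RQ4: Does the IB-based IMAC method improve performance and convergence on cooperative and competitive MARL tasks across different numbers of agents and bandwidths?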
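
As background for RQ1, a standard information-theoretic fact shows why controlling message variance is one way to control message entropy. This is a sketch under the assumption that a message M_i is a d-dimensional continuous vector with per-dimension variances σ²_{i,k}. The symbols d and σ_{i,k} are introduced here for illustration, and this is not the paper's own proof.

```latex
% Differential entropy of a d-dimensional Gaussian message with diagonal covariance
h(M_i) = \frac{1}{2} \sum_{k=1}^{d} \log\!\left( 2 \pi e \, \sigma_{i,k}^{2} \right)

% For any message distribution with the same per-dimension variances,
% the Gaussian is the maximum-entropy case, so
h(M_i) \le \frac{1}{2} \sum_{k=1}^{d} \log\!\left( 2 \pi e \, \sigma_{i,k}^{2} \right)
```

Under these assumptions, shrinking the variances lowers the upper bound on the message entropy, which is consistent with the summary's emphasis on low-entropy messages.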

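Key findings

IMAC learns low-entropy messages and converges faster than the baselines under limited bandwidth.
On cooperative tasks (cooperative navigation, predator-prey) and in StarCraft II scenarios, IMAC consistently outperforms TarMAC, GACML, SchedNet, and MADDPG with communication.
IMAC scales to more agents (for example 5 and 10) while keeping its performance advantage and faster learning curves.
IB-based regularization is robust to changes in bandwidth level at execution time, and outperforms uncompressed communication baselines under bandwidth constraints.
The choice of the IB prior z(m_i) and the compression strength beta has a critical effect on performance, with moderate compression giving the best results.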


This review was generated by AI and reviewed by a human editor.