QUICK REVIEW

[Paper Review] Learning Efficient Multi-agent Communication: An Information Bottleneck Approach

Rundong Wang, Xu He|arXiv (Cornell University)|Nov 16, 2019

Reinforcement Learning in Robotics | References: 21 | Citations: 38

One-line summary

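IMAC uses the information bottleneck to learn an informative, low-entropy communication protocol and a weight-based scheduler for multi-agent reinforcement learning under limited bandwidth, converging faster and communicating more efficiently than the baselines.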

ABSTRACT

We consider the problem of the limited-bandwidth communication for multi-agent reinforcement learning, where agents cooperate with the assistance of a communication protocol and a scheduler. The protocol and scheduler jointly determine which agent is communicating what message and to whom. Under the limited bandwidth constraint, a communication protocol is required to generate informative messages. Meanwhile, an unnecessary communication connection should not be established because it occupies limited resources in vain. In this paper, we develop an Informative Multi-Agent Communication (IMAC) method to learn efficient communication protocols as well as scheduling. First, from the perspective of communication theory, we prove that the limited bandwidth constraint requires low-entropy messages throughout the transmission. Then inspired by the information bottleneck principle, we learn a valuable and compact communication protocol and a weight-based scheduler. To demonstrate the efficiency of our method, we conduct extensive experiments in various cooperative and competitive multi-agent tasks with different numbers of agents and different bandwidths. We show that IMAC converges faster and leads to efficient communication among agents under the limited bandwidth as compared to many baseline methods.

Motivation and objectives

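Motivate and formalize the limited-bandwidth problem in cooperative MARL.
Develop a method for learning an informative, low-entropy communication protocol.
Introduce a weight-based scheduler trained under information-theoretic regularization.
Demonstrate improved convergence and efficiency across cooperative and competitive tasks.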

Proposed method

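Model messages as continuous random vectors and relate bandwidth to message entropy using source coding and the Nyquist criterion.
Apply the variational information bottleneck to regularize the mutual information between the input and the message: maximize the Q-function subject to the compression constraint I(H_i; M_i) ≤ I_c.
Turn the IB regularization into a tractable optimization objective by using a KL-divergence-based upper bound with a Gaussian prior z(m_i).
Treat the scheduler as a virtual agent and apply the same IB regularization to learn a weight-based scheduling mechanism.
At execution time, apply a batch-normalization-like layer that enforces low-entropy messages, mimicking the bandwidth constraint.
Train the communication protocol, the agent policies, and the scheduler jointly under the centralized training / decentralized execution framework.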
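
The KL-based regularizer described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the message distribution for one agent is a diagonal Gaussian with mean `mu` and standard deviation `sigma`, that the prior z(m_i) is an isotropic Gaussian, and that the penalty is added to a task loss with a weight `beta`. The function names `gaussian_kl_to_prior`, `gaussian_entropy`, and `ib_regularized_loss` are hypothetical and chosen only for this example.

```python
import math


def gaussian_kl_to_prior(mu, sigma, prior_mu=0.0, prior_sigma=1.0):
    """KL( N(mu, diag(sigma^2)) || N(prior_mu, prior_sigma^2 * I) ) in nats.

    Assumption: the message distribution is a diagonal Gaussian, so the KL
    term is a sum of independent one-dimensional closed-form terms.
    """
    if len(mu) != len(sigma):
        raise ValueError("mu and sigma must have the same length")
    total = 0.0
    for m, s in zip(mu, sigma):
        if s <= 0.0:
            raise ValueError("standard deviations must be positive")
        total += (
            math.log(prior_sigma / s)
            + (s * s + (m - prior_mu) ** 2) / (2.0 * prior_sigma ** 2)
            - 0.5
        )
    return total


def gaussian_entropy(sigma):
    """Differential entropy (nats) of a diagonal Gaussian with std devs sigma."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in sigma)


def ib_regularized_loss(task_loss, kl, beta):
    """Task loss plus the beta-weighted KL penalty (assumed Lagrangian form)."""
    return task_loss + beta * kl


if __name__ == "__main__":
    mu, sigma = [0.3, -0.1], [0.5, 0.8]  # illustrative values only
    kl = gaussian_kl_to_prior(mu, sigma)
    print("KL to prior:", kl)
    print("message entropy:", gaussian_entropy(sigma))
    print("regularized loss:", ib_regularized_loss(1.0, kl, beta=0.05))
```

Under these assumptions the KL term is zero only when the message distribution equals the prior, and the entropy helper shows why limiting the message variance limits the entropy: for a Gaussian, entropy grows with the logarithm of the standard deviation.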

Experimental results

Research questions

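RQ1: How does limited bandwidth constrain the entropy of transmitted messages in MARL?
RQ2: Can information bottleneck regularization produce an informative, low-entropy communication protocol that improves learning under bandwidth constraints?
RQ3: Can scheduling be integrated with protocol learning under the same information-theoretic principle?
RQ4: Does the IB-based IMAC approach improve performance and convergence across cooperative and competitive MARL tasks with different numbers of agents and different bandwidths?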

Key findings

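IMAC learns low-entropy messages and converges faster than the baselines under limited bandwidth.
In cooperative tasks (cooperative navigation, predator-prey) and in StarCraft II scenarios, IMAC consistently outperforms TarMAC, GACML, SchedNet, and MADDPG with communication.
IMAC scales to larger numbers of agents (e.g., 5, 10) while maintaining superior performance and faster learning.
IB-based regularization provides robustness to varying bandwidth levels at execution time, and outperforms baselines with uncompressed communication under bandwidth constraints.
The choice of the IB prior z(m_i) and of the compression strength beta has a significant effect on performance, with moderate compression giving the best results.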
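
To make the role of beta in the last finding concrete, the constrained problem from the method summary can be written in a relaxed form. This is the standard variational information bottleneck relaxation, stated here as an assumption; the paper's exact loss may differ. The encoder notation p(m_i | h_i) and the parameter symbol θ are introduced only for this sketch.

```latex
\begin{aligned}
&\max_{\theta}\ \mathbb{E}\big[Q\big]
  \quad \text{s.t.} \quad I(H_i; M_i) \le I_c, \\[4pt]
&I(H_i; M_i) \;\le\;
  \mathbb{E}_{h_i}\!\Big[ D_{\mathrm{KL}}\big( p(m_i \mid h_i) \,\|\, z(m_i) \big) \Big], \\[4pt]
&\mathcal{L}(\theta) \;=\; -\,\mathbb{E}\big[Q\big]
  \;+\; \beta\, \mathbb{E}_{h_i}\!\Big[ D_{\mathrm{KL}}\big( p(m_i \mid h_i) \,\|\, z(m_i) \big) \Big].
\end{aligned}
```

In this form, a very small beta leaves the messages essentially unregularized, while a very large beta pushes p(m_i | h_i) toward the prior z(m_i) so that messages carry little information about h_i. That trade-off is one plausible reading of why moderate compression is reported to work best.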


This review was generated by AI and checked by a human editor.