QUICK REVIEW

[Paper Review] Multi-Agent Common Knowledge Reinforcement Learning

Christian A. Schroeder de Witt, Jakob Foerster | arXiv (Cornell University) | Oct 27, 2018

Reinforcement Learning in Robotics | References: 63 | Citations: 51

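One-Line Summary

This paper introduces MACKRL, a hierarchical, fully decentralised policy-learning framework that exploits common knowledge to achieve cooperative multi-agent control without centralised execution. On a matrix game and on StarCraft II micromanagement tasks, it outperforms independent-learning and joint-action baselines.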

ABSTRACT

Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each others' observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.
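
To make the idea of "reconstructing parts of each others' observations" concrete, here is a minimal sketch of one way common knowledge could be derived. It assumes a circular field of view with a sight range known to every agent, and it treats an entity as common knowledge of a group only when all group members see each other and all of them see the entity. The names `visible`, `common_knowledge`, `positions`, and `sight_range` are illustrative and do not come from the abstract.

```python
import math

def visible(observer, target, positions, sight_range):
    """True if `target` lies within the observer's circular field of view."""
    (ox, oy), (tx, ty) = positions[observer], positions[target]
    return math.hypot(ox - tx, oy - ty) <= sight_range

def common_knowledge(group, positions, sight_range):
    """Entities that every member of `group` can see, provided all members
    also see each other (assumed rule; otherwise nothing is shared)."""
    group = list(group)
    mutually_visible = all(
        visible(a, b, positions, sight_range) for a in group for b in group
    )
    if not mutually_visible:
        return set()
    return {
        entity
        for entity in positions
        if all(visible(a, entity, positions, sight_range) for a in group)
    }

# Hypothetical 2D positions for three agents and two enemies.
positions = {
    "a1": (0.0, 0.0),
    "a2": (1.0, 0.0),
    "a3": (5.0, 0.0),
    "enemy": (0.5, 0.5),
    "far_enemy": (4.5, 0.0),
}

ck_pair = common_knowledge(["a1", "a2"], positions, sight_range=2.0)
ck_all = common_knowledge(["a1", "a2", "a3"], positions, sight_range=2.0)
print(ck_pair)  # the pair shares itself and the nearby enemy
print(ck_all)   # empty: a3 is out of sight of a1 and a2
```

In this toy layout the smaller group has richer common knowledge than the full team, which is the situation in which delegating to subgroups becomes useful.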

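Motivation and Objectives

Motivate and formalise the use of common knowledge as a coordination signal in decentralised cooperative multi-agent reinforcement learning.
Develop a centralised-training, decentralised-execution algorithm (MACKRL) that learns a hierarchical policy tree conditioned on common knowledge.
Demonstrate that coordination through common knowledge delivers superior performance on complex tasks while preserving decentralised execution.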

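Proposed Method

Proposes a stochastic actor-critic algorithm built on a hierarchical policy tree that coordinates groups through their common knowledge.
Defines the policy over the joint action of a group G by traversing a tree of sub-policies, pi^G(u^G | I^G(t), xi), whose higher levels use common knowledge to coordinate larger groups.
Allows each level to delegate to subgroups instead, which enables a trade-off between global coordination and local control.
Implements Pairwise MACKRL as a scalable three-level hierarchy (pair selector, pair controllers, individual controllers). Parameters are shared across the pair controllers to improve sample efficiency.
Trains a centralised critic (Central-V style) with TD(λ) and updates the hierarchical joint policy in a differentiable, end-to-end training regime.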
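
The three-level hierarchy described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the learned networks are replaced by random choices, `xi` is assumed to denote a random seed shared by all agents, and every name (`pair_selector`, `pair_controller`, `individual_controller`, `act`, `ACTIONS`, `DELEGATE`) is hypothetical. The point it shows is that each agent can run the whole tree on its own and still agree with its partner, because the upper levels depend only on common knowledge and the shared seed.

```python
import random
from itertools import combinations, product

ACTIONS = ["attack", "retreat"]  # hypothetical per-agent action set
DELEGATE = "delegate"            # extra pair-controller output: hand control down

def pair_selector(agents, global_ck, rng):
    """Level 1: choose which pair to coordinate, from knowledge common to all.
    Toy stand-in: uniform choice; a learned policy would condition on global_ck."""
    return rng.choice(list(combinations(sorted(agents), 2)))

def pair_controller(pair_ck, rng):
    """Level 2: pick a joint action for the pair, or delegate to level 3."""
    if not pair_ck:  # nothing shared, so coordination is not possible
        return DELEGATE
    return rng.choice(list(product(ACTIONS, repeat=2)) + [DELEGATE])

def individual_controller(local_obs, rng):
    """Level 3: independent policy on the agent's own observation (toy: random)."""
    return rng.choice(ACTIONS)

def act(agent, agents, global_ck, pair_cks, local_obs, shared_seed):
    """Everything one agent computes by itself at execution time."""
    shared = random.Random(shared_seed)  # identical random stream for every agent
    pair = pair_selector(agents, global_ck, shared)
    if agent in pair:
        # Only members of the selected pair need that pair's common knowledge.
        decision = pair_controller(pair_cks[pair], shared)
        if decision != DELEGATE:
            return decision[pair.index(agent)]
    private = random.Random(f"{shared_seed}-{agent}")  # agent-specific stream
    return individual_controller(local_obs, private)

agents = ["a1", "a2", "a3"]
global_ck = {"map_id"}  # assumed knowledge shared by all agents
pair_cks = {
    ("a1", "a2"): {"enemy"},
    ("a1", "a3"): set(),
    ("a2", "a3"): {"enemy", "medic"},
}

joint_action = {
    a: act(a, agents, global_ck, pair_cks, local_obs=None, shared_seed=7)
    for a in agents
}
print(joint_action)
```

When every pair controller always delegates, the procedure collapses to independent per-agent policies, which matches the special case mentioned in the abstract.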

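Experimental Results

Research Questions

RQ1: Can coordination through common knowledge enable effective cooperation among decentralised policies in settings where independent learning struggles?
RQ2: How does MACKRL, by conditioning hierarchically on common knowledge, trade off between full joint-action coordination and independent execution?
RQ3: What performance advantages does MACKRL offer over IL, CK-JAL, and JAL on coordination tasks and large-scale benchmarks?
RQ4: How robust is common-knowledge-based coordination to observation noise and to scaling to more agents?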

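Key Findings

On the two-agent matrix game, MACKRL outperforms Independent Actor-Critic (IAC) and CK-JAL, and it approaches JAL performance as common knowledge increases while retaining decentralised execution.
In the probabilistic common-knowledge setting, MACKRL can act on the agents' beliefs about their common knowledge, and its coordinated policy degrades gracefully under observation noise.
On the StarCraft II micromanagement benchmark (SMAC), MACKRL outperforms Central-V, COMA, and QMIX in sample efficiency and is competitive in asymptotic performance.
Pairwise MACKRL scales to multiple maps with different numbers of agents, improving coordination over the baselines on maps such as 2s3z, 3m, and 8m.
When the set of pair partitions is scaled by subsampling, coordination coverage declines gradually while performance remains high, which highlights robustness to the availability of partitions.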
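
To illustrate why subsampling the pair partitions matters, the sketch below enumerates every way of splitting a team into disjoint pairs (with one agent left alone when the team size is odd) and then draws a random subset. The enumeration scheme and the uniform sampling without replacement are assumptions made for illustration; the review does not state how the partitions are generated or sampled. `pair_partitions` and `subsample_partitions` are hypothetical names.

```python
import random

def pair_partitions(agents):
    """Yield each split of `agents` into disjoint pairs; an odd-sized team
    leaves exactly one agent as a singleton."""
    agents = list(agents)
    if len(agents) < 2:
        yield [tuple(agents)] if agents else []
        return
    first, rest = agents[0], agents[1:]
    # Case 1: the first agent is paired with one of the others.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail
    # Case 2 (odd team only): the first agent is the one left unpaired.
    if len(agents) % 2 == 1:
        for tail in pair_partitions(rest):
            yield [(first,)] + tail

def subsample_partitions(agents, k, seed=0):
    """Draw up to k partitions uniformly without replacement (assumed scheme)."""
    all_partitions = list(pair_partitions(agents))
    rng = random.Random(seed)
    return rng.sample(all_partitions, min(k, len(all_partitions)))

for n in (3, 5, 8):
    print(n, "agents:", len(list(pair_partitions(range(n)))), "pair partitions")

subset = subsample_partitions(range(8), k=10)
print(len(subset), "partitions kept for the pair selector to choose from")
```

The count grows quickly with team size (3, 15, and 105 partitions for 3, 5, and 8 agents in this enumeration), so restricting the pair selector to a sampled subset keeps its output space manageable at the cost of covering fewer possible pairings.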

This review was generated by AI and checked by a human editor.