QUICK REVIEW

[Paper Review] Coordinated Multi-Agent Imitation Learning

Hoang Le, Yisong Yue|arXiv (Cornell University)|Mar 9, 2017

Reinforcement Learning in Robotics|References: 25|Cited by: 60

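One-line summary

This paper learns a latent coordination model jointly with individual policies for multi-agent imitation learning in a semi-supervised framework, using alternating optimization to infer role assignments and reduce imitation loss.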

ABSTRACT

We study the problem of imitation learning from demonstrations of multiple coordinating agents. One key challenge in this setting is that learning a good model of coordination can be difficult, since coordination is often implicit in the demonstrations and must be inferred as a latent variable. We propose a joint approach that simultaneously learns a latent coordination model along with the individual policies. In particular, our method integrates unsupervised structure learning with conventional imitation learning. We illustrate the power of our approach on a difficult problem of learning multiple policies for fine-grained behavior modeling in team sports, where different players occupy different roles in the coordinated team strategy. We show that having a coordination model to infer the roles of players yields substantially improved imitation loss compared to conventional baselines.

Motivation and objectives

Motivate imitation learning for multiple coordinating agents where coordination is implicit and roles are unobserved.
Propose a semi-supervised framework that combines structured latent coordination learning with conventional imitation learning.
Develop an alternating optimization scheme to train both the latent structure model and the individual policies effectively.
Demonstrate the approach on synthetic (predator-prey) and real-world-like (professional soccer) multi-agent tasks to show improved imitation performance.

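Proposed method

Formulates coordinated imitation as learning multiple decentralized policies together with a latent coordination model that assigns roles to agents across demonstrations.
Uses a graphical model q that encodes the coordination structure and a role assignment A that re-indexes trajectories to match the learned roles.
Adopts a reduction-based imitation learning approach for multi-agent policies, which allows black-box predictors (e.g., deep networks, random forests).
Uses stochastic variational inference to learn the coordination structure q(θ,z), modeling the latent role sequence z as a hidden Markov process.
Solves role assignment as a linear assignment problem (Kuhn–Munkres), using a cost matrix derived from the latent model and the trajectory likelihoods.
Trains by alternation (Algorithm 1): (i) learn the policies with the structure fixed (Algorithm 2), and (ii) update the latent structure and role assignment (Algorithm 3/Algorithm 4).
Adds entropy regularization to the role assignment to encourage useful diversity in the indexing (maximizing H(A|D)).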
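The alternation between role assignment and model fitting described above can be illustrated with a toy sketch. This is not the paper's implementation. Each role is summarized by a mean 2-D position, which stands in for both the latent structure model and the per-role policy. The cost matrix uses squared distance rather than a likelihood. Exhaustive search over permutations replaces Kuhn–Munkres, which is only viable for a handful of agents. The names assign_roles, squared_distance, and alternate are hypothetical.

```python
from itertools import permutations


def assign_roles(cost):
    """Return the agent -> role permutation with minimum total cost.

    cost[i][k] is the cost of giving agent i role k. Exhaustive search
    is an assumption made for brevity; it stands in for Kuhn-Munkres.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return list(best)


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alternate(demos, n_rounds=5):
    """Toy alternating optimization over role assignments and role models.

    demos: list of frames; each frame lists per-agent 2-D positions in
    arbitrary agent order. Returns the role means and, per frame, the
    agent -> role assignment.
    """
    n_roles = len(demos[0])
    # Initialize the role models from the first frame's ordering.
    means = [tuple(p) for p in demos[0]]
    assignments = []
    for _ in range(n_rounds):
        # Step (ii): re-index every frame by solving an assignment problem.
        assignments = []
        for frame in demos:
            cost = [[squared_distance(p, m) for m in means] for p in frame]
            assignments.append(assign_roles(cost))
        # Step (i): refit each role model on the re-indexed data.
        for k in range(n_roles):
            pts = [
                frame[i]
                for frame, a in zip(demos, assignments)
                for i in range(n_roles)
                if a[i] == k
            ]
            means[k] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return means, assignments


if __name__ == "__main__":
    # Two agents whose order is swapped in the second frame.
    demos = [
        [(0.0, 0.0), (10.0, 10.0)],
        [(10.5, 9.5), (0.5, -0.5)],
        [(-0.2, 0.1), (9.8, 10.2)],
    ]
    means, assignments = alternate(demos)
    print(assignments)
```

In this sketch every frame assigns exactly one agent to each role, so each role always has data to refit on. A likelihood-based cost and a polynomial-time assignment solver would take the place of the two simplifications in a realistic setting.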

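Experimental results

Research questions

RQ1: Can a latent coordination model be learned jointly with the policies to infer unobserved roles in multi-agent demonstrations?
RQ2: Does incorporating structured role assignment reduce imitation loss compared with unstructured multi-agent imitation learning?
RQ3: How effective is the alternating optimization framework at handling non-stationarity and covariate shift in multi-agent imitation?
RQ4: How does coordinated role assignment affect performance in the synthetic (predator-prey) and real-world-like (soccer) domains?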

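Key findings

The coordinated approach yields substantially better imitation performance than the baselines in both the synthetic and soccer domains.
Role inference with the latent structure model makes the state representation for policy learning more consistent and improves coordination.
Learning coordination through latent roles is shown to scale to large settings with many agents (e.g., soccer defense with long trajectories).
Decentralized policies trained with coordinated role assignment perform competitively with, or on par with, centralized policies once coordination is learned.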
The approach is the first to apply imitation learning to jointly learn cooperative multi-agent policies at large scale in the presented settings.
The coordination structure learned (HMM components) reveals dominant modes corresponding to common team formations and role transitions during play.


This review was generated by AI and checked by a human editor.