QUICK REVIEW

[Paper Review] Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

Kaiqing Zhang, Zhuoran Yang | arXiv (Cornell University) | Feb 23, 2018

Distributed Control Multi-Agent Systems | References: 71 | Citations: 250

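One-Line Summary

This paper proposes two decentralized actor-critic algorithms for fully decentralized multi-agent reinforcement learning over a time-varying network. The algorithms combine a consensus-based critic with function approximation, and convergence is guaranteed under linear function approximation.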

ABSTRACT

We consider the problem of fully decentralized multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network. Specifically, we assume that the reward functions of the agents might correspond to different tasks, and are only known to the corresponding agent. Moreover, each agent makes individual decisions based on both the information observed locally and the messages received from its neighbors over the network. Within this setting, the collective goal of the agents is to maximize the globally averaged return over the network through exchanging information with their neighbors. To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large. Under the decentralized structure, the actor step is performed individually by each agent with no need to infer the policies of others. For the critic step, we propose a consensus update via communication over the network. Our algorithms are fully incremental and can be implemented in an online fashion. Convergence analyses of the algorithms are provided when the value functions are approximated within the class of linear functions. Extensive simulation results with both linear and nonlinear function approximations are presented to validate the proposed algorithms. Our work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation, with provable convergence guarantees.

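Motivation and Objectives

Motivate and formalize a fully decentralized MARL setting in which agents on a time-varying network aim to maximize the globally averaged return using only their local rewards and communication with neighboring nodes.
Propose two decentralized actor-critic algorithms with function approximation that require no central controller.
Scale to large state spaces and large numbers of agents by combining local policies with consensus-based value estimation.
Establish theoretical convergence guarantees for the proposed algorithms under linear function approximation.
Complement the theory with empirical validation through simulations.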
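
As a reading aid, the "globally averaged return" in the first objective can be written as a long-run average of the rewards, averaged over agents. The formula below is a sketch in assumed notation (N agents, r^i_{t+1} the reward received by agent i after step t, θ the joint policy parameters); this review does not spell out the symbols, so the paper should be consulted for the exact definition.

```latex
J(\theta) \;=\; \lim_{T \to \infty} \frac{1}{T}\,
\mathbb{E}\!\left[\, \sum_{t=0}^{T-1} \frac{1}{N} \sum_{i=1}^{N} r^{i}_{t+1} \right]
```

Each agent observes only its own reward r^i, so no single agent can evaluate J(θ) alone. This is why the agents must exchange information with their neighbors.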

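Proposed Method

Formulate a networked multi-agent MDP with a time-varying communication graph and local rewards.
Derive a policy gradient theorem for MARL that decomposes across agents and uses local policies.
Propose two decentralized actor-critic algorithms in which the actor update is local and the critic update relies on consensus among neighboring nodes.
Approximate Q and V with functions of local parameters, and use a consensus step to share the estimates across the network.
Provide two online, incremental algorithms, with the TD error taken in either its state-value or its action-value form.
Establish convergence guarantees under linear function approximation and analyze the consensus update.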
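
The critic side of the method (a local TD update followed by a consensus average over neighbors) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pseudocode: the average-reward form of the TD error, the single step size `beta`, the Metropolis weighting rule, and all function names (`metropolis_weights`, `local_td_step`, `consensus_step`) are choices made for this sketch. The review itself only says that critic updates are consensus-based and use linear function approximation.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]
Matrix = List[List[float]]


def metropolis_weights(neighbors: Dict[int, Set[int]]) -> Matrix:
    """Build a symmetric, doubly stochastic weight matrix for an undirected graph.

    Assumption of this sketch: nodes are labeled 0..n-1 and the neighbor
    relation is symmetric. Metropolis weights are one common choice.
    """
    n = len(neighbors)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            weights[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The diagonal entry is still 0.0 here, so sum() is the off-diagonal row sum.
        weights[i][i] = 1.0 - sum(weights[i])
    return weights


def local_td_step(omega: Vector, mu: float, phi: Vector, phi_next: Vector,
                  reward: float, beta: float) -> Tuple[Vector, float]:
    """One agent's local critic step with a linear approximator Q = omega . phi.

    Assumed form: average-reward TD error computed from the agent's own reward.
    Returns the intermediate parameters and the updated reward average.
    """
    q = sum(w * f for w, f in zip(omega, phi))
    q_next = sum(w * f for w, f in zip(omega, phi_next))
    delta = reward - mu + q_next - q              # local TD error
    omega_tilde = [w + beta * delta * f for w, f in zip(omega, phi)]
    mu_next = (1.0 - beta) * mu + beta * reward   # running estimate of average reward
    return omega_tilde, mu_next


def consensus_step(omega_tildes: List[Vector], weights: Matrix) -> List[Vector]:
    """Replace each agent's parameters with a weighted average over the network.

    weights[i][j] is zero for non-neighbors, so agent i only uses parameters
    it could actually receive. Pass a different matrix at every step to
    model a time-varying graph.
    """
    n, dim = len(omega_tildes), len(omega_tildes[0])
    return [
        [sum(weights[i][j] * omega_tildes[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]
```

In this sketch only parameter vectors cross the network; rewards stay local, which matches the privacy property the review attributes to the algorithms. Because the weight matrix is doubly stochastic, the consensus step preserves the network-wide average of the parameters while pulling the agents toward agreement.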

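Experimental Results

Research Questions

RQ1: How can fully decentralized MARL be formulated for networked agents with local rewards and no central controller?
RQ2: Can decentralized actor-critic algorithms converge when function approximation is used under a time-varying network topology?
RQ3: What role does the consensus update play in achieving network-wide optimality?
RQ4: How does linear function approximation affect the convergence guarantees in the proposed framework?
RQ5: Can the proposed algorithms scale to many agents and high-dimensional state-action spaces while remaining online?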

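Key Findings

Proposes two decentralized actor-critic algorithms with function approximation for networked MARL over time-varying graphs.
Establishes a policy gradient theorem for MARL that enables local actor updates and consensus-based critic estimation.
Proves convergence guarantees for both algorithms in the case of linear function approximation.
The algorithms are fully incremental and can be implemented online, and they preserve agent privacy by avoiding transmission of individual rewards.
Empirical simulations with both linear and nonlinear function approximation validate the proposed methods and support the theory.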
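
To see why a policy gradient theorem of this kind allows purely local actor updates, it helps to write out the general shape such a result takes. The block below is a sketch in assumed notation, not a quotation of the paper's theorem: θ^i are agent i's policy parameters, π^i its local policy, d_θ the stationary state distribution, and A_θ a global advantage function. The only assumption used in the reasoning is that the joint policy factorizes across agents.

```latex
% Assumed factorization of the joint policy across agents:
\pi_\theta(s, a) \;=\; \prod_{i=1}^{N} \pi^{i}_{\theta^{i}}(s, a^{i})
\quad\Longrightarrow\quad
\nabla_{\theta^{i}} \log \pi_\theta(s, a)
  \;=\; \nabla_{\theta^{i}} \log \pi^{i}_{\theta^{i}}(s, a^{i})

% Resulting per-agent gradient (sketch):
\nabla_{\theta^{i}} J(\theta) \;=\;
\mathbb{E}_{s \sim d_\theta,\; a \sim \pi_\theta}\!\left[
  \nabla_{\theta^{i}} \log \pi^{i}_{\theta^{i}}(s, a^{i}) \, A_\theta(s, a)
\right]
```

The score term depends only on agent i's own policy, so the agent needs no model of the other agents' policies. What it cannot compute alone is the global advantage, which depends on everyone's rewards, and that is the quantity the consensus-based critic is meant to estimate.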


This review was generated by AI and checked by a human editor.