QUICK REVIEW

[Paper Review] Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

Georgios Papoudakis, Filippos Christianos|arXiv (Cornell University)|Jun 11, 2019

Reinforcement Learning in Robotics|References: 36|Citations: 124

One-sentence summary

The paper surveys how non-stationarity arises in multi-agent deep RL and categorizes the methods that mitigate it: centralized critics, decentralized learning, opponent modeling, meta-learning, and communication. It closes with open problems and future directions.

ABSTRACT

Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.

Research motivation and objectives

Motivate and define non-stationarity in multi-agent DRL and its impact on learning stability.
Survey and categorize recent approaches addressing non-stationarity across training architectures and information assumptions.
Identify promising directions and open problems for future research in multi-agent non-stationarity.
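
To make the notion of non-stationarity concrete, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical 2x2 coordination game (the `PAYOFF` matrix and both policy vectors are invented for illustration) and shows that the value of a fixed action for one agent changes purely because the other agent's policy has drifted.

```python
# Hypothetical 2x2 coordination game (an assumption for illustration):
# PAYOFF[a][b] is agent A's reward when A plays a and the other agent plays b.
PAYOFF = [[1.0, 0.0], [0.0, 1.0]]


def expected_reward(action, other_policy):
    """Expected reward of `action` for agent A, given the other agent's
    action probabilities."""
    return sum(prob * PAYOFF[action][b] for b, prob in enumerate(other_policy))


# The other agent is learning as well, so its policy drifts during training.
# Both snapshots below are made-up values, not results from any experiment.
other_policy_early = [0.9, 0.1]
other_policy_late = [0.1, 0.9]

# Agent A evaluates the same action (0) at two points in training.
value_early = expected_reward(0, other_policy_early)
value_late = expected_reward(0, other_policy_late)
print(value_early, value_late)
```

Agent A's own behaviour is identical in both evaluations, yet the value of action 0 falls from 0.9 to 0.1. From A's point of view the environment itself appears to have changed, which is the moving-target effect the surveyed methods try to handle.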

Proposed method

Review and categorize existing methods for non-stationarity in multi-agent DRL.
Provide taxonomy detailing training/execution architecture, modeling, opponent information, and algorithms.
Summarize representative algorithms and their empirical settings in a consolidated table.

Experimental results

Research questions

RQ1: What approaches have been proposed to address non-stationarity in multi-agent deep reinforcement learning?
RQ2: How do centralized vs decentralized training, opponent modeling, meta-learning, learning representations, and communication contribute to stabilizing learning under non-stationarity?
RQ3: What are the open problems and future research directions in this area?

Key findings

Paper	Setting	Training	Execution	Modeling	Opponent information	Algorithm	Num. agents
Tacchetti et al. (2019)	Mixed	Centr.	Decentr.	Explicit	Obs / actions	A2C	≥ 2
Singh et al. (2019)	Mixed	Decentr.	Decentr.	No	None	PG	≥ 2
Letcher et al. (2019)	Mixed	Decentr.	Decentr.	Explicit	Parameters	PG	2
Li et al. (2019)	Mixed	Centr.	Decentr.	No	Obs / actions	DDPG	≥ 2
Al-Shedivat et al. (2018)	Comp.	Decentr.	Decentr.	No	None	PPO	2
Bansal et al. (2018)	Comp.	Decentr.	Decentr.	No	None	PPO	2
Raileanu et al. (2018)	Mixed	Centr.	Centr.	Explicit	Obs / actions	A3C	2
Mordatch and Abbeel (2018)	Coop.	Decentr.	Decentr.	No	None	PG	≥ 2
Foerster et al. (2018a)	Mixed	Decentr.	Decentr.	Explicit	Parameters	PG	2
Grover et al. (2018)	Mixed	Centr.	Centr.	Explicit	Obs / actions	PPO / DDPG	2
Rabinowitz et al. (2018)	Mixed	Centr.	Centr.	Explicit	Obs / actions	Imitation	≥ 2
Foerster et al. (2018b)	Coop.	Centr.	Decentr.	No	Obs / actions	Actor-critic	≥ 2
Lowe et al. (2017)	Mixed	Centr.	Decentr.	No	Obs / actions	DDPG	≥ 2
Foerster et al. (2017)	Mixed	Decentr.	Decentr.	No	None	Q-learning	≥ 2
Sukhbaatar et al. (2016)	Coop.	Decentr.	Decentr.	No	None	PG	≥ 2
Foerster et al. (2016a)	Coop.	Centr.	Decentr.	No	None	Q-learning	≥ 2
He et al. (2016)	Mixed	Centr.	Centr.	Implicit	Obs	Q-learning	2
Zhang and Lesser (2010)	Mixed	Decentr.	Decentr.	Explicit	Parameters	PG	2

Centralized critics with decentralized actors stabilize training by conditioning the critic, and hence each agent's policy-gradient estimate, on joint observations and actions.
Opponent modeling and learning representations can mitigate non-stationarity and improve generalization to diverse opponents.
Meta-learning approaches (e.g., MAML-inspired) enable rapid adaptation to non-stationary dynamics.
Self-play and stabilized experience replay are effective decentralized strategies under non-stationarity.
Communication among agents emerges as a useful mechanism to coordinate policies and stabilize learning in multi-agent settings.
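
As an illustration of the first finding, the following is a minimal tabular sketch of centralized training with decentralized execution. It is not the implementation of any surveyed algorithm. The payoff matrix `REWARD`, the learning rates, and the function names are assumptions made for this example, and the baseline that marginalizes over an agent's own action is one possible choice, loosely in the spirit of counterfactual baselines.

```python
import math
import random

# Hypothetical shared payoff for a 2-agent, 2-action cooperative game
# (an assumption for illustration): both agents receive REWARD[a1][a2].
REWARD = [[1.0, 0.2], [0.2, 0.0]]


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=3000, actor_lr=0.1, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0, 0.0], [0.0, 0.0]]  # one decentralized actor per agent
    critic = [[0.0, 0.0], [0.0, 0.0]]  # centralized Q(a1, a2), training only

    for _ in range(steps):
        policies = [softmax(agent_logits) for agent_logits in logits]
        # Execution is decentralized: each agent samples from its own policy.
        actions = [rng.choices([0, 1], weights=p)[0] for p in policies]
        a1, a2 = actions
        reward = REWARD[a1][a2]

        # The critic is indexed by the joint action, so its learning target
        # does not move when the other agent's policy changes.
        critic[a1][a2] += critic_lr * (reward - critic[a1][a2])

        for i in (0, 1):
            other = actions[1 - i]
            # Q-values of agent i's two actions, other agent's action fixed.
            q_values = [
                critic[ai][other] if i == 0 else critic[other][ai]
                for ai in (0, 1)
            ]
            # Baseline: expected Q under agent i's own current policy.
            baseline = sum(policies[i][ai] * q_values[ai] for ai in (0, 1))
            advantage = q_values[actions[i]] - baseline
            # Softmax policy gradient: d log pi(a) / d logit_k = 1[k == a] - pi(k).
            for k in (0, 1):
                grad_log = (1.0 if k == actions[i] else 0.0) - policies[i][k]
                logits[i][k] += actor_lr * advantage * grad_log

    return [softmax(agent_logits) for agent_logits in logits], critic


if __name__ == "__main__":
    final_policies, final_critic = train()
    print(final_policies)
    print(final_critic)
```

The critic is used only inside `train`. At execution time each agent needs nothing but its own logits, which is what makes the actors decentralized. In this toy game the reward is a deterministic function of the joint action, so the critic's target is stationary even while both policies are changing.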

This review was generated by AI and checked by a human editor.