QUICK REVIEW

[Paper Review] Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

Georgios Papoudakis, Filippos Christianos|arXiv (Cornell University)|Jun 11, 2019
Reinforcement Learning in Robotics|References: 36|Citations: 124
ONE-LINE SUMMARY

The paper surveys how non-stationarity arises in multi-agent deep RL and categorizes methods to mitigate it, including centralized critics, decentralized learning, opponent modeling, meta-learning, and communication, with open problems and future directions.

ABSTRACT

Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.
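To make the "continually changing decision-making policies" concrete, here is a minimal sketch of non-stationarity from one agent's point of view. The payoff matrix and the sequence of opponent policies are made-up values for illustration, not taken from the paper. The expected reward of a fixed action changes only because the other agent's policy changes.

```python
# Illustrative only: the payoff values and opponent policies are assumptions.
# Agent 1's payoff matrix: rows = agent 1's action, columns = agent 2's action.
PAYOFF = [
    [1.0, -1.0],
    [-1.0, 1.0],
]


def expected_reward(action, opponent_policy):
    """Expected reward of a fixed action against a stochastic opponent policy."""
    return sum(prob * PAYOFF[action][b] for b, prob in enumerate(opponent_policy))


# The opponent is learning too, so its policy drifts during training.
opponent_policies = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

# Same action, same environment rules, different value at each training stage.
rewards_for_action_0 = [expected_reward(0, pi) for pi in opponent_policies]
print(rewards_for_action_0)
```

The value of action 0 moves from positive to negative as the opponent shifts its policy, so a value estimate learned early in training becomes stale even though the game itself never changed.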

MOTIVATION AND OBJECTIVES

  • Motivate and define non-stationarity in multi-agent DRL and its impact on learning stability.
  • Survey and categorize recent approaches addressing non-stationarity across training architectures and information assumptions.
  • Identify promising directions and open problems for future research in multi-agent non-stationarity.

PROPOSED METHOD

  • Review and categorize existing methods for non-stationarity in multi-agent DRL.
  • Provide a taxonomy covering training and execution architecture, opponent modeling, opponent information, and learning algorithm.
  • Summarize representative algorithms and their empirical settings in a consolidated table.
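The taxonomy dimensions listed above can be sketched as a small record type. The class name `SurveyedMethod`, the field names, and the helper `ctde` are hypothetical choices made for this illustration. The four example rows are copied from this review's summary table, with "≥ 2" written as ">= 2".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyedMethod:
    """One row of the taxonomy (field names are illustrative assumptions)."""
    paper: str
    setting: str        # "Coop.", "Comp.", or "Mixed"
    training: str       # "Centr." or "Decentr."
    execution: str      # "Centr." or "Decentr."
    modeling: str       # "Explicit", "Implicit", or "No"
    opponent_info: str  # e.g. "Obs / actions", "Parameters", "None"
    algorithm: str
    num_agents: str


METHODS = [
    SurveyedMethod("Lowe et al. (2017)", "Mixed", "Centr.", "Decentr.",
                   "No", "Obs / actions", "DDPG", ">= 2"),
    SurveyedMethod("Foerster et al. (2018b)", "Coop.", "Centr.", "Decentr.",
                   "No", "Obs / actions", "Actor-critic", ">= 2"),
    SurveyedMethod("Singh et al. (2019)", "Mixed", "Decentr.", "Decentr.",
                   "No", "None", "PG", ">= 2"),
    SurveyedMethod("He et al. (2016)", "Mixed", "Centr.", "Centr.",
                   "Implicit", "Obs", "Q-learning", "2"),
]


def ctde(methods):
    """Papers that train centrally but execute with decentralized policies."""
    return [m.paper for m in methods
            if m.training == "Centr." and m.execution == "Decentr."]


print(ctde(METHODS))
```

Keeping each dimension as its own field makes it easy to slice the surveyed work along any single axis, for example by opponent information or by number of agents.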

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

  • RQ1: What approaches have been proposed to address non-stationarity in multi-agent deep reinforcement learning?
  • RQ2: How do centralized vs decentralized training, opponent modeling, meta-learning, learning representations, and communication contribute to stabilizing learning under non-stationarity?
  • RQ3: What are the open problems and future research directions in this area?

KEY FINDINGS

Paper                      | Settings | Training | Execution | Modeling | Opponent Information | Algorithm    | Num agents
Tacchetti et al. (2019)    | Mixed    | Centr.   | Decentr.  | Explicit | Obs / actions        | A2C          | ≥ 2
Singh et al. (2019)        | Mixed    | Decentr. | Decentr.  | No       | None                 | PG           | ≥ 2
Letcher et al. (2019)      | Mixed    | Decentr. | Decentr.  | Explicit | Parameters           | PG           | 2
Li et al. (2019)           | Mixed    | Centr.   | Decentr.  | No       | Obs / actions        | DDPG         | ≥ 2
Al-Shedivat et al. (2018)  | Comp.    | Decentr. | Decentr.  | No       | None                 | PPO          | 2
Bansal et al. (2018)       | Comp.    | Decentr. | Decentr.  | No       | None                 | PPO          | 2
Raileanu et al. (2018)     | Mixed    | Centr.   | Centr.    | Explicit | Obs / actions        | A3C          | 2
Mordatch and Abbeel (2018) | Coop.    | Decentr. | Decentr.  | No       | None                 | PG           | ≥ 2
Foerster et al. (2018a)    | Mixed    | Decentr. | Decentr.  | Explicit | Parameters           | PG           | 2
Grover et al. (2018)       | Mixed    | Centr.   | Centr.    | Explicit | Obs / actions        | PPO / DDPG   | 2
Rabinowitz et al. (2018)   | Mixed    | Centr.   | Centr.    | Explicit | Obs / actions        | Imitation    | ≥ 2
Foerster et al. (2018b)    | Coop.    | Centr.   | Decentr.  | No       | Obs / actions        | Actor-critic | ≥ 2
Lowe et al. (2017)         | Mixed    | Centr.   | Decentr.  | No       | Obs / actions        | DDPG         | ≥ 2
Foerster et al. (2017)     | Mixed    | Decentr. | Decentr.  | No       | None                 | Q-learning   | ≥ 2
Sukhbaatar et al. (2016)   | Coop.    | Decentr. | Decentr.  | No       | None                 | PG           | ≥ 2
Foerster et al. (2016a)    | Coop.    | Centr.   | Decentr.  | No       | None                 | Q-learning   | ≥ 2
He et al. (2016)           | Mixed    | Centr.   | Centr.    | Implicit | Obs                  | Q-learning   | 2
Zhang and Lesser (2010)    | Mixed    | Decentr. | Decentr.  | Explicit | Parameters           | PG           | 2

  • Centralized critics with decentralized actors stabilize training by conditioning policy gradients on joint observations/actions.
  • Opponent modeling and learning representations can mitigate non-stationarity and improve generalization to diverse opponents.
  • Meta-learning approaches (e.g., MAML-inspired) enable rapid adaptation to non-stationary dynamics.
  • Self-play and stabilized experience replay are effective decentralized strategies under non-stationarity.
  • Communication among agents emerges as a useful mechanism to coordinate policies and stabilize learning in multi-agent settings.
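The first finding, a centralized critic paired with decentralized actors, comes down to which inputs each component sees. The sketch below shows only that information flow, with random linear weights and no training. It is not the implementation of any algorithm in the table, and all sizes and names (`N_AGENTS`, `act`, `critic_value`, and so on) are assumptions made for illustration.

```python
import random

N_AGENTS, OBS_DIM, N_ACTIONS = 2, 3, 2  # hypothetical sizes
rng = random.Random(0)


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


# Decentralized actors: each agent scores actions from its OWN observation only.
actor_weights = [
    [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(N_ACTIONS)]
    for _ in range(N_AGENTS)
]

# Centralized critic: its input is every agent's observation and action.
CRITIC_IN = N_AGENTS * (OBS_DIM + N_ACTIONS)
critic_weights = [[rng.uniform(-1, 1) for _ in range(CRITIC_IN)]]


def act(agent, local_obs):
    """Greedy action from local information (usable at execution time)."""
    scores = linear(actor_weights[agent], local_obs)
    return max(range(N_ACTIONS), key=scores.__getitem__)


def critic_value(joint_obs, joint_actions):
    """Joint value estimate (used during training only)."""
    x = [v for obs in joint_obs for v in obs]
    for a in joint_actions:
        x += one_hot(a, N_ACTIONS)
    return linear(critic_weights, x)[0]


joint_obs = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]
joint_actions = [act(i, joint_obs[i]) for i in range(N_AGENTS)]
q = critic_value(joint_obs, joint_actions)
print(joint_actions, q)
```

Because the critic receives the other agents' actions as explicit inputs, a change in their behavior shows up as a change in the critic's input rather than as an unexplained shift in the environment. Each actor still needs only its local observation at execution time.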

This review was generated by AI and checked by a human editor.