QUICK REVIEW

[Paper Review] A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

Pablo Hernández-Leal, Michael Kaisers|arXiv (Cornell University)|Jul 28, 2017

Advanced Bandit Algorithms Research|References: 156|Citations: 198

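One-Sentence Summary

This survey reviews how learning in multiagent environments copes with non-stationarity and introduces a five-category framework for classifying the approaches.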

ABSTRACT

The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, which make a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principle approaches how algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research.

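Motivation and Objectives

Consolidate the treatment of opponent-induced non-stationarity across bandits, reinforcement learning, and game theory.
Introduce a coherent framework for classifying how multiagent learning handles non-stationarity.
Classify state-of-the-art algorithms using characteristics of the environment and of opponent adaptation.
Discuss strengths, limitations, and future research directions in non-stationary multiagent learning.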

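Proposed Approach

Reviews the formal models of multi-armed bandits, reinforcement learning, and game theory in order to frame non-stationarity.
Proposes a new framework that defines five categories for handling non-stationarity: ignore, forget, respond to target models, learn models, and theory of mind.
Illustrates each category with domain examples, highlighting strengths and weaknesses.
Provides a taxonomy of algorithms based on the categories and on environment and adaptation characteristics.
Discusses open problems and future research directions.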
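
To make the first two categories more concrete, the sketch below contrasts two reward estimators on a reward stream that changes abruptly. It is one possible reading of "ignore" and "forget", not code or notation from the survey: the sample average weights all history equally (as if the environment were stationary), while a constant step size discounts old observations. The function names `track_reward` and `simulate`, the step size of 0.05, and the reward probabilities are all illustrative assumptions.

```python
import random


def track_reward(rewards, step_size=None):
    """Return the running reward estimate after each observation.

    step_size=None  -> sample average: every past reward counts equally,
                       which implicitly assumes a stationary source.
    step_size=alpha -> exponential recency weighting: old rewards are
                       gradually forgotten.
    Both variants are illustrative stand-ins, not algorithms from the survey.
    """
    estimate = 0.0
    history = []
    for n, reward in enumerate(rewards, start=1):
        alpha = 1.0 / n if step_size is None else step_size
        estimate += alpha * (reward - estimate)
        history.append(estimate)
    return history


def simulate(num_steps=2000, switch_at=1000, seed=0):
    """Bernoulli rewards whose success probability drops abruptly.

    The probabilities 0.8 and 0.2 are hypothetical values that mimic an
    opponent switching strategy once.
    """
    rng = random.Random(seed)
    rewards = []
    for t in range(num_steps):
        p = 0.8 if t < switch_at else 0.2
        rewards.append(1.0 if rng.random() < p else 0.0)
    return rewards


if __name__ == "__main__":
    rewards = simulate()
    ignore_estimate = track_reward(rewards)[-1]
    forget_estimate = track_reward(rewards, step_size=0.05)[-1]
    print(f"sample average (ignore-style):     {ignore_estimate:.2f}")
    print(f"constant step size (forget-style): {forget_estimate:.2f}")
```

Under these assumptions one would expect the sample average to settle near the midpoint of the two regimes, while the constant-step-size estimate tracks the more recent regime at the cost of higher variance. The more sophisticated categories (responding to target models, learning models, theory of mind) need an explicit representation of the opponent and are not covered by this sketch.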

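Experimental Results

Research Questions

RQ1: How does non-stationarity arise in multiagent learning across different domains (bandits, RL, game theory)?
RQ2: Which framework best captures the increasing sophistication in how non-stationarity is handled?
RQ3: Which algorithms correspond to which categories under different assumptions about observability and opponent adaptation?
RQ4: What are the main open problems and promising future research directions in non-stationary multiagent learning?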

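Key Findings

Proposes a five-category framework for coping with non-stationarity: ignore, forget, respond to target opponents, learn opponent models, and theory of mind.
Provides a taxonomy that classifies state-of-the-art algorithms from MABs, RL, and game theory by category and by environment and opponent-adaptation characteristics.
Uses illustrative variations of one domain to contrast the strengths and limitations of each category.
Analyzes the environments in which the different approaches are most effective and outlines promising directions for future research.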


This review was generated by AI and checked by a human editor.