QUICK REVIEW

[Paper Review] A multi-agent reinforcement learning model of common-pool resource appropriation

Julien Pérolat, Joel Z. Leibo|arXiv (Cornell University)|Jul 20, 2017

Experimental Behavioral Economics Studies | References: 30 | Citations: 68

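One-line summary

This paper uses independent deep reinforcement learning agents in a spatially dynamic common-pool resource game to study emergent behavior, including exclusion, sustainability, and inequality, and analyzes the outcomes with empirical game-theoretic tools.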

ABSTRACT

Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find socially positive equilibria---a phenomenon called the tragedy of the commons. However, in reality, human societies are sometimes able to discover and implement stable cooperative solutions. Decades of behavioral game theory research have sought to uncover aspects of human behavior that make this possible. Most of that work was based on laboratory experiments where participants only make a single choice: how much to appropriate. Recognizing the importance of spatial and temporal resource dynamics, a recent trend has been toward experiments in more complex real-time video game-like environments. However, standard methods of non-cooperative game theory can no longer be used to generate predictions for this case. Here we show that deep reinforcement learning can be used instead. To that end, we study the emergent behavior of groups of independently learning agents in a partially observed Markov game modeling common-pool resource appropriation. Our experiments highlight the importance of trial-and-error learning in common-pool resource appropriation and shed light on the relationship between exclusion, sustainability, and inequality.

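Motivation and objectives

Move beyond static game theory by modeling the common-pool resource (CPR) problem as a dynamic environment that evolves in both space and time.
Investigate whether independently learning agents can self-organize to appropriate a common resource sustainably.
Examine how exclusion mechanisms and territory formation affect sustainability and inequality.
Provide metrics that summarize social outcomes, and connect learning dynamics to game-theoretic concepts.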

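Proposed method

Model a partially observable N-player Markov game in which agents harvest apples whose regrowth depends on the local stock.
Use independent deep Q-network (DQN) agents, with no centralized coordination, that learn their policies through interaction.
Introduce four social outcome metrics that summarize group behavior: Utilitarian (U), Equality (E), Sustainability (S), and Peace (P).
Analyze the emergent policies with Schelling diagrams, as an empirical game-theoretic analysis that characterizes the agents' incentives.
Examine variants that include a time-out tagging mechanism, which excludes other agents from the resource.
Provide observed examples and videos of the policies across stages of training.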
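
The review names the four social outcome metrics but does not give their formulas. The sketch below is one plausible reading, under stated assumptions: Utilitarian as total return per time step, Equality as one minus the Gini coefficient of returns, Sustainability as the average time step at which rewards are collected, and Peace as the average number of agents not timed out per step. The function names, argument shapes, and sample numbers are hypothetical and are not taken from the paper's code.

```python
from typing import Sequence


def utilitarian(returns: Sequence[float], episode_length: int) -> float:
    """Assumed definition: sum of all agents' returns per time step."""
    return sum(returns) / episode_length


def equality(returns: Sequence[float]) -> float:
    """Assumed definition: 1 minus the Gini coefficient of the returns."""
    n = len(returns)
    total = sum(returns)
    if total == 0:
        # Assumption: treat "nobody collected anything" as perfectly equal.
        return 1.0
    pairwise = sum(abs(a - b) for a in returns for b in returns)
    return 1.0 - pairwise / (2 * n * total)


def sustainability(reward_times: Sequence[Sequence[int]]) -> float:
    """Assumed definition: mean over agents of the average time step at
    which that agent collected a reward (0 for agents with no reward)."""
    per_agent = [sum(t) / len(t) if t else 0.0 for t in reward_times]
    return sum(per_agent) / len(per_agent)


def peace(timed_out_steps: Sequence[int], episode_length: int) -> float:
    """Assumed definition: average number of agents not timed out per step."""
    n = len(timed_out_steps)
    return (n * episode_length - sum(timed_out_steps)) / episode_length


# Hypothetical 4-agent episode of 100 steps in which two agents take everything.
returns = [10.0, 10.0, 0.0, 0.0]
reward_times = [[10, 20], [30, 50], [], []]
timed_out_steps = [0, 0, 50, 50]

print(utilitarian(returns, 100))     # 0.2
print(equality(returns))             # 0.5
print(sustainability(reward_times))  # 13.75
print(peace(timed_out_steps, 100))   # 3.0
```

Under these assumed definitions, a late average collection time signals that the stock survived deep into the episode, which is why Sustainability can be read from reward timestamps alone.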

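Experimental results

Research questions

RQ1: Can independent deep reinforcement learning agents self-organize to appropriate a common resource sustainably in a spatially dynamic environment?
RQ2: How does an exclusion mechanism based on tagging affect sustainability, equality, and overall efficiency?
RQ3: How do the social-psychological phases that emerge during training (naivety, tragedy, maturity) relate to the resource stock?
RQ4: How do empirical game-theoretic tools such as Schelling diagrams characterize changes in the strategic incentives among learning agents?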

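Key findings

A single agent learning in isolation can arrive at a sustainable policy.
In the multi-agent setting, group return does not consistently track individual learning progress, and the social metrics reveal phase transitions that individual reward alone does not show.
Three training phases emerge: naivety (healthy stock and high efficiency), tragedy (rapid depletion), and maturity (stock maintained through exclusion dynamics).
Exclusion through time-out tagging creates private territories that preserve the stock and raise the tagger's individual return, while increasing inequality among agents.
Territorial structure and easy exclusion tend to increase inequality; maps with multiple entrances or without walls reduce it.
Empirical game-theoretic analysis with Schelling diagrams shows that strategic incentives shift over time from uniform externalities to conditional externalities, suggesting that the strategic dynamics themselves evolve.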
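
A Schelling diagram plots the payoff of each strategy type against the number of other agents playing a reference type. The sketch below shows one way such a diagram could be tabulated from simulated episodes. The two policy labels `"A"` and `"B"`, the episode record format, and the sample returns are hypothetical; the review does not say how the authors classified policies or collected the underlying data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Episode = Tuple[List[str], List[float]]  # (policy label per agent, return per agent)


def schelling_diagram(
    episodes: Iterable[Episode], reference_label: str = "A"
) -> Dict[str, Dict[int, float]]:
    """Return {label: {k: mean return}} where k is the number of OTHER agents
    in the episode that played `reference_label`."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for labels, returns in episodes:
        n_ref = sum(1 for label in labels if label == reference_label)
        for label, ret in zip(labels, returns):
            # Do not count the focal agent among "the others".
            k = n_ref - (1 if label == reference_label else 0)
            sums[label][k] += ret
            counts[label][k] += 1
    return {
        label: {k: sums[label][k] / counts[label][k] for k in sorted(sums[label])}
        for label in sums
    }


# Two hypothetical 3-agent episodes with made-up returns.
episodes = [
    (["A", "A", "B"], [3.0, 5.0, 8.0]),
    (["A", "B", "B"], [2.0, 6.0, 4.0]),
]
diagram = schelling_diagram(episodes)
print(diagram["A"])  # {0: 2.0, 1: 4.0}
print(diagram["B"])  # {1: 5.0, 2: 8.0}
```

Comparing the two curves at the same k shows which type an agent would prefer to play given what the others do, which is how a shift in incentives over training could be read off such a table.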


This review was generated by AI and checked by a human editor.