QUICK REVIEW

[Paper Review] Playing FPS Games with Deep Reinforcement Learning

Guillaume Lample, Devendra Singh Chaplot|arXiv (Cornell University)|Sep 18, 2016

Reinforcement Learning in Robotics | Cited by 111

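One-line summary

The authors develop a DRQN-based agent for 3D FPS deathmatches in ViZDoom, augment it with game-feature training and a split between navigation and action phases, and report better-than-human performance together with faster training.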

ABSTRACT

Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, that involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as humans in deathmatch scenarios.

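Motivation and objectives

Use a recurrent network to handle partial observability in 3D FPS environments.
Improve training efficiency and performance by augmenting the model with game features.
Speed up learning by splitting the game into navigation and action phases handled by modular networks.
Generalize to unseen maps and compare against human players and the built-in bots.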

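Proposed method

Builds on the DRQN architecture with two streams: the CNN output is fed both to an LSTM and to an auxiliary game-feature head.
During training, binary game-feature indicators such as the presence of enemies or items are used as auxiliary prediction targets, which steers the convolutional filters toward game-relevant cues.
Introduces a two-phase architecture: a navigation network (a DQN) for exploration and an action network (a DRQN with game features) for combat; the phase is chosen according to whether an enemy is present.
Trains game-feature prediction jointly with the Q-learning objective so that feature detection informs policy learning.
Applies reward shaping to mitigate sparse and delayed rewards, and uses frame skipping to speed up training.
Performs sequential DRQN updates that require a minimum amount of history, to stabilize learning.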
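
The joint objective and the phase switch described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`q_learning_loss`, `feature_loss`, `joint_loss`, `select_network`), the weighting of the two loss terms, the discount factor of 0.99, and the 0.5 detection threshold are all hypothetical choices made for the example.

```python
import math

def q_learning_loss(q_sa, reward, next_q_values, done, gamma=0.99):
    """Squared TD error for a single transition (standard Q-learning target)."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (target - q_sa) ** 2

def feature_loss(predicted_probs, labels, eps=1e-7):
    """Mean binary cross-entropy over game-feature indicators (e.g. enemy visible)."""
    total = 0.0
    for p, y in zip(predicted_probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def joint_loss(q_sa, reward, next_q_values, done,
               predicted_probs, labels, gamma=0.99, feature_weight=1.0):
    """Q-learning loss plus a weighted auxiliary feature loss.

    feature_weight is an assumption of this sketch, not a value from the paper.
    """
    td = q_learning_loss(q_sa, reward, next_q_values, done, gamma)
    aux = feature_loss(predicted_probs, labels)
    return td + feature_weight * aux

def select_network(enemy_prob, threshold=0.5):
    """Use the action network when an enemy is detected, else the navigation network."""
    return "action" if enemy_prob >= threshold else "navigation"
```

For instance, `select_network(0.9)` routes control to the action network, while `select_network(0.1)` falls back to the navigation network. In a real agent both loss terms would be computed on network outputs and backpropagated through the shared convolutional layers.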

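Experimental results

Research questions

RQ1: Can a DRQN-based agent learn an effective policy in a partially observable 3D FPS environment?
RQ2: Does incorporating game-engine features during training (they may be unavailable at test time) accelerate learning and improve performance?
RQ3: Does an architecture that splits navigation and action improve training efficiency and final performance compared with a single monolithic network?
RQ4: How well does the agent generalize to unseen maps, and how does it compare with human players and the built-in bots?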

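Key findings

The DRQN augmented with game features performs markedly better than the baseline DRQN on the deathmatch task.
The navigation-aware modular design yields better results than a single network, reducing camping behavior and improving map exploration.
With joint training on game features, enemy-detection accuracy reaches about 90% after a few hours of training, which accelerates learning.
In ViZDoom deathmatches, the agent outperforms both the built-in Doom bots and human players (K/D ratio, single player: human 1.52 vs. agent 5.12; multiplayer: human 0.49 vs. agent 1.33).
With the navigation network, item pickups and the K/D ratio are higher (for example, the gain is larger in the full deathmatch, where weapons and items can be collected).
With game features, the K/D ratio exceeds 4.0 in the best case, which supports generalization to unseen maps.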
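
The K/D comparison above is simple arithmetic, and a short sketch makes the size of the gap explicit. The helper names `kd_ratio` and `relative_gain` are hypothetical, and returning infinity for zero deaths is a convention assumed here, not one taken from the paper. Only the four ratios reported in this review are used as data.

```python
def kd_ratio(kills, deaths):
    """Kill-to-death ratio; returns infinity when there are no deaths (assumed convention)."""
    if deaths == 0:
        return float("inf")
    return kills / deaths

def relative_gain(agent_kd, human_kd):
    """How many times larger the agent's K/D ratio is than the human's."""
    return agent_kd / human_kd

# (human K/D, agent K/D) as reported in the findings above
reported = {
    "single_player": (1.52, 5.12),
    "multiplayer": (0.49, 1.33),
}

for mode, (human, agent) in reported.items():
    print(f"{mode}: agent K/D is {relative_gain(agent, human):.2f}x the human K/D")
```

On the reported figures this gives roughly 3.37x in single player and 2.71x in multiplayer.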


This review was generated by AI and checked by a human editor.