QUICK REVIEW

[Paper Review] A Survey of Deep Reinforcement Learning in Video Games

Kun Shao, Zhentao Tang|arXiv (Cornell University)|Dec 23, 2019

Reinforcement Learning in Robotics | References: 91 | Citations: 148

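One-Line Summary

This survey outlines the foundations of DRL (value-based, policy gradient, and model-based methods) and summarizes achievements and challenges across 2D and 3D games, including Atari, ViZDoom, StarCraft, and Dota-style environments.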

ABSTRACT

Deep reinforcement learning (DRL) has made great achievements since proposed. Generally, DRL agents receive high-dimensional inputs at each step, and make actions according to deep-neural-network-based policies. This learning mechanism updates the policy to maximize the return with an end-to-end method. In this paper, we survey the progress of DRL methods, including value-based, policy gradient, and model-based algorithms, and compare their main techniques and properties. Besides, DRL plays an important role in game artificial intelligence (AI). We also take a review of the achievements of DRL in various video games, including classical Arcade games, first-person perspective games and multi-agent real-time strategy games, from 2D to 3D, and from single-agent to multi-agent. A large number of video game AIs with DRL have achieved super-human performance, while there are still some challenges in this domain. Therefore, we also discuss some key points when applying DRL methods to this field, including exploration-exploitation, sample efficiency, generalization and transfer, multi-agent learning, imperfect information, and delayed sparse rewards, as well as some research directions.

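Motivation and Objectives

Explain why DRL is effective for game AI and which challenges are specific to this setting.
Cover and compare the DRL method families (value-based, policy gradient, model-based) and their main algorithms.
Survey the achievements of DRL across game genres (2D/3D, single-agent/multi-agent).
Discuss practical challenges such as exploration, sample efficiency, generalization, transfer, and imperfect information.
Outline future directions for applying DRL in game AI.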

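Approach

Explains the background of deep learning and reinforcement learning and formalizes the MDP notation.
Summarizes the main DRL method families and representative algorithms (DQN, Rainbow, IQN, SAC, MuZero, and others).
Presents key architectural and algorithmic innovations, including experience replay, target networks, actor-critic, distributional RL, and planning models.
Discusses training platforms and evaluation benchmarks for DRL in games.
Provides a comparative analysis of performance trends and empirical results on Atari and other game domains.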
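
Two of the innovations listed above, experience replay and target networks, can be illustrated with a minimal sketch. This is not the survey's code: the class name `ReplayBuffer`, the function `td_targets`, and the callable `target_q` (standing in for a frozen target network that returns one Q-value per action) are hypothetical names chosen for illustration, and the example uses only the Python standard library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive transitions.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """One-step targets y = r + gamma * max_a' Q_target(s', a').

    `target_q` is an assumed callable mapping a state to a list of Q-values,
    playing the role of the periodically updated target network.
    Terminal transitions do not bootstrap.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In a full agent, the online network would be regressed toward these targets on each sampled batch, while `target_q` is refreshed from the online network only every so often; that training loop is omitted here.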

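Experimental Results

Research Questions

RQ1: Which DRL methods are most effective for game AI across different game genres and settings?
RQ2: How do value-based, policy gradient, and model-based DRL methods compare in terms of sample efficiency, stability, and generalization?
RQ3: Which platforms, benchmarks, and experimental settings best promote progress in DRL for games?
RQ4: What are the main challenges and future directions when applying DRL to single-agent and multi-agent game environments?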

Key Findings

Method	Mean score	Median score	Year
DQN	228%	79%	2015
C51	701%	178%	2017
UNREAL	880%	250%	2017
QR-DQN	915%	211%	2017
IQN	1019%	218%	2018
Rainbow	1189%	230%	2018
Ape-X DQN	1695%	434%	2018
Ape-X DQfD	2346%	702%	2018

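DRL methods achieve superhuman performance in many video games across 2D and 3D environments.
Atari benchmark results have improved dramatically, with Rainbow and the Ape-X family far surpassing the original DQN.
Distributional and distributed extensions (C51, QR-DQN, IQN, Ape-X DQN/DQfD) yield large improvements in mean and median Atari scores.
Model-based approaches (MuZero, TreeQN) advance planning-based DRL in complex domains.
First-person and RTS domains pose additional challenges such as partial observability and multi-agent coordination, motivating further progress in memory, exploration, and transfer.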
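
To make the size of the Atari improvements concrete, the table's figures can be expressed as multiples of the DQN baseline. The sketch below only re-expresses the numbers already reported in the table; the helper name `gain_over_baseline` is hypothetical, and choosing DQN as the reference point is an illustrative assumption rather than something the review prescribes.

```python
# (mean, median) Atari scores in percent, copied from the table in this review.
scores = {
    "DQN": (228, 79),
    "C51": (701, 178),
    "UNREAL": (880, 250),
    "QR-DQN": (915, 211),
    "IQN": (1019, 218),
    "Rainbow": (1189, 230),
    "Ape-X DQN": (1695, 434),
    "Ape-X DQfD": (2346, 702),
}


def gain_over_baseline(scores, baseline="DQN"):
    """Return each method's mean and median score as a multiple of the baseline."""
    base_mean, base_median = scores[baseline]
    return {
        name: (mean / base_mean, median / base_median)
        for name, (mean, median) in scores.items()
    }


gains = gain_over_baseline(scores)
for name, (mean_gain, median_gain) in gains.items():
    print(f"{name:<11} mean x{mean_gain:.1f}  median x{median_gain:.1f}")
```

Comparing the two columns is useful because the mean can be dominated by a few games with very high scores, while the median reflects typical per-game performance.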


This review was generated by AI and checked by a human editor.