QUICK REVIEW

[Paper Review] TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game

Peng Sun, Xinghai Sun|arXiv (Cornell University)|Sep 19, 2018

Reinforcement Learning in Robotics | References: 8 | Citations: 54

One-line summary

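This paper presents two full-game StarCraft II agents, TStarBot1 (deep reinforcement learning over flat macro actions) and TStarBot2 (hard-coded rules over a macro-micro hierarchy), both of which defeat the built-in AI at levels 1 through 10, including the cheating levels 8 to 10, in 1v1 Zerg-vs-Zerg full games.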

ABSTRACT

Starcraft II (SC2) is widely considered as the most challenging Real Time Strategy (RTS) game. The underlying challenges include a large observation space, a huge (continuous and infinite) action space, partial observations, simultaneous move for all players, and long horizon delayed rewards for local decisions. To push the frontier of AI research, Deepmind and Blizzard jointly developed the StarCraft II Learning Environment (SC2LE) as a testbench of complex decision making systems. SC2LE provides a few mini games such as MoveToBeacon, CollectMineralShards, and DefeatRoaches, where some AI agents have achieved the performance level of human professional players. However, for full games, the current AI agents are still far from achieving human professional level performance. To bridge this gap, we present two full game AI agents in this paper - the AI agent TStarBot1 is based on deep reinforcement learning over a flat action structure, and the AI agent TStarBot2 is based on hard-coded rules over a hierarchical action structure. Both TStarBot1 and TStarBot2 are able to defeat the built-in AI agents from level 1 to level 10 in a full game (1v1 Zerg-vs-Zerg game on the AbyssalReef map), noting that level 8, level 9, and level 10 are cheating agents with unfair advantages such as full vision on the whole map and resource harvest boosting. To the best of our knowledge, this is the first public work to investigate AI agents that can defeat the built-in AI in the StarCraft II full game.

Motivation and objectives

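Advance AI for the full game of StarCraft II while coping with its very large observation and action spaces.
Demonstrate that two different agents can defeat the built-in AI at every level from 1 to 10 on the AbyssalReef map, including the cheating levels.
Show how macro-action design and hierarchical action design integrate prior game knowledge into learning.
Provide reusable baselines and open-source code that enable hybrid learning and the generation of imitation trajectories.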

Proposed method

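TStarBot1 uses a flat action space of 165 predefined macro actions, each encoding TechTree rules and the details of its execution; a high-level RL controller learns to choose among these macro actions.
TStarBot2 adopts a macro-micro hierarchical action space organized into modules, each with its own controller; per the abstract, these controllers are hard-coded expert rules.
A PySC2 extension exposes per-unit control and encodes the complete Zerg TechTree in support of the macro actions.
Observations consist of spatial feature maps and non-spatial scalars; the reward is a sparse signal given at the end of the game.
The learned controller is trained with Dueling Double DQN (DDQN) or PPO, and a distributed rollout infrastructure (1920 actors, roughly 3840 CPUs) accelerates training.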
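
To make the macro-action idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical four-action subset of the 165 macro actions, illustrative mineral costs, and an availability mask derived from TechTree prerequisites; whether the paper masks unavailable actions in exactly this way is an assumption. The Q-values are combined with the standard dueling aggregation Q = V + A - mean(A). All names (`MacroAction`, `availability_mask`, `dueling_q`, `select_macro_action`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroAction:
    name: str
    requires: frozenset  # buildings that must already exist (TechTree prerequisite)
    minerals: int        # illustrative cost, not taken from the paper


# Hypothetical subset; the paper's agent chooses among 165 macro actions.
MACRO_ACTIONS = [
    MacroAction("build_spawning_pool", frozenset({"hatchery"}), 200),
    MacroAction("train_zerglings", frozenset({"spawning_pool"}), 50),
    MacroAction("build_roach_warren", frozenset({"spawning_pool"}), 150),
    MacroAction("no_op", frozenset(), 0),  # always available
]


def availability_mask(buildings, minerals):
    """True for each macro action whose prerequisites and cost are satisfied."""
    return [
        action.requires <= buildings and action.minerals <= minerals
        for action in MACRO_ACTIONS
    ]


def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + adv - mean_advantage for adv in advantages]


def select_macro_action(q_values, mask):
    """Greedy choice restricted to currently available macro actions."""
    candidates = [i for i, available in enumerate(mask) if available]
    best = max(candidates, key=lambda i: q_values[i])
    return MACRO_ACTIONS[best]
```

With only a hatchery and 300 minerals, `availability_mask({"hatchery"}, 300)` rules out the two actions that need a spawning pool, so the greedy choice falls back to the best remaining action even when a masked action has a higher Q-value. In a real agent, `value` and `advantages` would come from the two output heads of the Q-network rather than being passed in by hand.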

Experimental results

Research questions

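RQ1: Can macro-action-based DRL and a hierarchical macro-micro controller defeat the high-difficulty cheating AI in the full game?
RQ2: How do macro-action abstraction and TechTree knowledge affect learning efficiency and performance compared with end-to-end control?
RQ3: How efficient and scalable is training when large-scale distributed rollouts are used?
RQ4: Which kinds of game knowledge (TechTree, hard-coded rules) are essential to encode in order to bridge the gap to human-level play in full SC2?
RQ5: How do the two agent designs compare in performance and learning complexity in 1v1 Zerg-vs-Zerg on AbyssalReef?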

Key findings

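Both TStarBot1 and TStarBot2 defeat the built-in AI at levels 1 through 10 in 1v1 Zerg-vs-Zerg full games on AbyssalReef.
Levels 8, 9, and 10 are cheating AIs with advantages such as full-map vision and boosted resource harvesting.
TStarBot1 can be trained from scratch and defeats the strongest built-in bots after roughly 1 to 2 days of training on a single GPU, supported by the distributed CPU rollout actors.
The paper introduces 165 macro actions and a hierarchical action framework to manage the vast action space and incorporate TechTree knowledge.
The PySC2 extension provides per-unit control and a formal TechTree, enabling more realistic unit-level and macro-level decision making.
The distributed rollout infrastructure (1920 actors) substantially speeds up training and improves its stability.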
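
The actor side of such a rollout infrastructure can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the system described in the paper: threads stand in for distributed actor processes, the policy is uniformly random, the episode length is fixed, and the sparse terminal reward is assumed to be +1 for a win and -1 for a loss. The names `run_actor` and `collect_rollouts` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

NUM_ACTORS = 8      # the paper reports 1920 actors; kept small so the sketch runs anywhere
EPISODE_STEPS = 5   # hypothetical toy episode length
NUM_ACTIONS = 165   # size of the macro-action set reported in the paper


def run_actor(actor_id):
    """Play one toy episode with a random policy and return its transitions."""
    rng = random.Random(actor_id)      # per-actor seed keeps the sketch reproducible
    outcome = rng.choice([1.0, -1.0])  # stand-in for the win/loss result
    transitions = []
    for step in range(EPISODE_STEPS):
        done = step == EPISODE_STEPS - 1
        reward = outcome if done else 0.0  # sparse: nonzero only on the final step
        action = rng.randrange(NUM_ACTIONS)
        transitions.append((actor_id, step, action, reward, done))
    return transitions


def collect_rollouts(num_actors=NUM_ACTORS):
    """Run all actors in parallel and gather their transitions for a learner."""
    replay_buffer = []
    with ThreadPoolExecutor(max_workers=num_actors) as pool:
        for transitions in pool.map(run_actor, range(num_actors)):
            replay_buffer.extend(transitions)
    return replay_buffer
```

Calling `collect_rollouts()` yields one buffer of transitions that a learner could sample from. The design point is the separation of roles: many cheap CPU actors generate experience while a single learner consumes it, which is consistent with the reported combination of thousands of CPUs and one GPU.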

This review was generated by AI and checked by a human editor.