QUICK REVIEW

[Paper Review] TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game

Peng Sun, Xinghai Sun|arXiv (Cornell University)|Sep 19, 2018

Reinforcement Learning in Robotics | References: 8 | Citations: 54

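One-Sentence Summary

The paper presents two full-game StarCraft II agents, TStarBot1 (deep reinforcement learning over a flat macro-action space) and TStarBot2 (hard-coded rules over a hierarchical macro-micro action structure), that defeat the built-in AI at levels 1 through 10, including the cheating levels 8 to 10, in full 1v1 Zerg-vs-Zerg games.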

ABSTRACT

Starcraft II (SC2) is widely considered as the most challenging Real Time Strategy (RTS) game. The underlying challenges include a large observation space, a huge (continuous and infinite) action space, partial observations, simultaneous move for all players, and long horizon delayed rewards for local decisions. To push the frontier of AI research, Deepmind and Blizzard jointly developed the StarCraft II Learning Environment (SC2LE) as a testbench of complex decision making systems. SC2LE provides a few mini games such as MoveToBeacon, CollectMineralShards, and DefeatRoaches, where some AI agents have achieved the performance level of human professional players. However, for full games, the current AI agents are still far from achieving human professional level performance. To bridge this gap, we present two full game AI agents in this paper - the AI agent TStarBot1 is based on deep reinforcement learning over a flat action structure, and the AI agent TStarBot2 is based on hard-coded rules over a hierarchical action structure. Both TStarBot1 and TStarBot2 are able to defeat the built-in AI agents from level 1 to level 10 in a full game (1v1 Zerg-vs-Zerg game on the AbyssalReef map), noting that level 8, level 9, and level 10 are cheating agents with unfair advantages such as full vision on the whole map and resource harvest boosting. To the best of our knowledge, this is the first public work to investigate AI agents that can defeat the built-in AI in the StarCraft II full game.

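Research Motivation and Objectives

Advance AI for the full game of StarCraft II by tackling its large observation and action spaces.
Show that two different agents can defeat the built-in AI at levels 1–10, including the cheating levels, on the AbyssalReef map.
Demonstrate how macro actions and a hierarchical action design incorporate prior game knowledge into learning.
Provide reusable baselines and open-source code to enable hybrid learning and the generation of trajectories for imitation learning.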

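Proposed Method

TStarBot1 uses a flat, macro-action-based action space of 165 predefined macro actions that encode TechTree rules and handle execution; a high-level RL controller learns over these macro actions.
TStarBot2 uses a hierarchical macro-micro action space with a modular design: each module has its own controller, and the lower level of the hierarchy is built on expert rules.
A PySC2 extension exposes per-unit control and encodes the full Zerg TechTree to support the macro actions.
Observations consist of spatial feature maps and non-spatial scalars; the reward is a sparse, ternary end-of-game signal (win, tie, or loss).
Training uses Dueling-DDQN or PPO on a distributed rollout infrastructure (1920 actors on roughly 3840 CPUs) to speed up learning.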
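
As an illustration of how a flat macro-action space can carry TechTree knowledge, the sketch below gates a small set of macro actions by their prerequisites and picks the best available one from a list of Q-values. It is a minimal sketch and not the paper's implementation: the names `MacroAction`, `available_mask`, `masked_greedy`, and `terminal_reward` are hypothetical, the four actions and their mineral costs are illustrative placeholders rather than values from the paper, and the +1 / 0 / -1 encoding of the ternary reward is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class MacroAction:
    """One hypothetical macro action with its tech-tree prerequisites."""
    name: str
    requires: FrozenSet[str]  # buildings that must already exist
    minerals: int             # illustrative cost, not taken from the paper


# A tiny, simplified slice of a Zerg tech tree (the paper uses 165 macro actions).
MACRO_ACTIONS: List[MacroAction] = [
    MacroAction("build_spawning_pool", frozenset({"hatchery"}), 200),
    MacroAction("train_zergling", frozenset({"spawning_pool"}), 50),
    MacroAction("build_roach_warren", frozenset({"spawning_pool"}), 150),
    MacroAction("train_roach", frozenset({"roach_warren"}), 75),
]


def available_mask(buildings: Set[str], minerals: int) -> List[bool]:
    """Mark each macro action as available if prerequisites and cost are met."""
    return [
        action.requires <= buildings and action.minerals <= minerals
        for action in MACRO_ACTIONS
    ]


def masked_greedy(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Return the index of the highest-valued action among the available ones."""
    candidates = [i for i, ok in enumerate(mask) if ok]
    if not candidates:
        raise ValueError("no macro action is currently available")
    return max(candidates, key=lambda i: q_values[i])


def terminal_reward(outcome: str) -> int:
    """Sparse ternary reward, given only at the end of a game (assumed encoding)."""
    return {"win": 1, "tie": 0, "loss": -1}[outcome]


if __name__ == "__main__":
    mask = available_mask({"hatchery", "spawning_pool"}, minerals=100)
    choice = masked_greedy([0.1, 0.9, 0.5, 0.7], mask)
    print(MACRO_ACTIONS[choice].name, terminal_reward("win"))
```

Masking at selection time keeps the learner from spending exploration on actions the game would reject, which is one plausible reading of how encoded TechTree rules shrink the effective action space.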

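Experimental Results

Research Questions

RQ1: Can macro-action-based DRL and a hierarchical macro-micro controller defeat the StarCraft II built-in AI in full games, up to the cheating difficulty levels?
RQ2: How do macro-action abstraction and TechTree knowledge affect learning efficiency and performance compared with end-to-end control?
RQ3: How efficient and scalable is training for full-game SC2 agents when large-scale distributed rollouts are used?
RQ4: What kinds of game knowledge (TechTree, hard-coded rules) must be encoded to close the gap to human-level play in full SC2 games?
RQ5: How do the two agent designs compare in performance and training complexity on 1v1 Zerg-vs-Zerg games on AbyssalReef?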

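Key Findings

TStarBot1 and TStarBot2 both defeat the built-in AI at levels 1 through 10 in full games (1v1 Zerg-vs-Zerg on AbyssalReef).
Levels 8, 9, and 10 are cheating AIs with advantages such as full-map vision and boosted resource harvesting.
TStarBot1 learns from scratch and can defeat the strongest built-in AI after 1–2 days of training on a single GPU.
The paper introduces 165 macro actions and a hierarchical action framework to manage the large action space and incorporate TechTree knowledge.
A PySC2 extension provides per-unit control and a formalized TechTree, making unit-level and macro-level decisions more realistic.
The distributed rollout infrastructure (1920 actors) substantially speeds up training and improves stability.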
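
The actor/learner split behind a distributed rollout setup can be sketched in a few lines. The example below is only a toy, assuming threads in one process stand in for the many remote actors, and random macro-action ids and outcomes stand in for real games; the names `actor` and `run_rollouts` and the tuple layout of a trajectory are hypothetical and not taken from the paper or its code.

```python
import queue
import random
import threading
from typing import List, Tuple

# (actor id, macro-action ids chosen during the episode, terminal reward)
Trajectory = Tuple[int, List[int], int]


def actor(actor_id: int, episodes: int, out_queue: "queue.Queue[Trajectory]",
          num_actions: int = 165) -> None:
    """Play placeholder episodes and push each finished trajectory to the queue."""
    rng = random.Random(actor_id)  # per-actor seed so runs are reproducible
    for _ in range(episodes):
        length = rng.randint(5, 10)
        actions = [rng.randrange(num_actions) for _ in range(length)]
        reward = rng.choice([-1, 0, 1])  # stand-in for loss / tie / win
        out_queue.put((actor_id, actions, reward))


def run_rollouts(num_actors: int = 4, episodes_per_actor: int = 3) -> List[Trajectory]:
    """Start the actors, wait for them, and drain the queue into one batch."""
    out_queue: "queue.Queue[Trajectory]" = queue.Queue()
    threads = [
        threading.Thread(target=actor, args=(i, episodes_per_actor, out_queue))
        for i in range(num_actors)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    batch: List[Trajectory] = []
    while not out_queue.empty():  # safe here: all producers have finished
        batch.append(out_queue.get())
    return batch


if __name__ == "__main__":
    batch = run_rollouts()
    print(len(batch), "trajectories collected for the learner")
```

In a real system the learner would consume trajectories continuously while actors keep playing with refreshed policy parameters; draining the queue after `join` is a simplification that keeps the sketch deterministic in size.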


This review was generated by AI and reviewed by a human editor.