QUICK REVIEW

[Paper Review] StarCraft II: A New Challenge for Reinforcement Learning

Oriol Vinyals, Timo Ewalds | arXiv (Cornell University) | Aug 16, 2017

Digital Games and Media | References: 11 | Cited by: 683

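One-sentence summary

This paper introduces SC2LE (the StarCraft II Learning Environment), an RTS-based reinforcement learning benchmark with full-game and mini-game tasks, a specified observation/action/reward interface, and baseline RL results; it argues that SC2LE is a challenging, multi-agent, partially observable domain that can drive progress in deep reinforcement learning architectures.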

ABSTRACT

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

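MOTIVATION AND GOALS

Introduce SC2LE as a reinforcement learning environment built on StarCraft II.
Describe the challenges of the domain: multi-agent interaction, imperfect information, large action and state spaces, and long-term credit assignment.
Provide an open-source interface (PySC2) and a dataset of human replays for reinforcement learning research.
Provide baseline results to calibrate difficulty and guide future RL algorithm development.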

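PROPOSED METHOD

Defines observations as low-resolution feature layers plus auxiliary non-spatial data.
Designs an action space that mirrors the human user interface, with approximately 300 action-function identifiers and 13 argument types.
Uses Asynchronous Advantage Actor-Critic (A3C) with n-step returns and entropy regularization as the baseline learning algorithm.
Evaluates several neural network architectures (an Atari-net-style network, FullyConv, and FullyConv with an LSTM) that map observations to action policies.
Provides mini-game tasks with custom rewards to isolate specific elements of gameplay.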
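
The A3C baseline mentioned above combines n-step returns, an advantage-weighted policy term, a value regression term, and an entropy bonus. The following is a minimal NumPy sketch of how those pieces fit together for a single rollout segment. The function names, the discount factor, and the loss coefficients are illustrative assumptions, not values taken from the paper, and the code only evaluates the scalar loss (a real agent would compute it in an autodiff framework and treat the advantage as a constant in the policy term).

```python
import numpy as np


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout segment.

    Computes G_t = r_t + gamma * G_{t+1}, seeding the recursion with
    bootstrap_value (the critic's estimate for the state after the
    last step, or 0.0 if the episode ended).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = float(bootstrap_value)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a3c_loss(log_probs_taken, values, entropies, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Scalar actor-critic loss with entropy regularization.

    value_coef and entropy_coef are hypothetical defaults chosen
    for illustration only.
    """
    log_probs_taken = np.asarray(log_probs_taken, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    entropies = np.asarray(entropies, dtype=np.float64)
    returns = np.asarray(returns, dtype=np.float64)

    advantages = returns - values
    # Policy term: raise log-probability of actions with positive advantage.
    policy_loss = -np.mean(log_probs_taken * advantages)
    # Value term: regress the critic toward the n-step return.
    value_loss = np.mean(advantages ** 2)
    # Entropy bonus: subtracting it from the loss discourages
    # premature collapse to a deterministic policy.
    entropy_bonus = np.mean(entropies)
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```

For example, `n_step_returns([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)` propagates the final reward backward through the segment, which is how a sparse win/loss signal at the end of a long game reaches earlier decisions.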

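EXPERIMENTAL RESULTS

Research questions

RQ1: Can deep reinforcement learning agents learn meaningful policies for the full StarCraft II game through the SC2LE interface?
RQ2: Does a standard reinforcement learning baseline (A3C) scale to StarCraft II's large action and state spaces?
RQ3: How do different network architectures, including spatially aware ones, perform on SC2LE observations?
RQ4: What is the value of mini-games for isolating and solving sub-tasks within StarCraft II?
RQ5: How does agent performance differ between the full game and the mini-games, and relative to a random baseline?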

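Key findings

Baseline RL agents struggle to win full games against the easy built-in AI on ladder maps.
Agents trained with the Blizzard score as reward converge to simple strategies that mostly mine resources or otherwise fail to progress.
A fully convolutional architecture with memory is more robust but still does not reach winning performance in the full game.
On the mini-games, agents reach novice-level play, but progress on the full game remains limited for the baselines tested.
SC2LE is positioned as a challenging benchmark for advancing deep reinforcement learning architectures, perception, memory, and decision-making in complex environments.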

This review was generated by AI and checked by a human editor.