QUICK REVIEW

[Paper Review] ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games

Yuandong Tian, Qucheng Gong|arXiv (Cornell University)|Jul 4, 2017

Reinforcement Learning in Robotics|References: 15|Cited by: 58

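One-Sentence Summary

ELF provides a lightweight, flexible RTS research platform with three environments (Mini-RTS, Capture the Flag, Tower Defense), achieves high throughput for end-to-end RL training, and is open source.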

ABSTRACT

In this paper, we propose ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with three game environments (Mini-RTS, Capture the Flag and Tower Defense). Mini-RTS, as a miniature version of StarCraft, captures key game dynamics and runs at 40K frame-per-second (FPS) per core on a Macbook Pro notebook. When coupled with modern reinforcement learning methods, the system can train a full-game bot against built-in AIs end-to-end in one day with 6 CPUs and 1 GPU. In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, changes in game parameters, and can host existing C/C++-based game environments like Arcade Learning Environment. Using ELF, we thoroughly explore training parameters and show that a network with Leaky ReLU and Batch Normalization coupled with long-horizon training and progressive curriculum beats the rule-based built-in AI more than $70\%$ of the time in the full game of Mini-RTS. Strong performance is also achieved on the other two games. In game replays, we show our agents learn interesting strategies. ELF, along with its RL platform, is open-sourced at https://github.com/facebookresearch/ELF.

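Research Motivation and Goals

Create a research-oriented reinforcement learning (RL) platform for real-time strategy (RTS) games that is extensive, lightweight, and flexible.
Provide an RTS engine with multiple environments and high simulation speed (e.g., Mini-RTS at 40K FPS per core).
Support flexible environment-agent communication topologies and integration with existing C/C++ game environments.
Train RL agents end-to-end against built-in AIs, and study training dynamics, including curriculum learning and hierarchical command structures.
Provide an open-source framework that promotes RL research in RTS and related domains.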

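Proposed Method

Propose a producer-consumer architecture that pairs C++ game simulation with a Python RL backend for batched experience processing.
Support flexible environment-model topologies (one-to-one, many-to-one, one-to-many) and multi-model batching to improve training efficiency.
Provide a unified interface for hosting a variety of games (e.g., RTS, and Atari through an adapter), supporting both raw-pixel input and internal game data.
Integrate baseline RL methods (e.g., A3C, policy gradient, Q-learning, TRPO) into the Python-based RL backend.
Study curriculum training and longer time horizons, along with network architectures using Leaky ReLU and Batch Normalization, to improve performance.
Demonstrate end-to-end training on Mini-RTS and compare performance against built-in AIs across several games.
Explore planning with Monte-Carlo Tree Search (MCTS) and compare it with RL baselines.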
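
The producer-consumer, many-to-one batching idea described above can be illustrated with a minimal pure-Python sketch. This is not ELF's actual implementation or API (ELF runs the games in C++ threads). Every name here (`toy_model`, `game_worker`, `run_demo`) and the batch size are hypothetical, and the "model" is a stand-in arithmetic function rather than a neural network.

```python
import queue
import threading


def toy_model(states):
    """Hypothetical stand-in for one batched forward pass of a policy network."""
    return [(game_id + step) % 3 for game_id, step in states]


def game_worker(game_id, steps, request_q, reply_q, received):
    """Producer: emit one state per step, then block until its action returns."""
    for step in range(steps):
        request_q.put((game_id, (game_id, step)))
        received[game_id].append(reply_q.get())
    request_q.put((game_id, None))  # sentinel: this game has finished


def run_demo(num_games=8, steps_per_game=3, batch_size=4):
    request_q = queue.Queue()
    reply_qs = [queue.Queue() for _ in range(num_games)]
    received = {g: [] for g in range(num_games)}
    workers = [
        threading.Thread(
            target=game_worker,
            args=(g, steps_per_game, request_q, reply_qs[g], received),
        )
        for g in range(num_games)
    ]
    for w in workers:
        w.start()

    # Consumer: gather states from many games and run the model once per batch.
    batch, batch_sizes, active = [], [], num_games
    while active > 0:
        game_id, state = request_q.get()
        if state is None:
            active -= 1
        else:
            batch.append((game_id, state))
        # Flush when the batch is full, or when every running game is waiting.
        if batch and len(batch) >= min(batch_size, active):
            actions = toy_model([s for _, s in batch])
            for (gid, _), action in zip(batch, actions):
                reply_qs[gid].put(action)
            batch_sizes.append(len(batch))
            batch = []

    for w in workers:
        w.join()
    return batch_sizes, received
```

Because each game blocks until its action arrives, the consumer never holds more than one pending state per game. How the batches fill depends on thread scheduling, but every state is served in a batch of at most `batch_size`.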

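Experimental Results

Research Questions

RQ1: Can an end-to-end RL agent trained with ELF beat the built-in rule-based AI in the full RTS game under partial information?
RQ2: How do architectural choices (Leaky ReLU, BatchNorm) and training settings (long horizons, curriculum learning) affect performance on RTS tasks?
RQ3: How do different frame skips, history lengths, and topology configurations affect learning efficiency and generalization?
RQ4: How does ELF compare with existing RTS environments in throughput and flexibility for rapid RL experimentation?
RQ5: Can a planning method (MCTS) approach the performance of RL under full information?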

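Key Findings

ELF enables end-to-end training of RTS agents; with a specific curriculum and network choices, the agent wins more than 70% of full Mini-RTS games against the built-in AI.
Mini-RTS runs at 40K FPS per CPU core, which makes it possible to train a full-game bot in one day on a single machine with modest hardware (6 CPUs and 1 GPU).
Networks using Leaky ReLU and Batch Normalization, combined with long-horizon training and a progressive curriculum, achieve higher win rates than the baseline.
Curriculum training markedly improves performance and robustness when playing against diverse opponents or when fine-tuning across opponent types.
Under full information, MCTS achieves competitive win rates, although it is slower than the trained RL AI, which suggests that planning and learning are complementary.
ELF supports flexible multi-topology RL experiments and shows strong performance across three RTS environments (Mini-RTS, Capture the Flag, Tower Defense).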
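
This review does not spell out how the progressive curriculum works, so the following is only an illustrative sketch of one possible form, not the paper's confirmed schedule. It assumes that a scripted AI plays the opening ticks of each training game and that the upper bound on those ticks shrinks linearly to zero over training. The function names, the linear decay, and the `initial_max=1000` default are all hypothetical.

```python
import random


def max_scripted_ticks(game_index, total_games, initial_max=1000):
    """Upper bound on scripted opening ticks, shrinking linearly to zero.

    Hypothetical schedule: early games get a long scripted opening, while
    later games hand control to the learning agent almost immediately.
    """
    remaining = max(0.0, 1.0 - game_index / total_games)
    return int(initial_max * remaining)


def sample_scripted_ticks(game_index, total_games, rng, initial_max=1000):
    """Draw the scripted-opening length for one game, uniformly in [0, bound]."""
    bound = max_scripted_ticks(game_index, total_games, initial_max)
    return rng.randint(0, bound)


if __name__ == "__main__":
    rng = random.Random(0)
    for i in (0, 25, 50, 75, 100):
        print(i, max_scripted_ticks(i, 100), sample_scripted_ticks(i, 100, rng))
```

Sampling the opening length, rather than fixing it, exposes the agent to a spread of mid-game starting positions while the average difficulty rises as the bound decays.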


This review was generated by AI and reviewed by human editors.