QUICK REVIEW

[Paper Review] Unity: A General Platform for Intelligent Agents

Arthur Juliani, Vincent-Pierre Berges|arXiv (Cornell University)|Sep 7, 2018

Reinforcement Learning in Robotics | References: 78 | Cited by: 558

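One-Sentence Summary

The paper argues that modern game engines, with Unity and the Unity ML-Agents Toolkit as the case study, can serve as general platforms for building rich, configurable AI learning environments, and it surveys how such a platform supports a broad range of reinforcement learning research.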

ABSTRACT

Recent advances in artificial intelligence have been driven by the presence of increasingly realistic and complex simulated environments. However, many of the existing environments provide either unrealistic visuals, inaccurate physics, low task complexity, restricted agent perspective, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, making the simulated environment a black-box from the perspective of the learning system. In this work, we propose a novel taxonomy of existing simulation platforms and discuss the highest level class of general platforms which enable the development of learning environments that are rich in visual, physical, task, and social complexity. We argue that modern game engines are uniquely suited to act as general platforms and as a case study examine the Unity engine and open source Unity ML-Agents Toolkit. We then survey the research enabled by Unity and the Unity ML-Agents Toolkit, discussing the kinds of research a flexible, interactive and easily configurable general platform can facilitate.

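Motivation and Objectives

Propose a taxonomy of simulation platforms based on four dimensions of potential environment complexity: sensory, physical, task logic, and social.
Evaluate Unity and the Unity ML-Agents Toolkit as a general platform for building rich, configurable AI research environments.
Survey existing research enabled by Unity and ML-Agents, and identify the bottlenecks and opportunities facing general platforms.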
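The four-dimensional taxonomy can be pictured as a small data structure. The sketch below is only an illustration: the `Level` scale, the `ComplexityProfile` class, and the example ratings are hypothetical and are not taken from the paper, which describes the dimensions qualitatively.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Coarse, hypothetical rating scale; the paper defines no numeric scores."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass(frozen=True)
class ComplexityProfile:
    """Potential complexity of a simulator along the four taxonomy dimensions."""
    sensory: Level
    physical: Level
    task_logic: Level
    social: Level

    def bottleneck(self) -> str:
        """Return the name of the lowest-rated dimension (first one on ties)."""
        axes = {
            "sensory": self.sensory,
            "physical": self.physical,
            "task_logic": self.task_logic,
            "social": self.social,
        }
        return min(axes, key=axes.get)


# Illustrative ratings only, not values reported in the paper.
grid_toy = ComplexityProfile(Level.LOW, Level.LOW, Level.MEDIUM, Level.LOW)
engine_based = ComplexityProfile(Level.HIGH, Level.HIGH, Level.HIGH, Level.HIGH)

print(grid_toy.bottleneck())
```

Rating each dimension separately makes it easy to see which one limits a given platform, which is the kind of comparison the taxonomy is meant to support.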

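Proposed Approach

Introduce a four-axis taxonomy of simulators: sensory, physical, task logic, and social complexity.
Analyze the properties of the Unity engine and how they address each axis (graphics, physics, scripting, multi-agent support).
Present the architecture of the Unity ML-Agents Toolkit (Agents, Academy, Sensors) and the Python API used to interact with environments.
Describe the ML-Agents SDK and its components (policies, behaviors, rewards) and how they integrate with Unity scenes.
Provide performance benchmarks and discuss curriculum learning, domain randomization, and extensions (ICM, LSTM) in Unity environments.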
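To make the idea of controlling an environment from Python concrete, here is a minimal sketch of a gym-style `reset`/`step` interaction loop. `ToyBallEnv` and `run_episode` are hypothetical stand-ins written in pure Python; they are not part of the ML-Agents API, and a real Unity environment would run in the engine rather than in this process.

```python
import random
from typing import Dict, List, Tuple


class ToyBallEnv:
    """Hypothetical stand-in for an engine-backed environment (not ML-Agents).

    A ball moves along one axis and must stay within [-1, 1]. The observation
    is [position, velocity]; the action is a push in {-1, 0, +1}.
    """

    def __init__(self, max_steps: int = 50, seed: int = 0) -> None:
        self.max_steps = max_steps
        self._rng = random.Random(seed)
        self._pos = 0.0
        self._vel = 0.0
        self._t = 0

    def reset(self) -> List[float]:
        """Start a new episode and return the first observation."""
        self._pos = self._rng.uniform(-0.1, 0.1)
        self._vel = 0.0
        self._t = 0
        return [self._pos, self._vel]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict]:
        """Advance one step; return (observation, reward, done, info)."""
        self._vel += 0.05 * action + self._rng.uniform(-0.02, 0.02)
        self._pos += self._vel
        self._t += 1
        fell = abs(self._pos) > 1.0
        done = fell or self._t >= self.max_steps
        reward = -1.0 if fell else 0.1
        return [self._pos, self._vel], reward, done, {}


def run_episode(env: ToyBallEnv) -> Tuple[float, int]:
    """Roll out one episode with a hand-written policy; return (return, steps)."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        pos, vel = obs
        # Push against the ball's predicted next position.
        action = -1 if (pos + vel) > 0 else 1
        obs, reward, done, _ = env.step(action)
        total += reward
        steps += 1
    return total, steps


episode_return, episode_steps = run_episode(ToyBallEnv())
print(episode_return, episode_steps)
```

A learning algorithm would replace the hand-written policy inside `run_episode`; the loop structure stays the same.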

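Experimental Results

Research Questions

RQ1: How can game engines serve as general AI research platforms with rich sensory, physical, task, and social complexity?
RQ2: What capabilities does Unity provide for creating flexible, configurable reinforcement learning environments?
RQ3: What are the architecture and workflow of Unity ML-Agents for deploying and training agents in Unity environments?
RQ4: What are the research potential and the limitations of using a general platform such as Unity for reinforcement learning benchmarks and experiments?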

Key Findings

Environment	Observation Type	# Agents	Mean (ms)	Std (ms)
Basic	Vector(1)	1	0.803	0.005
3D Ball	Vector(8)	12	5.05	0.039
GridWorld	Visual(84x84x3)	1	2.04	0.038
Visual Food Collector	Visual(84x84x3)	4	9.23	0.556
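The table reports a mean and a standard deviation in milliseconds for each environment. One way such statistics could be gathered is sketched below. This is an assumption-laden illustration: `time_steps` and `dummy_step` are hypothetical names, the placeholder workload is not a Unity environment, and the summary does not say how many steps were timed or whether the sample or population standard deviation was used.

```python
import statistics
import time
from typing import Callable, Tuple


def time_steps(step_fn: Callable[[], None], n_steps: int = 1000) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-call wall time in ms."""
    if n_steps < 2:
        raise ValueError("need at least two steps to compute a standard deviation")
    durations_ms = []
    for _ in range(n_steps):
        start = time.perf_counter()
        step_fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(durations_ms), statistics.stdev(durations_ms)


def dummy_step() -> None:
    """Placeholder workload standing in for a real environment step."""
    sum(i * i for i in range(1000))


mean_ms, std_ms = time_steps(dummy_step, n_steps=200)
print(f"mean={mean_ms:.3f} ms, std={std_ms:.3f} ms")
```

In a real measurement, `step_fn` would wrap a single environment step call, and the numbers would depend on hardware, rendering settings, and the number of agents.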

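Unity offers high-fidelity visuals and flexible physics simulation through PhysX or Havok, with third-party engines as an option.
The ML-Agents Toolkit provides a reusable SDK comprising Agents, an Academy, and Sensors, together with a Python API for training with reinforcement learning and imitation learning methods.
Simulations can run faster than real time and can be distributed, and rendering can be skipped when it is not needed to speed them up further.
Curriculum learning and domain randomization are achieved by changing and resampling environment parameters at runtime through the Academy.
A gym-compatible Python interface eases integration with existing reinforcement learning workflows and benchmarks.
The platform supports multi-agent cooperation and competition as well as self-play, and training can be augmented with modules such as ICM (Intrinsic Curiosity Module) and LSTM.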
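Curriculum learning and domain randomization both come down to deciding which environment parameters to use for the next episode. The sketch below shows one simple way to combine them. `Lesson`, `ParameterScheduler`, the parameter names `gravity` and `ball_scale`, and the thresholds are all hypothetical; this is not the ML-Agents configuration format or the Academy's actual interface.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Lesson:
    """One curriculum stage: parameter ranges plus the reward needed to advance."""
    ranges: Dict[str, Tuple[float, float]]
    advance_at: float


class ParameterScheduler:
    """Samples environment parameters and tracks curriculum progress."""

    def __init__(self, lessons: List[Lesson], seed: int = 0) -> None:
        if not lessons:
            raise ValueError("at least one lesson is required")
        self.lessons = lessons
        self.index = 0
        self._rng = random.Random(seed)

    def sample(self) -> Dict[str, float]:
        """Domain randomization: draw each parameter uniformly from its range."""
        ranges = self.lessons[self.index].ranges
        return {name: self._rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

    def report(self, mean_reward: float) -> None:
        """Curriculum: move to the next lesson once the threshold is reached."""
        is_last = self.index == len(self.lessons) - 1
        if not is_last and mean_reward >= self.lessons[self.index].advance_at:
            self.index += 1


# Illustrative lessons: narrow ranges first, wider ranges later.
lessons = [
    Lesson({"gravity": (9.0, 10.0), "ball_scale": (1.0, 1.0)}, advance_at=0.5),
    Lesson({"gravity": (7.0, 12.0), "ball_scale": (0.5, 2.0)}, advance_at=0.8),
]
scheduler = ParameterScheduler(lessons)
params = scheduler.sample()        # parameters for the next episode
scheduler.report(mean_reward=0.6)  # threshold 0.5 met, so lesson 2 begins
```

Starting with narrow ranges and widening them as the agent improves is one common design; a fixed wide range from the start would be plain domain randomization without a curriculum.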


This review was generated by AI and reviewed by human editors.