QUICK REVIEW

[Paper Review] Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents

Joseph Suárez, Yilun Du|arXiv (Cornell University)|Mar 2, 2019

Reinforcement Learning in Robotics|References: 20|Cited by: 48

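One-Sentence Summary

Neural MMO provides a persistent, procedurally generated, massively multiagent environment in which neural agents learn to survive through reinforcement learning; the results show that larger populations promote exploration and niche formation.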

ABSTRACT

The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources. We present an artificial intelligence research environment, inspired by the human game genre of MMORPGs (Massively Multiplayer Online Role-Playing Games, a.k.a. MMOs), that aims to simulate this setting in microcosm. As with MMORPGs and the real world alike, our environment is persistent and supports a large and variable number of agents. Our environment is well suited to the study of large-scale multiagent interaction: it requires that agents learn robust combat and navigation policies in the presence of large populations attempting to do the same. Baseline experiments reveal that population size magnifies and incentivizes the development of skillful behaviors and results in agents that outcompete agents trained in smaller populations. We further show that the policies of agents with unshared weights naturally diverge to fill different niches in order to avoid competition.

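Motivation and Goals

Introduce a persistent, scalable multiagent environment, inspired by MMORPGs, for training intelligent agents.
Enable large populations with a variable number of species, so that emergent behavior under competition for resources can be studied.
Show how population size and species diversity affect exploration, specialization, and policy learning.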

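Proposed Method

Agents operate on a tile-based, procedurally generated map that features foraging for food and water and a strategic combat system.
Policies are neural networks trained with policy gradient methods; weights may be shared or unshared across populations.
Observations are local crops of the environment, including tile types and agent attributes; actions include movement and a choice of attack.
The reward signal is survival time, computed as the discounted sum of a unit reward at each time step.
Experiments use multiple world instances and server merging to evaluate performance under different population settings.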
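
The observation and reward descriptions above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the integer tile encoding, the padding value `-1`, the crop radius, and the discount factor `gamma` are all assumptions chosen for illustration.

```python
from typing import List


def discounted_survival_return(lifetime: int, gamma: float = 0.99) -> float:
    """Discounted sum of a unit reward per tick survived.

    Computes sum(gamma**t for t in 0..lifetime-1). The default gamma
    is an assumed value, not one taken from the paper.
    """
    total = 0.0
    for t in range(lifetime):
        total += gamma ** t
    return total


def local_observation(
    grid: List[List[int]], row: int, col: int, radius: int, pad: int = -1
) -> List[List[int]]:
    """Square crop of side 2*radius+1 centered on (row, col).

    `grid` holds hypothetical integer tile types. Cells that fall
    outside the map are filled with `pad`.
    """
    height, width = len(grid), len(grid[0])
    window = []
    for r in range(row - radius, row + radius + 1):
        line = []
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < height and 0 <= c < width
            line.append(grid[r][c] if inside else pad)
        window.append(line)
    return window
```

With `gamma` below 1 the return is bounded by `1 / (1 - gamma)`, so additional ticks of survival are always rewarded but with diminishing weight.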

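Experimental Results

Research Questions

RQ1: In a persistent multiagent environment, how does increasing the number of concurrent agents affect exploration and policy learning?
RQ2: How does the number of distinct populations with unshared weights affect niche formation and specialization?
RQ3: How do environment randomization and tournament-style evaluation affect the policies learned under competition?
RQ4: Do policies trained in larger populations generalize when evaluated, after merging, against a diverse base of agents?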

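Key Findings

Agents trained in larger populations consistently survive longer in tournaments.
Population size amplifies exploration, leading to broader coverage of the map.
A greater number of populations with unshared weights promotes niche formation and specialization across the map.
Agents learn inter-agent dependencies: their strategies depend on the policies and positions of other agents.
Combat creates strong coupling between agents, further driving emergent behavior and robust policies.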
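
One way to make the map-visitation finding measurable is a simple coverage ratio: the fraction of tiles that at least one agent has stepped on. The sketch below is an illustrative assumption, not the metric used in the paper; `map_coverage` and the `(row, col)` trajectory format are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Position = Tuple[int, int]


def map_coverage(
    trajectories: Iterable[List[Position]], height: int, width: int
) -> float:
    """Fraction of map tiles visited by at least one agent.

    Each trajectory is a list of (row, col) positions for one agent.
    Positions outside the map bounds are ignored.
    """
    visited: Set[Position] = set()
    for path in trajectories:
        for r, c in path:
            if 0 <= r < height and 0 <= c < width:
                visited.add((r, c))
    return len(visited) / (height * width)
```

Comparing this ratio across runs with different population sizes would give a rough, single-number view of how exploration scales.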

This review was generated by AI and reviewed by a human editor.