QUICK REVIEW

[Paper Review] Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

Rui Wang, Joel Lehman|arXiv (Cornell University)|Jan 7, 2019

Reinforcement Learning in Robotics | References: 69 | Cited by: 125

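One-Sentence Summary

POET pairs the generation of environmental challenges with the optimization of agents and allows solutions to transfer between environments, producing a diverse and increasingly complex learning curriculum within a single run.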

ABSTRACT

While the history of machine learning so far largely encompasses a series of problems posed by researchers and algorithms that learn their solutions, an important question is whether the problems themselves can be generated by the algorithm at the same time as they are being solved. Such a process would in effect build its own diverse and expanding curricula, and the solutions to problems at various stages would become stepping stones towards solving even more challenging problems later in the process. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper does just that: it pairs the generation of environmental challenges and the optimization of agents to solve those challenges. It simultaneously explores many different paths through the space of possible problems and solutions and, critically, allows these stepping-stone solutions to transfer between problems if better, catalyzing innovation. The term open-ended signifies the intriguing potential for algorithms like POET to continue to create novel and increasingly complex capabilities without bound. Our results show that POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved by direct optimization alone, or even through a direct-path curriculum-building control algorithm introduced to highlight the critical role of open-endedness in solving ambitious challenges. The ability to transfer solutions from one environment to another proves essential to unlocking the full potential of the system as a whole, demonstrating the unpredictable nature of fortuitous stepping stones. We hope that POET will inspire a new push towards open-ended discovery across many domains, where algorithms like POET can blaze a trail through their interesting possible manifestations and solutions.

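Motivation and Objectives

Enable open-ended, self-generated curricula in which problems and their solutions co-evolve.
Develop an algorithm that increases environment complexity while simultaneously optimizing agent policies.
Allow solutions to transfer between environments to catalyze innovation.
Demonstrate open-ended progress within a single run in a 2D bipedal walking domain.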

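Proposed Method

Maintain a population of environment-agent pairs (EA_List), starting from a single simple pair.
Generate new environments by mutating environment encodings, admitting only those that are neither too hard nor too easy for the current agents, and prioritizing novelty.
Optimize each agent in its paired environment using Evolution Strategies (ES).
Periodically attempt to transfer agents between environments to share useful skills and accelerate progress.
Evaluate each transfer candidate and accept it if it improves performance in the target environment.
Run in parallel across many processors to enable large-scale exploration.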
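
The steps listed above can be sketched as a small loop. This is a minimal toy illustration, not the authors' implementation. Every name other than the EA_List idea is hypothetical: an "environment" is just a 2D target vector, the score is the negative squared distance to it, the thresholds `mc_low` and `mc_high` and the 5-iteration schedule are made-up values, the ES update is a simplified score-normalized variant, novelty is the distance to the nearest existing environment, and everything runs serially rather than in parallel.

```python
import numpy as np

def evaluate(theta, env):
    """Toy score (higher is better): negative squared distance to the env's target."""
    return -float(np.sum((theta - env) ** 2))

def es_step(theta, env, rng, sigma=0.1, lr=0.05, n=20):
    """One simplified ES update built from n Gaussian perturbations of theta."""
    eps = rng.standard_normal((n, theta.size))
    scores = np.array([evaluate(theta + sigma * e, env) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize scores
    return theta + lr * (eps.T @ scores) / n

def novelty(env, others):
    """Distance to the nearest existing environment (a crude novelty proxy)."""
    return min(float(np.linalg.norm(env - o)) for o in others)

def poet(iterations=30, max_pairs=5, mc_low=-2.0, mc_high=-0.05, seed=0):
    rng = np.random.default_rng(seed)
    ea_list = [(np.zeros(2), np.zeros(2))]  # (env, agent) pairs; one simple pair
    for it in range(1, iterations + 1):
        # 1. Periodically mutate environments and keep children that are
        #    neither too hard nor too easy for the parent agent.
        if it % 5 == 0:
            envs = [e for e, _ in ea_list]
            children = []
            for env, theta in ea_list:
                child = env + 0.5 * rng.standard_normal(env.size)
                if mc_low <= evaluate(theta, child) <= mc_high:
                    children.append((child, theta.copy()))
            if children:
                # Admit only the most novel child in this round.
                ea_list.append(max(children, key=lambda c: novelty(c[0], envs)))
                ea_list = ea_list[-max_pairs:]  # drop the oldest pairs
        # 2. Optimize each agent in its own paired environment.
        ea_list = [(env, es_step(theta, env, rng)) for env, theta in ea_list]
        # 3. Periodically attempt transfers: a pair adopts another pair's
        #    agent only if that agent scores higher in the target environment.
        if it % 5 == 0:
            agents = [theta for _, theta in ea_list]
            ea_list = [(env, max(agents, key=lambda a: evaluate(a, env)).copy())
                       for env, _ in ea_list]
    return ea_list
```

A possible way to inspect the result, printing each surviving environment and its agent's score:

```python
for env, theta in poet():
    print(env.round(2), round(evaluate(theta, env), 3))
```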

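Experimental Results

Research Questions

RQ1: Can POET produce an open-ended sequence of increasingly complex and diverse environments within a single run?
RQ2: Is the transfer of solutions between environments essential to POET's progress and innovation?
RQ3: Can POET reach diverse, solvable challenges that neither direct optimization nor a fixed curriculum can solve?
RQ4: How do agents evolved by POET perform compared with agents optimized on isolated environments?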

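Key Findings

POET generated a diverse set of challenging environments that were both invented and solved within a single run.
Solutions to the challenging environments could not be found by optimizing directly on those environments alone.
A curriculum-based control that incrementally builds toward the same challenge did not match POET's results; open-ended growth depends on environmental diversity and transfer.
Periodic transfer between environments is important for unlocking progress and enabling fortuitous stepping stones.
A single run produced a wide range of sophisticated locomotion strategies across different terrains.
The transfer mechanism supports cross-pollination, driving progress beyond what any single environment achieves.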

This review was generated by AI and reviewed by a human editor.