QUICK REVIEW

[Paper Review] Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

Rui Wang, Joel Lehman|arXiv (Cornell University)|2019. 01. 07.

Reinforcement Learning in Robotics | References: 69 | Citations: 125

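One-Line Summary

POET pairs the generation of environmental challenges with the optimization of agents and, within a single run, lets solutions transfer between environments, producing a diverse and increasingly complex learning curriculum.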

ABSTRACT

While the history of machine learning so far largely encompasses a series of problems posed by researchers and algorithms that learn their solutions, an important question is whether the problems themselves can be generated by the algorithm at the same time as they are being solved. Such a process would in effect build its own diverse and expanding curricula, and the solutions to problems at various stages would become stepping stones towards solving even more challenging problems later in the process. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper does just that: it pairs the generation of environmental challenges and the optimization of agents to solve those challenges. It simultaneously explores many different paths through the space of possible problems and solutions and, critically, allows these stepping-stone solutions to transfer between problems if better, catalyzing innovation. The term open-ended signifies the intriguing potential for algorithms like POET to continue to create novel and increasingly complex capabilities without bound. Our results show that POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved by direct optimization alone, or even through a direct-path curriculum-building control algorithm introduced to highlight the critical role of open-endedness in solving ambitious challenges. The ability to transfer solutions from one environment to another proves essential to unlocking the full potential of the system as a whole, demonstrating the unpredictable nature of fortuitous stepping stones. We hope that POET will inspire a new push towards open-ended discovery across many domains, where algorithms like POET can blaze a trail through their interesting possible manifestations and solutions.

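Research Motivation and Objectives

Motivate open-ended, self-generated curricula in which problems and their solutions co-evolve.
Develop an algorithm that expands environment complexity while simultaneously optimizing agent policies.
Enable the transfer of solutions between environments to catalyze innovation.
Demonstrate open-ended progress within a single run in a 2D bipedal walking domain.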

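Proposed Method

Maintain a pool of environment-agent pairs (EA_List), starting from a single simple pair.
Generate new environments by mutating environment encodings, keeping those that are neither too hard nor too easy for the current agents and prioritizing novelty.
Optimize each agent within its paired environment using Evolution Strategies (ES).
Periodically attempt to transfer agent policies between environments to speed up progress.
Accept a transfer attempt only if it improves performance in the target environment.
Run in parallel to make use of many processors and enable large-scale exploration.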
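
The steps above can be sketched as a small loop. This is a toy illustration, not the authors' implementation: an environment and an agent are each reduced to a single float, `score` is a made-up reward, the thresholds and schedule values are arbitrary, and every function name here is hypothetical. It also simplifies the method by having each child environment inherit its parent's agent, by measuring novelty only against the current pool, and by running serially rather than in parallel.

```python
import random

def score(agent, env):
    """Toy reward (assumption): closer to 0 when the agent's value matches the environment's."""
    return -abs(agent - env)

def es_step(agent, env, rng, sigma=0.1, lr=0.05, pop=20):
    """One simplified ES update using mirrored (antithetic) Gaussian perturbations."""
    grad = 0.0
    for _ in range(pop):
        eps = rng.gauss(0.0, 1.0)
        diff = score(agent + sigma * eps, env) - score(agent - sigma * eps, env)
        grad += diff * eps
    return agent + lr * grad / (2 * pop * sigma)

def passes_minimal_criterion(agent, env, low=-1.0, high=-0.05):
    """Keep only environments that are neither too hard nor too easy for the agent."""
    return low <= score(agent, env) <= high

def novelty(env, existing_envs):
    """Distance to the nearest environment already in the pool."""
    return min(abs(env - e) for e in existing_envs)

def mutate_envs(pairs, rng, step=0.5, max_children=2, capacity=5):
    """Mutate each environment, filter the children, and admit the most novel ones."""
    envs = [env for env, _ in pairs]
    children = []
    for env, agent in pairs:
        child = env + rng.gauss(0.0, step)
        if passes_minimal_criterion(agent, child):
            children.append((child, agent))  # child inherits the parent's agent
    children.sort(key=lambda p: novelty(p[0], envs), reverse=True)
    pairs = pairs + children[:max_children]
    return pairs[-capacity:]  # drop the oldest pairs when over capacity

def try_transfers(pairs):
    """Replace a pair's agent only when another agent scores strictly higher there."""
    agents = [agent for _, agent in pairs]
    result = []
    for env, agent in pairs:
        best = max(agents, key=lambda a: score(a, env))
        keep = best if score(best, env) > score(agent, env) else agent
        result.append((env, keep))
    return result

def poet(iterations, rng, mutate_every=10, transfer_every=5):
    """Run the toy loop: mutate environments, optimize agents, attempt transfers."""
    pairs = [(0.0, 0.0)]  # start from one simple environment-agent pair
    for it in range(1, iterations + 1):
        if it % mutate_every == 0:
            pairs = mutate_envs(pairs, rng)
        pairs = [(env, es_step(agent, env, rng)) for env, agent in pairs]
        if it % transfer_every == 0:
            pairs = try_transfers(pairs)
    return pairs
```

Calling `poet(100, random.Random(0))` returns the final list of `(environment, agent)` pairs. The acceptance test in `try_transfers` mirrors the rule that a transfer is kept only when it improves performance in the target environment.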

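Experimental Results

Research Questions

RQ1: Can POET generate an open-ended sequence of increasingly complex and diverse environments within a single run?
RQ2: Is the transfer of solutions between environments essential to POET's progress and innovation?
RQ3: Does POET reach diverse, solvable challenges that could not be solved by direct optimization or a fixed curriculum?
RQ4: How does the performance of agents evolved in POET compare with optimization in isolated environments?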

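Key Findings

POET generates a diverse set of challenging environments that are both invented and solved within a single run.
Solutions to the challenging environments could not be found by optimizing directly on those environments alone.
A curriculum-based incremental ramp-up toward the same challenges did not reach POET's results; open-ended growth depends on environment diversity and transfer.
Periodic transfer between environments is important for unlocking progress and enabling unexpected stepping stones.
A single run yields a broad range of advanced walking strategies across diverse terrains.
The transfer mechanism supports cross-pollination that accelerates progress beyond individual environments.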


This review was generated by AI and reviewed by a human editor.