QUICK REVIEW

[Paper Review] Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

Rui Wang, Joel Lehman | arXiv (Cornell University) | Jan 7, 2019

Reinforcement Learning in Robotics | References: 69 | Citations: 125

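One-Sentence Summary

POET pairs the generation of environmental challenges with the optimization of agents and allows solutions to transfer between environments, producing a diverse and increasingly complex learning curriculum within a single run.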

ABSTRACT

While the history of machine learning so far largely encompasses a series of problems posed by researchers and algorithms that learn their solutions, an important question is whether the problems themselves can be generated by the algorithm at the same time as they are being solved. Such a process would in effect build its own diverse and expanding curricula, and the solutions to problems at various stages would become stepping stones towards solving even more challenging problems later in the process. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper does just that: it pairs the generation of environmental challenges and the optimization of agents to solve those challenges. It simultaneously explores many different paths through the space of possible problems and solutions and, critically, allows these stepping-stone solutions to transfer between problems if better, catalyzing innovation. The term open-ended signifies the intriguing potential for algorithms like POET to continue to create novel and increasingly complex capabilities without bound. Our results show that POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved by direct optimization alone, or even through a direct-path curriculum-building control algorithm introduced to highlight the critical role of open-endedness in solving ambitious challenges. The ability to transfer solutions from one environment to another proves essential to unlocking the full potential of the system as a whole, demonstrating the unpredictable nature of fortuitous stepping stones. We hope that POET will inspire a new push towards open-ended discovery across many domains, where algorithms like POET can blaze a trail through their interesting possible manifestations and solutions.

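Research Motivation and Objectives

Motivate open-ended, self-generated curricula in which problems and their solutions co-evolve.
Develop an algorithm that optimizes agent policies while simultaneously increasing environment complexity.
Enable solution strategies to transfer between environments to promote innovation.
Demonstrate open-ended progress within a single run in a 2-D bipedal-walker domain.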

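Proposed Method

Maintain a population of environment-agent pairs (EA_List), starting from a single simple pair.
Generate new environments by mutating environment encodings, keeping only those that are neither too hard nor too easy for the current agents and prioritizing novelty.
Optimize each agent within its paired environment using evolution strategies (ES).
Periodically attempt to transfer agent policies between environments to share useful skills and accelerate progress.
Evaluate each transfer attempt and accept it only if it improves performance in the target environment.
Run in parallel across multiple processors to enable large-scale search.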
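
The steps above can be illustrated with a minimal toy sketch. It is not the authors' implementation: the environment is reduced to a single float target, the agent to a single float parameter, and every name and constant here (`score`, `es_step`, `mutate_env`, `poet`, the thresholds `lo` and `hi`, the intervals, the capacity `max_pairs`) is a hypothetical choice made for illustration. Only direct transfer of an existing agent is shown, and everything runs serially rather than in parallel.

```python
import random


def score(theta, env):
    """Toy reward (assumption): higher when the agent parameter is near the env target."""
    return -abs(theta - env)


def es_step(theta, env, rng, sigma=0.1, lr=0.05, n=20):
    """One simplified ES update using antithetic Gaussian perturbations."""
    grad = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        diff = score(theta + sigma * eps, env) - score(theta - sigma * eps, env)
        grad += eps * diff
    return theta + lr * grad / (2 * n * sigma)


def mutate_env(parent_env, parent_theta, envs, rng, lo=-1.0, hi=-0.2, tries=10):
    """Mutate an environment; keep children that are neither too hard nor too easy
    for the parent agent, then return the most novel one (or None)."""
    candidates = []
    for _ in range(tries):
        child = parent_env + rng.uniform(-1.0, 1.0)
        if lo <= score(parent_theta, child) <= hi:  # minimal criterion (hypothetical bounds)
            candidates.append(child)
    if not candidates:
        return None
    # Novelty here is the distance to the nearest existing environment.
    return max(candidates, key=lambda c: min(abs(c - e) for e in envs))


def poet(iterations=200, mutate_every=20, transfer_every=10, max_pairs=5, seed=0):
    rng = random.Random(seed)
    pairs = [[0.0, 0.5]]  # each entry is [env, theta]; start from one simple pair
    for t in range(1, iterations + 1):
        # 1) Occasionally create a new environment from an existing pair.
        if t % mutate_every == 0:
            parent = rng.choice(pairs)
            child = mutate_env(parent[0], parent[1], [p[0] for p in pairs], rng)
            if child is not None:
                pairs.append([child, parent[1]])  # child env inherits the parent agent
                if len(pairs) > max_pairs:
                    pairs.pop(0)  # capacity limit: drop the oldest pair
        # 2) Optimize every agent in its own paired environment.
        for p in pairs:
            p[1] = es_step(p[1], p[0], rng)
        # 3) Periodically attempt transfers; accept only if the target score improves.
        if t % transfer_every == 0:
            thetas = [p[1] for p in pairs]
            for p in pairs:
                best = max(thetas, key=lambda th: score(th, p[0]))
                if score(best, p[0]) > score(p[1], p[0]):
                    p[1] = best
    return pairs


if __name__ == "__main__":
    for env, theta in poet():
        print(f"env={env:+.3f}  theta={theta:+.3f}  score={score(theta, env):+.3f}")
```

In this sketch the acceptance test in step 3 mirrors the rule that a transfer is kept only when it improves performance in the target environment, while the score bounds in `mutate_env` stand in for the "neither too hard nor too easy" filter.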

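Experimental Results

Research Questions

RQ1: Can POET produce an open-ended sequence of increasingly complex and diverse environments within a single run?
RQ2: Is transferring solutions between environments essential to progress and innovation in POET?
RQ3: Does POET produce and solve diverse challenges that direct optimization or a fixed curriculum cannot solve?
RQ4: How does the performance of agents evolved by POET compare with that of agents optimized in isolated environments?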

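Key Findings

POET generates diverse, challenging environments that are both invented and solved within a single run.
Solutions to the challenging environments could not be found by directly optimizing on those environments alone.
Curriculum-based incremental scaling toward the same challenges did not match POET's results; open-ended growth depends on environment diversity and transfer.
Periodic transfer between environments is critical for unlocking progress and enabling fortuitous stepping stones.
A single run produces a wide range of sophisticated locomotion strategies across varied terrains.
The transfer mechanism supports cross-pollination that accelerates progress beyond what individual environments achieve in isolation.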


This review was generated by AI and checked by a human editor.