QUICK REVIEW

[Paper Review] Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning

Sébastien Forestier, Rémy Portelas|arXiv (Cornell University)|Aug 7, 2017

Reinforcement Learning in Robotics|References: 52|Cited by: 174

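One-Sentence Summary

This paper formalizes Intrinsically Motivated Goal Exploration Processes (IMGEP) and introduces AMB, a modular, population-based IMGEP architecture with automatic curriculum learning. The architecture is evaluated in 2D, Minecraft, and real humanoid robot experiments, where it discovers diverse skills and stepping-stone abilities.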

ABSTRACT

Intrinsically motivated spontaneous exploration is a key enabler of autonomous developmental learning in human children. It enables the discovery of skill repertoires through autotelic learning, i.e. the self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present an algorithmic approach called Intrinsically Motivated Goal Exploration Processes (IMGEP) to enable similar properties of autonomous learning in machines. The IMGEP architecture relies on several principles: 1) self-generation of goals, generalized as parameterized fitness functions; 2) selection of goals based on intrinsic rewards; 3) exploration with incremental goal-parameterized policy search and exploitation with a batch learning algorithm; 4) systematic reuse of information acquired when targeting a goal for improving towards other goals. We present a particularly efficient form of IMGEP, called AMB, that uses a population-based policy and an object-centered spatio-temporal modularity. We provide several implementations of this architecture and demonstrate their ability to automatically generate a learning curriculum within several experimental setups. One of these experiments includes a real humanoid robot exploring multiple spaces of goals with several hundred continuous dimensions and with distractors. While no particular target goal is provided to these autotelic agents, this curriculum allows the discovery of diverse skills that act as stepping stones for learning more complex skills, e.g. nested tool use.

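Research Motivation and Objectives

Formalize Intrinsically Motivated Goal Exploration Processes (IMGEP) as a general framework in which the agent generates its own goals and curriculum.
Introduce AMB, a modular, population-based IMGEP architecture with object-centered goal spaces and stepping-stone preserving mutations.
Demonstrate automatic curriculum learning and efficient skill discovery across varied experiments, including robotic setups and a real humanoid robot.
Show that self-organized exploration yields diverse skills that serve as stepping stones toward more complex abilities.
Compare modular IMGEP variants against baselines to assess sample efficiency and curriculum quality.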

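Proposed Method

Define goals as parameterized fitness functions over complete trajectories, which allows abstract goal spaces and varied forms of goals.
Propose an IMGEP architecture with parallel exploration and exploitation loops, in which data collected for one goal is reused for other goals.
Use intrinsic rewards based on competence progress to guide goal selection and focus learning.
Develop AMB, a Modular Population-Based IMGEP: object-centered modular goal spaces, a population-based policy, and SSPMutation, which preserves stepping stones during mutation.
For exploration, sample goals according to learning progress (through a goal-space policy) and use a fast memory-based meta-policy; for exploitation, train offline, in batch and asynchronously.
Provide variants such as Active Model Babbling (AMB) and Random Model Babbling (RMB) to study the effect of goal-space sampling and mutation strategies.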
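
The exploration loop described above can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: the two-object world (`hand`, `static`), the nearest-goal progress estimate, the epsilon-greedy module choice, and all names and parameter values are assumptions made for this example. It shows goals as parameterized fitness functions, a memory-based meta-policy that reuses every rollout for every goal space, and goal-space selection weighted by learning progress.

```python
import math
import random

STATIC_POSITION = (0.5, 0.5)  # hypothetical object the agent cannot move

def environment(theta):
    """Toy world: one outcome per object, computed from policy parameters."""
    return {"hand": (theta[0], theta[1]), "static": STATIC_POSITION}

def fitness(goal, outcome):
    """Goal-parameterized fitness: negative distance from outcome to goal."""
    return -math.dist(goal, outcome)

class Module:
    """One goal space with its own record of goals and progress."""
    def __init__(self, name, window=20):
        self.name = name
        self.window = window
        self.history = []   # (goal, outcome reached when pursuing it)
        self.progress = []  # absolute fitness change, one value per goal

    def update(self, goal, outcome):
        if self.history:
            # Compare with the outcome reached for the closest earlier goal.
            _, old = min(self.history, key=lambda h: math.dist(h[0], goal))
            self.progress.append(abs(fitness(goal, outcome) - fitness(goal, old)))
        self.history.append((goal, outcome))

    def interest(self):
        """Mean absolute progress over the most recent goals (assumed measure)."""
        recent = self.progress[-self.window:]
        return sum(recent) / len(recent) if recent else 0.0

def choose_module(modules, rng, eps=0.2):
    """Pick a goal space proportionally to interest, with random exploration."""
    weights = [m.interest() for m in modules]
    if rng.random() < eps or sum(weights) == 0.0:
        return rng.choice(modules)
    return rng.choices(modules, weights=weights)[0]

def run(iterations=300, seed=0, noise=0.05):
    rng = random.Random(seed)
    modules = [Module("hand"), Module("static")]
    memory = []  # (theta, outcomes for all objects), shared by all modules
    counts = {m.name: 0 for m in modules}
    for _ in range(iterations):
        module = choose_module(modules, rng)
        counts[module.name] += 1
        goal = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if memory:
            # Memory-based meta-policy: reuse the best stored rollout for this
            # goal, then perturb its parameters.
            best_theta, _ = max(
                memory, key=lambda e: fitness(goal, e[1][module.name]))
            theta = tuple(min(1.0, max(-1.0, t + rng.gauss(0.0, noise)))
                          for t in best_theta)
        else:
            theta = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        outcomes = environment(theta)
        memory.append((theta, outcomes))
        module.update(goal, outcomes[module.name])
    return counts, modules, memory
```

Calling `run()` returns how often each goal space was selected. Because the static object never yields any progress in this toy world, its module is only ever chosen through the random-exploration branch of `choose_module`.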

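Experimental Results

Research Questions

RQ1: Can intrinsically motivated exploration autonomously generate a learning curriculum in open-ended goal spaces?
RQ2: Do modular, object-centered goal spaces improve sample efficiency and the diversity of discovered skills?
RQ3: How do stepping-stone preserving mutations affect tool use and the acquisition of complex skills?
RQ4: How does AMB compare with the RMB baseline in exploration efficiency and skill diversity?
RQ5: To what extent does automatic curriculum learning transfer to a real robotic system with high-dimensional sensory input?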

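Main Findings

Intrinsic rewards based on learning progress effectively steer exploration toward goals where competence is improving.
Modular, object-centered goal spaces give exploration structure and promote reuse of knowledge across goals, which improves skill discovery.
Stepping-Stone Preserving Mutations (SSPMutation) align mutations with task structure, which helps sustain progress on tool-use tasks and keeps exploration close to the stepping stones.
AMB variants, driven by learning-progress sampling, improve on the baselines in sample efficiency and behavioral diversity, including in the real humanoid robot experiment.
The self-generated curriculum discovers diverse skills and stepping stones (e.g. nested tool use) without explicit target goals or a hand-designed curriculum.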
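
The idea of preserving stepping stones during mutation can be illustrated as follows. This is a sketch under assumptions, not the operator defined in the paper: it assumes a policy is a list of per-step motor commands, that the object's position is recorded before the first command and after each command, and that the commands issued before the object first moves are the stepping stone to keep. The function names, the 1-D object position, and the noise scale `sigma` are hypothetical.

```python
import random

def first_moving_command(object_positions, tol=1e-9):
    """Index of the first command after which the object position changed.

    object_positions[0] is the position before any command and
    object_positions[t + 1] is the position after command t.
    Returns None if the object never moved.
    """
    for t in range(len(object_positions) - 1):
        if abs(object_positions[t + 1] - object_positions[t]) > tol:
            return t
    return None

def ssp_mutation(commands, object_positions, rng, sigma=0.1):
    """Keep commands before the object first moved; add noise to the rest."""
    start = first_moving_command(object_positions)
    if start is None:
        start = 0  # assumption: with nothing to preserve, mutate every step
    return [c if t < start else c + rng.gauss(0.0, sigma)
            for t, c in enumerate(commands)]

rng = random.Random(0)
commands = [0.1, 0.2, 0.3, 0.4, 0.5]
# The object stays still during commands 0 and 1 and first moves after command 2.
object_positions = [0.0, 0.0, 0.0, 0.3, 0.7, 1.2]
mutated = ssp_mutation(commands, object_positions, rng)
```

In this example the first two commands, which bring the agent to the object, are copied unchanged, and only the commands from the first contact onward are perturbed.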


This review was generated by AI and checked by a human editor.