QUICK REVIEW

[Paper Review] Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

Alexandre Péré, Sébastien Forestier|arXiv (Cornell University)|Mar 2, 2018

Reinforcement Learning in Robotics | References: 33 | Cited by: 49

One-Sentence Summary

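This paper proposes IMGEP-UGL, a two-stage architecture that learns a goal space through unsupervised representation learning before intrinsically motivated goal exploration, and shows that the learned representations can match the exploration performance obtained with engineered goal spaces.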

ABSTRACT

Intrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. This is a developmental 2-stage approach: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then goal exploration happens in a second stage by sampling goals in this latent space. We present experiments where a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained using engineered representations.

Motivation and Objectives

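Advance autonomous, intrinsically motivated exploration by learning goal representations without hand-engineered features.
Develop a two-stage developmental framework that combines passive perceptual learning with goal exploration.
Evaluate whether goal spaces learned without supervision enable exploration as efficient as that obtained with engineered representations.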

Proposed Method

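Two-stage architecture: (1) unsupervised goal-space learning (UGL) from passive raw sensor observations, which learns a latent embedding and a KDE-based estimate of its distribution; (2) an intrinsically motivated goal exploration process (IMGEP) that uses the learned embedding as its outcome/goal space and follows a random goal-sampling policy.
The UGL stage is instantiated with several representation learning algorithms (autoencoder (AE), variational autoencoder (VAE), VAE with normalizing flows, Isomap, PCA), which are compared in combination with density estimators such as KDE.
Exploration diversity and efficiency are measured with KL-coverage, and the learned goal spaces are compared against engineered representations.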
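
The two-stage architecture described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy world (`effect`, `observe`, `rollout`, `PROJ`), the use of PCA as the encoder, the hand-rolled Gaussian KDE sampler, and all hyperparameters are assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical toy world (an assumption, not the paper's environment) ---
PROJ = rng.normal(size=(2, 20))  # fixed map from object position to raw "sensor" vector

def effect(theta):
    """Object position reached by a policy with parameters theta in [-1, 1]^2."""
    return np.tanh(2.0 * theta)

def observe(position):
    """Raw 20-dimensional observation of an object position."""
    return position @ PROJ

def rollout(theta):
    """Run a policy and return the raw observation of its effect."""
    return observe(effect(theta))

# --- Stage 1: unsupervised goal-space learning (UGL) from passive observations ---
def fit_pca(observations, n_latents):
    """Learn a linear encoder with PCA, computed through an SVD."""
    mean = observations.mean(axis=0)
    _, _, vt = np.linalg.svd(observations - mean, full_matrices=False)
    components = vt[:n_latents]
    return lambda x: (x - mean) @ components.T

def make_kde_sampler(latents, bandwidth=0.1):
    """Sample from a Gaussian KDE: pick a stored latent point, add Gaussian noise."""
    def sample():
        center = latents[rng.integers(len(latents))]
        return center + bandwidth * rng.normal(size=center.shape)
    return sample

passive_obs = observe(rng.uniform(-1, 1, size=(500, 2)))  # observed without acting
encode = fit_pca(passive_obs, n_latents=2)
sample_goal = make_kde_sampler(encode(passive_obs))

# --- Stage 2: goal exploration (IMGEP) with goals sampled in the latent space ---
def imgep(encode, sample_goal, n_bootstrap=20, n_iterations=200, noise=0.05):
    thetas, outcomes = [], []
    for i in range(n_bootstrap + n_iterations):
        if i < n_bootstrap:
            # Bootstrap the memory with random policy parameters.
            theta = rng.uniform(-1, 1, size=2)
        else:
            # Sample a goal, reuse the policy whose outcome was closest, perturb it.
            goal = sample_goal()
            dists = np.linalg.norm(np.array(outcomes) - goal, axis=1)
            nearest = thetas[int(np.argmin(dists))]
            theta = np.clip(nearest + noise * rng.normal(size=2), -1, 1)
        thetas.append(theta)
        outcomes.append(encode(rollout(theta)))  # store the outcome in latent space
    return np.array(thetas), np.array(outcomes)

thetas, outcomes = imgep(encode, sample_goal)
print(thetas.shape, outcomes.shape)
```

The encoder is fitted once and then left unchanged during exploration, so goals and outcomes always live in the same fixed latent space. Swapping `fit_pca` for another representation learner would only require returning a function with the same input and output shapes.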

Experimental Results

Research Questions

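RQ1: Can IMGEP-UGL achieve exploration dynamics as efficient as IMGEP with an engineered goal space?
RQ2: How does the embedding dimension affect exploration performance?
RQ3: Do different unsupervised learning algorithms in the UGL stage lead to different exploration efficiency?
RQ4: In high-dimensional robotic tasks, does using a learned latent space for goals improve exploration compared with random goals or hand-designed goals?
RQ5: What is the effect of freezing the learned representation during the IMGEP stage?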

Key Findings

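Measured by KL-coverage, the exploration dynamics of IMGEP-UGL can approach those obtained with an engineered goal representation.
Using an embedding dimension larger than needed to capture the manifold does not degrade exploration performance for the algorithms tested.
Several unsupervised methods (AE, VAE, VAE with normalizing flows, Isomap, PCA), combined with KDE-based density estimation, support effective IMGEP-UGL exploration.
The radial-flow VAE and some alternatives can lead to lower exploration efficiency, suggesting that factors other than the expressiveness of the embedding affect performance.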
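
The findings above are stated in terms of KL-coverage. One plausible reading of that metric, sketched below, is the KL divergence between a histogram of the states reached during exploration and a uniform distribution over the same bins, so that lower values indicate more even coverage. The function name `kl_coverage`, the bin count, and the direction of the divergence are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def kl_coverage(points, bounds, bins=10):
    """KL(p || u) between the binned distribution p of reached points and uniform u.

    points: array of shape (n_points, n_dims)
    bounds: one (lower, upper) pair per dimension
    """
    hist, _ = np.histogramdd(points, bins=bins, range=bounds)
    p = hist.ravel() / hist.sum()
    u = 1.0 / p.size
    visited = p > 0  # empty bins contribute 0 (convention: 0 * log 0 = 0)
    return float(np.sum(p[visited] * np.log(p[visited] / u)))

bounds = [(0.0, 1.0), (0.0, 1.0)]

# One point at the centre of every bin: perfectly even coverage.
centers = np.linspace(0.05, 0.95, 10)
even = np.array([(x, y) for x in centers for y in centers])

# Every point in the same bin: worst possible coverage.
clustered = np.full((50, 2), 0.55)

print(kl_coverage(even, bounds))
print(kl_coverage(clustered, bounds))
```

With this definition the value ranges from 0 (every bin visited equally often) to the logarithm of the number of bins (all points in a single bin), which makes runs with the same binning directly comparable.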


This review was generated by AI and reviewed by a human editor.