QUICK REVIEW

[Paper Review] Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

Alexandre Péré, Sébastien Forestier|arXiv (Cornell University)|Mar 2, 2018

Reinforcement Learning in Robotics | References: 33 | Citations: 49

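One-Line Summary

This paper proposes IMGEP-UGL, a two-stage architecture that learns a goal space through unsupervised representation learning before intrinsically motivated goal exploration begins, and shows that the learned representations can achieve exploration performance comparable to engineered goal spaces.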

ABSTRACT

Intrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. This is a developmental 2-stage approach: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then goal exploration happens in a second stage by sampling goals in this latent space. We present experiments where a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained using engineered representations.

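Motivation and Objectives

Advance autonomous intrinsically motivated exploration by learning goal representations instead of relying on hand-crafted features.
Develop a two-stage developmental framework that combines passive perceptual learning with goal exploration.
Evaluate whether goal spaces obtained by unsupervised learning support exploration as efficient as engineered representations.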

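Proposed Method

Two-stage architecture: (1) Unsupervised Goal-space Learning (UGL), which learns a latent embedding and a KDE-based estimate of its distribution from passive raw sensor observations; (2) an Intrinsically Motivated Goal Exploration Process (IMGEP), which uses the learned embedding as its outcome/goal space and the estimated distribution as a stochastic goal-sampling policy.
The UGL stage is instantiated with a range of representation learning algorithms (AE, VAE, VAE with normalizing flows, Isomap, PCA), and density estimation choices for the latent distribution (such as KDE) are compared.
Exploration diversity and efficiency are measured with KL-Coverage, and the learned goal spaces are compared against engineered representations.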
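The two stages described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the authors' implementation: `render` and `toy_environment` are hypothetical stand-ins for the simulated arm and object, PCA is used as the embedding (one of the algorithms listed above), and the goal policy is a Gaussian KDE sampled by picking a stored latent and adding noise. The nearest-neighbor policy selection, the bandwidth, and the perturbation scale are illustrative choices.

```python
import numpy as np

GRID = np.linspace(-1.0, 1.0, 32)

def render(pos):
    """Hypothetical raw observation: a 32-pixel 1-D 'image' with a blob at pos."""
    return np.exp(-((GRID - pos) ** 2) / 0.02)

def toy_environment(theta):
    """Hypothetical stand-in for the simulator: policy parameters -> observation."""
    return render(np.tanh(theta.sum()))

def fit_pca(observations, n_latents):
    """Stage 1a (UGL): fit a linear embedding with PCA and return an encoder."""
    mean = observations.mean(axis=0)
    _, _, vt = np.linalg.svd(observations - mean, full_matrices=False)
    components = vt[:n_latents]
    return lambda x: (np.atleast_2d(x) - mean) @ components.T

def sample_kde(latents, bandwidth, rng):
    """Stage 1b (UGL): draw one sample from a Gaussian KDE over the latents."""
    center = latents[rng.integers(len(latents))]
    return center + bandwidth * rng.standard_normal(center.shape)

def explore(encode, latents, rng, n_iters=200, n_bootstrap=20, theta_dim=4,
            bandwidth=0.1, noise=0.1):
    """Stage 2 (IMGEP): sample goals in latent space, reuse the nearest past policy."""
    thetas, outcomes = [], []
    for i in range(n_iters):
        if i < n_bootstrap:
            # Bootstrap with random policy parameters.
            theta = rng.uniform(-1.0, 1.0, theta_dim)
        else:
            # Sample a goal, find the past outcome closest to it, perturb its policy.
            goal = sample_kde(latents, bandwidth, rng)
            dists = np.linalg.norm(np.asarray(outcomes) - goal, axis=1)
            theta = thetas[int(np.argmin(dists))] + noise * rng.standard_normal(theta_dim)
        thetas.append(theta)
        # The encoder is frozen here: outcomes are embedded, never used to refit it.
        outcomes.append(encode(toy_environment(theta))[0])
    return np.asarray(thetas), np.asarray(outcomes)

rng = np.random.default_rng(0)
# Passive observations: the blob moves without the agent acting.
passive_obs = np.stack([render(p) for p in rng.uniform(-1.0, 1.0, 300)])
encode = fit_pca(passive_obs, n_latents=2)
latents = encode(passive_obs)
thetas, outcomes = explore(encode, latents, rng)
print(outcomes.shape)
```

Swapping `fit_pca` for another encoder (for example an autoencoder) would leave `explore` unchanged, which is the point of separating the perceptual stage from the exploration stage.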

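Experimental Results

Research Questions

RQ1: Can IMGEP-UGL achieve exploration dynamics comparable to an IMGEP with an engineered goal space?
RQ2: How does the dimensionality of the embedding affect exploration performance?
RQ3: Do different unsupervised learning algorithms in the UGL stage lead to different exploration efficiency?
RQ4: In a high-dimensional robotic task, does using a learned latent space as the goal space improve exploration compared with random or hand-designed goals?
RQ5: What is the impact of freezing the learned representation during the IMGEP stage?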

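Key Findings

IMGEP-UGL can achieve exploration dynamics nearly equivalent to those obtained with engineered goal representations, as measured by KL-Coverage.
Embedding dimensions beyond what is needed to capture the manifold do not degrade exploration performance across the algorithms tested.
Combined with KDE-based density estimation, several unsupervised methods (AE, VAE, VAE with normalizing flows, Isomap, PCA) support effective IMGEP-UGL exploration.
Radial Flow VAE and some other alternatives may explore less efficiently, suggesting that factors other than the expressiveness of the embedding affect performance.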
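A coverage measure of the kind referred to above can be sketched as follows. This assumes the measure is a KL divergence between a histogram of reached outcomes and the uniform distribution over the same bins; the direction of the divergence, the bin count, and the function name `kl_coverage` are assumptions made for illustration and may differ from the paper's exact definition. Under this definition, lower values mean more even coverage: evenly spread outcomes give a value near 0, and outcomes that all fall into one bin give the logarithm of the number of bins.

```python
import numpy as np

def kl_coverage(outcomes, bounds, n_bins=10, eps=1e-12):
    """KL(p || uniform), where p is the normalized histogram of reached outcomes.

    outcomes: array of shape (N, D) in the true (engineered) state space.
    bounds:   sequence of (low, high) pairs, one per dimension.
    """
    hist, _ = np.histogramdd(np.asarray(outcomes), bins=n_bins, range=bounds)
    p = hist.ravel() / hist.sum()
    u = 1.0 / p.size
    # Empty bins contribute 0 because p is exactly 0 there; eps only avoids log(0).
    return float(np.sum(p * np.log((p + eps) / u)))

bounds = [(-1.0, 1.0), (-1.0, 1.0)]

# One outcome at the center of every bin: perfectly even coverage.
centers = (np.arange(10) + 0.5) / 10 * 2.0 - 1.0
even = np.stack(np.meshgrid(centers, centers), axis=-1).reshape(-1, 2)

# All outcomes in a single bin: worst-case coverage.
clustered = np.full((50, 2), 0.05)

print(kl_coverage(even, bounds), kl_coverage(clustered, bounds))
```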


This review was generated by AI and checked by a human editor.