QUICK REVIEW

[Paper Review] Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

Alexandre Péré, Sébastien Forestier | arXiv (Cornell University) | 2018-03-02

Reinforcement Learning in Robotics | References: 33 | Citations: 49

One-Line Summary

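This paper introduces IMGEP-UGL, a two-stage architecture that learns a goal space without supervision, and shows that the learned representations can match the exploration performance obtained with engineered goal spaces.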

ABSTRACT

Intrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. This is a developmental 2-stage approach: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then goal exploration happens in a second stage by sampling goals in this latent space. We present experiments where a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained using engineered representations.

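Motivation and Objectives

Learn goal representations without handcrafted features, so that intrinsically motivated exploration can proceed autonomously.
Develop a two-stage developmental framework that combines passive perceptual learning with goal exploration.
Evaluate whether goal spaces learned without supervision support exploration as efficient as that obtained with engineered representations.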

Proposed Method

Two-stage architecture: (1) Unsupervised Goal space Learning (UGL), which uses passive raw sensor observations to learn a latent embedding and a KDE-based estimate of its distribution; (2) Intrinsically Motivated Goal Exploration Process (IMGEP), which uses the learned embedding as the outcome/goal space and the estimated latent distribution as a stochastic goal-sampling policy.
The UGL stage uses a range of representation learning algorithms (AEs, VAEs, VAEs with Normalizing Flows, Isomap, PCA) and compares them in combination with density estimation (KDE).
Measure exploration diversity and efficiency with KL-coverage, comparing learned goal spaces against engineered representations.
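
The two stages described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy environment (`render`, `execute`), the choice of PCA as the embedding, the nearest-neighbour meta-policy with Gaussian noise, and the `bandwidth` and `noise` values are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical toy world, NOT the paper's robot-arm setup: a hidden 2-D
# effect is rendered as a 10-D "raw sensor" vector by a fixed linear map.
_RENDER = np.random.default_rng(42).standard_normal((2, 10))


def render(effect):
    """Turn hidden 2-D effects into 10-D raw observations."""
    return np.asarray(effect) @ _RENDER


def execute(theta):
    """Run policy parameters theta in [-1, 1]^2 and observe the result."""
    effect = np.array([np.sin(np.pi * theta[0]), theta[0] * theta[1]])
    return render(effect)


def fit_pca(observations, n_components):
    """Stage 1a: learn a linear embedding (PCA computed via SVD)."""
    mean = observations.mean(axis=0)
    _, _, vt = np.linalg.svd(observations - mean, full_matrices=False)
    components = vt[:n_components]
    return lambda obs: (np.asarray(obs) - mean) @ components.T


def sample_kde(latents, bandwidth, rng):
    """Stage 1b: draw one goal from a Gaussian KDE over latent points.

    Sampling a stored point and adding Gaussian noise is exact sampling
    from a Gaussian KDE with that bandwidth.
    """
    center = latents[rng.integers(len(latents))]
    return center + bandwidth * rng.standard_normal(center.shape)


def run_imgep_ugl(n_passive=500, n_bootstrap=20, n_iterations=200,
                  bandwidth=0.1, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)

    # Stage 1 (UGL): passive observations -> embedding -> latent density.
    passive = render(rng.uniform(-1.0, 1.0, size=(n_passive, 2)))
    embed = fit_pca(passive, n_components=2)
    passive_latents = embed(passive)

    # Stage 2 (IMGEP): bootstrap with random policies, then pursue goals
    # sampled in the learned latent space.
    thetas = list(rng.uniform(-1.0, 1.0, size=(n_bootstrap, 2)))
    outcomes = [embed(execute(t)) for t in thetas]
    for _ in range(n_iterations):
        goal = sample_kde(passive_latents, bandwidth, rng)
        # Assumed meta-policy: reuse the policy whose past outcome was
        # closest to the goal, perturbed by exploration noise.
        distances = np.linalg.norm(np.array(outcomes) - goal, axis=1)
        best = thetas[int(np.argmin(distances))]
        theta = np.clip(best + noise * rng.standard_normal(2), -1.0, 1.0)
        thetas.append(theta)
        outcomes.append(embed(execute(theta)))
    return np.array(thetas), np.array(outcomes)
```

Calling `run_imgep_ugl()` returns the tried policy parameters and their latent outcomes. Swapping `fit_pca` for another embedding (for example an autoencoder) changes only Stage 1; the exploration loop stays the same, which is the point of the two-stage split.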

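Experimental Results

Research Questions

RQ1: Can IMGEP-UGL achieve the same exploration dynamics as IMGEP with an engineered goal space?
RQ2: How does the embedding dimension affect exploration performance?
RQ3: Do different unsupervised learning algorithms in the UGL stage yield different exploration efficiency?
RQ4: Does using a learned latent space as the goal space improve exploration in high-dimensional robot tasks compared with random or hand-designed goals?
RQ5: What is the effect of keeping the learned representation fixed during the IMGEP stage?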

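Key Findings

IMGEP-UGL can achieve exploration dynamics close to those obtained with engineered goal representations, as measured by KL-coverage.
Across the tested algorithms, exploration performance does not degrade even when the embedding dimension is larger than what is needed to capture the manifold.
Multiple unsupervised methods (AE, VAE, VAE with Normalizing Flows, Isomap, PCA), combined with KDE-based density estimation, support effective IMGEP-UGL exploration.
Radial Flow VAEs and some alternatives may explore less efficiently, which suggests that factors beyond the expressiveness of the embedding affect performance.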
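
For readers unfamiliar with the KL-coverage measure used in these findings, the sketch below shows one plausible way to compute it. It assumes the measure is the KL divergence between a histogram of reached outcomes and a uniform distribution over the same bins, so lower values mean more even coverage. The function name `kl_coverage`, the bin count, and the bounds are hypothetical and may differ from the paper's exact protocol.

```python
import numpy as np


def kl_coverage(outcomes, low, high, n_bins=10):
    """KL divergence between the explored-outcome histogram and uniform.

    outcomes: array-like of shape (n_points, dim), in an outcome space
              bounded by [low, high] on every axis (assumed bounds).
    Returns 0 for perfectly even coverage; larger means more clumped.
    Points outside [low, high] are ignored by the histogram, so at
    least one point must fall inside the bounds.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    dim = outcomes.shape[1]
    edges = [np.linspace(low, high, n_bins + 1)] * dim
    counts, _ = np.histogramdd(outcomes, bins=edges)
    explored = counts.ravel() / counts.sum()
    uniform = 1.0 / explored.size
    # Empty bins contribute 0 (convention: 0 * log 0 = 0).
    occupied = explored > 0
    return float(np.sum(explored[occupied]
                        * np.log(explored[occupied] / uniform)))
```

With two bins per axis in 2-D (four cells), one point per cell gives a value of 0, while placing every point in a single cell gives log 4, about 1.386.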

This review was generated by AI and checked by a human editor.