QUICK REVIEW

[Paper Review] Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning

Sébastien Forestier, Rémy Portelas | arXiv (Cornell University) | 2017. 08. 07.

Reinforcement Learning in Robotics | References: 52 | Citations: 174

One-Line Summary

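This paper formalizes Intrinsically Motivated Goal Exploration Processes (IMGEP), introduces AMB, a modular population-based IMGEP architecture capable of automatic curriculum learning, and shows through experiments in a 2D environment, Minecraft, and a real humanoid robot that it discovers diverse skills and stepping-stone abilities.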

ABSTRACT

Intrinsically motivated spontaneous exploration is a key enabler of autonomous developmental learning in human children. It enables the discovery of skill repertoires through autotelic learning, i.e. the self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present an algorithmic approach called Intrinsically Motivated Goal Exploration Processes (IMGEP) to enable similar properties of autonomous learning in machines. The IMGEP architecture relies on several principles: 1) self-generation of goals, generalized as parameterized fitness functions; 2) selection of goals based on intrinsic rewards; 3) exploration with incremental goal-parameterized policy search and exploitation with a batch learning algorithm; 4) systematic reuse of information acquired when targeting a goal for improving towards other goals. We present a particularly efficient form of IMGEP, called AMB, that uses a population-based policy and an object-centered spatio-temporal modularity. We provide several implementations of this architecture and demonstrate their ability to automatically generate a learning curriculum within several experimental setups. One of these experiments includes a real humanoid robot exploring multiple spaces of goals with several hundred continuous dimensions and with distractors. While no particular target goal is provided to these autotelic agents, this curriculum allows the discovery of diverse skills that act as stepping stones for learning more complex skills, e.g. nested tool use.

Research Motivation and Objectives

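Formalizes Intrinsically Motivated Goal Exploration Processes (IMGEP) as a general framework for self-generated goals and curricula.
Introduces AMB, a modular population-based IMGEP architecture with object-centered goal spaces and stepping-stone preserving mutations.
Demonstrates automatic curriculum learning and efficient skill discovery through a range of experiments, including robotics setups and a real humanoid robot.
Shows that self-organized exploration makes complex abilities reachable by way of diverse skills and stepping stones.
Evaluates sample efficiency and curriculum quality by comparing modular IMGEP variants against baselines.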

Proposed Method

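Defines a goal as a parameterized fitness function over an entire trajectory, which allows abstract goal spaces and varied forms of objectives.
Proposes an IMGEP architecture with parallel exploration and exploitation loops, and introduces data reuse across goals.
Uses intrinsic rewards based on competence progress to guide goal selection and focus learning.
Develops the Modular Population-Based IMGEP (AMB), which combines object-centered modular goal spaces, a population-based policy, and SSPMutation, a mutation operator that preserves stepping stones.
Uses learning-progress-based goal sampling (through a goal-space policy) and a fast memory-based meta-policy for exploration, while allowing asynchronous offline/batch learning for exploitation.
Provides variants such as Active Model Babbling (AMB) and Random Model Babbling (RMB) to study the effect of goal-space sampling and mutation strategies.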
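
The exploration loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy `rollout` environment, the `Module` class, the windowed progress estimate, and all parameter values (`window`, `eps`, `sigma`, the 20-step bootstrap) are assumptions made for brevity. It shows the three ingredients named in the list: choosing a goal space by learning progress (active) or uniformly (random), a memory-based meta-policy that reuses the stored parameters whose outcome was nearest to the goal, and a shared memory so that every rollout informs every goal space.

```python
import random

def rollout(params):
    """Hypothetical toy environment with two observed 'objects'.
    The hand responds to the parameters; the distractor moves at random."""
    hand = (params[0] + 0.5 * params[1], params[1] - 0.5 * params[0])
    distractor = (random.uniform(-1, 1), random.uniform(-1, 1))
    return {"hand": hand, "distractor": distractor}

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class Module:
    """One goal space (one object) with its own history of goal-reaching errors."""
    def __init__(self, name, window=20):
        self.name, self.window = name, window
        self.errors = []  # chronological goal-reaching errors

    def learning_progress(self):
        # Crude estimate (an assumption): absolute change in mean error
        # between the previous window and the most recent window.
        if len(self.errors) < 2 * self.window:
            return 0.0
        old = self.errors[-2 * self.window:-self.window]
        new = self.errors[-self.window:]
        return abs(sum(old) / self.window - sum(new) / self.window)

def choose_module(modules, active, eps=0.2):
    # Random variant: uniform choice. Active variant: proportional to
    # learning progress, with an eps share of uniform choices.
    weights = [m.learning_progress() for m in modules]
    if not active or random.random() < eps or sum(weights) == 0:
        return random.choice(modules)
    return random.choices(modules, weights=weights)[0]

def explore(iterations=500, active=True, sigma=0.05, seed=0):
    random.seed(seed)
    modules = [Module("hand"), Module("distractor")]
    memory = []  # (params, outcome) pairs shared by all modules
    for i in range(iterations):
        if i < 20:
            # Bootstrap with random parameters, no goal.
            module, goal = None, None
            params = [random.uniform(-1, 1), random.uniform(-1, 1)]
        else:
            module = choose_module(modules, active)
            goal = (random.uniform(-1, 1), random.uniform(-1, 1))
            # Memory-based meta-policy: reuse the parameters whose stored
            # outcome in this goal space was nearest to the goal, then mutate.
            best = min(memory, key=lambda e: dist(e[1][module.name], goal))[0]
            params = [p + random.gauss(0, sigma) for p in best]
        outcome = rollout(params)
        memory.append((params, outcome))
        if module is not None:
            module.errors.append(dist(outcome[module.name], goal))
    return modules, memory
```

Calling `explore(active=False)` gives the uniform-sampling counterpart on the same toy problem. The windowed estimate used here is noisy on the uncontrollable object, so a more careful progress measure would be needed before drawing any conclusion about which variant explores better.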

Research Questions

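Can intrinsically motivated exploration autonomously generate a learning curriculum in open-ended goal spaces?
Does a modular, object-centered goal structure improve sample efficiency and the diversity of discovered skills?
How do stepping-stone preserving mutations affect tool use and the acquisition of complex skills?
How does AMB compare with the RMB baseline in exploration efficiency and skill diversity?
How well does automatic curriculum learning transfer to a real-robot setting with high-dimensional sensory input?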

Key Findings

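Intrinsic rewards based on learning progress effectively bias exploration toward goals on which competence is currently improving.
Modular, object-centered goal spaces enable structured exploration and promote knowledge reuse across goals, which improves skill discovery.
Stepping-Stone Preserving Mutations (SSPMutation) apply mutations aligned with the task structure, which helps maintain progress on tool-use tasks and supports exploration around stepping stones.
AMB variants driven by learning-progress-based sampling improve on the baselines in sample efficiency and behavioral diversity, and the real humanoid robot experiments confirm this.
The autonomously generated curriculum leads to the discovery of diverse skills and stepping stones (such as nested tool use) without explicit goals or a hand-designed curriculum.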
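
One way to read "mutations aligned with the task structure" is sketched below. It is an interpretation rather than the paper's operator: the assumption is that a policy is a list of per-step motor commands, and that the step at which the target object first moved in the previous rollout is known. The function name `ssp_mutate` and its parameters `object_moved_at`, `sigma`, and `rng` are hypothetical. Commands before that step, which reached the stepping stone (for example, grasping a tool), are copied unchanged, and noise is added only from that step onward.

```python
import random

def ssp_mutate(trajectory, object_moved_at, sigma=0.05, rng=random):
    """Mutate only the commands from the step at which the target object
    first moved; earlier commands are copied unchanged.
    If the object never moved (None), the whole trajectory is mutated."""
    start = 0 if object_moved_at is None else object_moved_at
    return [
        list(step) if t < start else [u + rng.gauss(0, sigma) for u in step]
        for t, step in enumerate(trajectory)
    ]

# Usage: the first two steps are preserved, the last two are perturbed.
previous = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
mutated = ssp_mutate(previous, object_moved_at=2)
```

A plain Gaussian mutation over the whole trajectory would tend to break the early part of the movement and lose contact with the tool, which is the failure mode this kind of operator is meant to avoid.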

This review was generated by AI and reviewed by a human editor.
