QUICK REVIEW

[Paper Review] Unsupervised Meta-Learning for Reinforcement Learning

Abhishek Gupta, Benjamin Eysenbach|arXiv (Cornell University)|2018. 06. 12.

Machine Learning and Data Classification | References: 63 | Citations: 57

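One-Line Summary

This paper proposes unsupervised meta-RL: it uses mutual-information-based task proposals to generate meta-training tasks automatically, then meta-learns over those tasks with MAML, yielding an environment-specific learning procedure that adapts quickly to new rewards.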

ABSTRACT

Meta-learning algorithms use past experience to learn to quickly solve new tasks. In the context of reinforcement learning, meta-learning algorithms acquire reinforcement learning procedures to solve new problems more efficiently by utilizing experience from prior tasks. The performance of meta-learning algorithms depends on the tasks available for meta-training: in the same way that supervised learning generalizes best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We motivate and describe a general recipe for unsupervised meta-reinforcement learning, and present an instantiation of this approach. Our conceptual and theoretical contributions consist of formulating the unsupervised meta-reinforcement learning problem and describing how task proposals based on mutual information can be used to train optimal meta-learners. Our experimental results indicate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design and these procedures exceed the performance of learning from scratch.

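Motivation and Goals

Reduce the human effort spent on manually designing meta-training tasks in meta-RL.
Enable fast adaptation to new reward functions within fixed environment dynamics.
Show that mutual-information-based task proposals can yield a meta-learner that is close to oracle level.
Demonstrate gains beyond both learning from scratch and pure exploration followed by fine-tuning.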

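Proposed Method

Define a CMP (controlled Markov process) with no reward, and formulate the problem as finding a learning procedure f that adapts quickly.
Propose random rewards r_z(s,a), induced by a latent variable z, as parameterized rewards, and optimize to minimize worst-case regret.
Build a practical unsupervised meta-learning method by using a DIAYN-based mutual-information objective to generate diverse, distinguishable tasks and MAML as the meta-learner.
Train a discriminator D_phi to maximize I(z;s), and derive r_z(s,a) = log D_phi(z|s) for task generation.
After obtaining a latent-conditioned policy with DIAYN, apply MAML to learn how to learn across the proposed tasks.
For comparison, discuss a random-task baseline based on a random discriminator.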
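
The reward proposal r_z(s,a) = log D_phi(z|s) described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear softmax discriminator (`LinearDiscriminator`), the two-skill toy data, and the helper names `proposed_reward` and `train_toy_discriminator`. It only shows how a discriminator trained to identify the latent z from a state turns into a per-task reward.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearDiscriminator:
    """Hypothetical stand-in for D_phi(z | s): a softmax over skills computed
    from a linear map of the state (an assumption, not the paper's model)."""

    def __init__(self, state_dim, num_skills, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                        for _ in range(num_skills)]
        self.biases = [0.0] * num_skills

    def probs(self, state):
        logits = [sum(w * x for w, x in zip(row, state)) + b
                  for row, b in zip(self.weights, self.biases)]
        return softmax(logits)

    def update(self, state, z, lr=0.1):
        """One gradient-ascent step on log D_phi(z | s)."""
        p = self.probs(state)
        for k, row in enumerate(self.weights):
            grad = (1.0 if k == z else 0.0) - p[k]  # d log p[z] / d logit_k
            for i, x in enumerate(state):
                row[i] += lr * grad * x
            self.biases[k] += lr * grad


def proposed_reward(disc, state, z):
    """r_z(s, a) = log D_phi(z | s); the action does not enter the formula."""
    return math.log(disc.probs(state)[z])


def train_toy_discriminator(steps=200):
    """Toy data (an assumption): skill 0 visits [1, 0], skill 1 visits [0, 1]."""
    disc = LinearDiscriminator(state_dim=2, num_skills=2)
    visited = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    for _ in range(steps):
        for z, state in visited.items():
            disc.update(state, z)
    return disc


disc = train_toy_discriminator()
print(proposed_reward(disc, [1.0, 0.0], 0))  # reward under the skill that visits this state
print(proposed_reward(disc, [1.0, 0.0], 1))  # reward under the other skill
```

In this toy setting each sampled z defines a different reward over the same states, which is the sense in which the discriminator acts as a task generator; the meta-learner (MAML in the paper) would then be trained across rewards sampled this way.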

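Experimental Results

Research Questions

RQ1: Can unsupervised task proposals remove the need for hand-designed meta-training task distributions in meta-RL?
RQ2: Do mutual-information-based task proposals yield an environment-specific fast learning procedure that adapts to unseen rewards?
RQ3: How does unsupervised meta-RL compare with learning from scratch and with hand-designed meta-training distributions on benchmark control tasks?

Key Findings

Unsupervised meta-RL learns substantially faster than learning from scratch across several tasks and environments.
DIAYN-based task proposals generally outperform random task proposals on complex tasks.
Unsupervised meta-learning can approach the performance of an oracle method that relies on a hand-crafted task distribution.
When fine-tuning on a new reward, the UML-DIAYN approach often outperforms DIAYN initialization and VIME-based pre-training.
The results suggest that environment-specific biases learned through unsupervised interaction improve fast adaptation.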


This review was generated by AI and reviewed by a human editor.