QUICK REVIEW

[Paper Review] On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

Zhao Mandi, Pieter Abbeel|arXiv (Cornell University)|2022. 06. 07.

Reinforcement Learning in Robotics | Citations: 20

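One-Line Summary

This paper compares meta-RL against multi-task pretraining followed by fine-tuning on a range of vision-based tasks, and shows that fine-tuning often matches or outperforms meta-RL while being simpler and cheaper.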

ABSTRACT

Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches also in reinforcement learning, which typically come at the cost of high complexity. We hence investigate meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.

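Research Motivation and Goals

Investigate whether meta-RL offers an advantage over simple multi-task pretraining with fine-tuning in vision-based RL across diverse task distributions.
Compare representative meta-RL algorithms (Reptile, PEARL, RL2) against multi-task pretraining with fine-tuning.
Evaluate on three benchmarks (Procgen, RLBench, Atari) whose test tasks are entirely novel.
Highlight the implications for evaluation protocols and baseline selection in future meta-RL research.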

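Proposed Method

Compare three meta-RL methods (Reptile, PEARL, RL2) against a multi-task training and fine-tuning baseline.
Use PPO as the base algorithm for Procgen, C2F-ARM for RLBench, and RainbowDQN for Atari, with per-task replay buffers.
Evaluate adaptation by fine-tuning on unseen tasks, and also evaluate training from scratch as a reference point.
Test-time adaptation consists of 2 million environment steps of fine-tuning per test level/task (where applicable).
Run large-scale experiments across three benchmarks with diverse task distributions and high-dimensional observations.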
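
To make the contrast between the two pretraining styles concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes each "task" is a one-dimensional quadratic loss 0.5 * (w - center)^2 rather than a vision-based RL problem, and all function names and hyperparameters are hypothetical. Reptile pretraining nudges the weights toward the result of a few inner SGD steps on a sampled task, while multi-task pretraining takes ordinary gradient steps on the loss averaged over the training tasks. Both are then adapted to an unseen task with the same plain fine-tuning routine.

```python
import random


def grad(w, center):
    # Gradient of the toy per-task loss 0.5 * (w - center) ** 2 (an assumption).
    return w - center


def loss(w, center):
    return 0.5 * (w - center) ** 2


def sgd_steps(w, center, lr, steps):
    # Plain gradient descent on a single task.
    for _ in range(steps):
        w = w - lr * grad(w, center)
    return w


def reptile_pretrain(centers, w0=0.0, inner_lr=0.1, inner_steps=5,
                     outer_lr=0.5, iterations=200, seed=0):
    # Reptile-style outer loop: adapt on one sampled task, then move the
    # meta-weights a fraction of the way toward the adapted weights.
    rng = random.Random(seed)
    w = w0
    for _ in range(iterations):
        center = rng.choice(centers)
        adapted = sgd_steps(w, center, inner_lr, inner_steps)
        w = w + outer_lr * (adapted - w)
    return w


def multitask_pretrain(centers, w0=0.0, lr=0.1, iterations=200):
    # Multi-task pretraining: descend the loss averaged over all training tasks.
    w = w0
    for _ in range(iterations):
        mean_grad = sum(grad(w, c) for c in centers) / len(centers)
        w = w - lr * mean_grad
    return w


def finetune(w, center, lr=0.1, steps=5):
    # Test-time adaptation on an unseen task is the same SGD routine.
    return sgd_steps(w, center, lr, steps)


if __name__ == "__main__":
    train_centers = [1.0, 2.0, 3.0]   # hypothetical training tasks
    test_center = 10.0                # hypothetical unseen test task
    for name, w in [("reptile", reptile_pretrain(train_centers)),
                    ("multitask", multitask_pretrain(train_centers))]:
        adapted = finetune(w, test_center)
        print(name, loss(w, test_center), loss(adapted, test_center))
```

In this toy setting both initializations end up inside the range of the training tasks, and the only difference at test time is the starting point handed to `finetune`. It says nothing about which approach wins on Procgen, RLBench, or Atari; it only shows that the fine-tuning baseline needs no extra machinery beyond standard gradient steps.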

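Experimental Results

Research Questions

RQ1: Does multi-task pretraining followed by fine-tuning on new tasks match or outperform meta-RL methods on vision-based RL benchmarks?
RQ2: How do popular meta-RL algorithms (Reptile, PEARL, RL2) compare with multi-task pretraining and fine-tuning across diverse task distributions?
RQ3: What are the relative strengths and limitations of meta-RL and simple pretraining with fine-tuning in settings with sparse rewards and high-dimensional observations?
RQ4: Should future meta-RL evaluations extend to more diverse task distributions and include a strong multi-task pretraining baseline?
RQ5: How do the results differ across Procgen, RLBench, and Atari when the test tasks are strictly unseen?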

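Key Results

Multi-task pretraining with fine-tuning on new tasks performs as well as or better than the meta-RL baselines in vision-based environments.
Across Procgen, RLBench, and Atari, the simple baseline is often competitive with or superior to meta-RL methods when the task distribution is genuinely diverse.
The RLBench results show that multi-task pretraining can overcome sparse rewards on unseen tasks and outperform training from scratch.
RL2 generally fails to adapt to new levels/games, consistent with earlier observations of limited meta-RL adaptability in difficult environments.
PEARL struggles to adapt under the train-test split when the training and test tasks are visually distinct.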

This review was generated by AI and checked by a human editor.