QUICK REVIEW

[Paper Review] On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

Zhao Mandi, Pieter Abbeel|arXiv (Cornell University)|Jun 7, 2022

Reinforcement Learning in Robotics|Cited by 20

One-Sentence Summary

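This paper compares meta-reinforcement learning (meta-RL) with multi-task pretraining plus fine-tuning on a diverse set of vision-based tasks, and finds that fine-tuning often matches or outperforms meta-RL while being simpler and cheaper.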

ABSTRACT

Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches also in reinforcement learning, which typically come at the cost of high complexity. We hence investigate meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.

Motivation and Objectives

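Investigate whether meta-RL offers any advantage over simple multi-task pretraining plus fine-tuning in vision-based RL with diverse task distributions.
Evaluate representative meta-RL algorithms (Reptile, PEARL, RL2) against multi-task pretraining with fine-tuning.
Evaluate on three benchmarks (Procgen, RLBench, Atari) whose test tasks are entirely novel.
Highlight the implications for evaluation protocols and baseline selection in future meta-RL research.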

Method

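Compare three meta-RL methods (Reptile, PEARL, RL2) against a multi-task training plus fine-tuning baseline.
Use PPO as the base algorithm for Procgen, C2F-ARM for RLBench, and Rainbow DQN for Atari, with per-task replay buffers for the replay-based learners.
Evaluate both adaptation to unseen tasks via fine-tuning and training from scratch as a reference baseline.
Where applicable, test-time adaptation consists of fine-tuning for 2 million environment steps per test level or task.
Run large-scale experiments on three benchmarks with diverse task distributions and high-dimensional observations.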
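To make the two pipelines being compared concrete, here is a minimal Python sketch on a toy problem, assuming each "task" is a 1-D quadratic loss rather than an RL environment. `multitask_pretrain` minimizes the loss averaged over the training tasks (the pretraining step of the fine-tuning baseline), while `reptile_pretrain` applies the Reptile update θ ← θ + ε(φ − θ) after a few inner gradient steps on one sampled task. Both initializations are then fine-tuned on a new task and compared with training from scratch. All function names, task centers, and step sizes are hypothetical; PEARL and RL2 are not sketched, and the toy makes no claim about which method wins on Procgen, RLBench, or Atari.

```python
import random
from typing import Sequence

# Toy stand-in for a "task": L_c(theta) = 0.5 * (theta - c) ** 2, a 1-D
# quadratic whose minimum sits at a task-specific center c. This is NOT an
# RL environment (hypothetical setup); it only illustrates the two paradigms.


def grad(theta: float, c: float) -> float:
    """Gradient of 0.5 * (theta - c) ** 2 with respect to theta."""
    return theta - c


def loss(theta: float, c: float) -> float:
    """Task loss for a single task centered at c."""
    return 0.5 * (theta - c) ** 2


def finetune(theta: float, c: float, steps: int, lr: float = 0.1) -> float:
    """Plain gradient descent on a single task, starting from theta."""
    for _ in range(steps):
        theta -= lr * grad(theta, c)
    return theta


def multitask_pretrain(centers: Sequence[float], steps: int = 200, lr: float = 0.1) -> float:
    """Gradient descent on the loss averaged over all training tasks."""
    theta = 0.0
    for _ in range(steps):
        g = sum(grad(theta, c) for c in centers) / len(centers)
        theta -= lr * g
    return theta


def reptile_pretrain(centers: Sequence[float], iters: int = 500, inner_steps: int = 5,
                     inner_lr: float = 0.1, meta_lr: float = 0.1, seed: int = 0) -> float:
    """Reptile: adapt to one sampled task, then move theta toward the adapted weights."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iters):
        c = rng.choice(centers)
        phi = finetune(theta, c, inner_steps, inner_lr)  # inner-loop adaptation
        theta += meta_lr * (phi - theta)                 # meta-update: step toward phi
    return theta


# Hypothetical training-task distribution and unseen test task.
train_centers = [4.0, 5.0, 6.0]
new_task = 5.5

inits = {
    "scratch": 0.0,
    "multitask": multitask_pretrain(train_centers),
    "reptile": reptile_pretrain(train_centers),
}
# Fine-tune every initialization on the new task with the same small budget.
losses = {name: loss(finetune(t0, new_task, steps=10), new_task) for name, t0 in inits.items()}
for name, value in losses.items():
    print(f"{name:9s} init={inits[name]:.2f}  loss after 10 fine-tuning steps={value:.4f}")
```

On quadratic tasks like these, both pretraining schemes end up near the center of the training distribution, so the choice between them matters little here. The sketch is only meant to show where the pipelines differ (a joint average-loss objective versus an inner/outer adaptation loop), not to reproduce the paper's results.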

Experimental Results

Research Questions

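RQ1: Does multi-task pretraining with fine-tuning on novel tasks match or exceed meta-RL methods on vision-based RL benchmarks?
RQ2: How do popular meta-RL algorithms (Reptile, PEARL, RL2) compare with multi-task pretraining plus fine-tuning across diverse task distributions?
RQ3: What are the relative strengths and limitations of meta-RL versus simple pretraining and fine-tuning in settings with sparse rewards and high-dimensional observations?
RQ4: Should future meta-RL evaluations move to more diverse task distributions and include strong multi-task pretraining baselines?
RQ5: How do the results differ across Procgen, RLBench, and Atari when test tasks are strictly unseen?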

Key Findings

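Multi-task pretraining with fine-tuning on novel tasks matches or outperforms the meta-RL baselines in vision-based environments.
Across Procgen, RLBench, and Atari, the simple baseline is often on par with or better than meta-RL methods when the task distribution is genuinely diverse.
On RLBench, multi-task pretraining helps overcome the sparse rewards of unseen tasks and outperforms training from scratch.
RL2 generally fails to adapt to new levels or games, consistent with prior observations that meta-RL adapts poorly in challenging settings.
PEARL struggles to adapt under disjoint train/test splits, where the training and test tasks are visually distinct.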

This review was generated by AI and checked by a human editor.