QUICK REVIEW

[Paper Review] Reinforcement Learning with Augmented Data

Michael Laskin, Kimin Lee|arXiv (Cornell University)|Apr 30, 2020

Reinforcement Learning in Robotics | References: 54 | Citations: 246

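One-Line Summary

RAD adds data augmentation to RL training to improve data efficiency and generalization. It leaves the underlying RL algorithm unchanged and applies to both pixel-based and state-based inputs.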

ABSTRACT

Learning from visual observations is a fundamental yet challenging problem in Reinforcement Learning (RL). Although algorithmic advances combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) data-efficiency of learning and (b) generalization to new environments. To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms. We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inputs, and introduce two new data augmentations - random translate and random amplitude scale. We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks. RAD sets a new state-of-the-art in terms of data-efficiency and final performance on the DeepMind Control Suite benchmark for pixel-based control as well as OpenAI Gym benchmark for state-based control. We further demonstrate that RAD significantly improves test-time generalization over existing methods on several OpenAI ProcGen benchmarks. Our RAD module and training code are available at https://www.github.com/MishaLaskin/rad.

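Motivation and Objectives

Make RL from visual observations more data-efficient and better at generalizing.
Investigate how effective a range of data augmentations is in RL without adding any auxiliary loss.
Show that augmentation improves performance on both pixel-based and state-based benchmarks.
Establish RAD as a simple plug-and-play module compatible with common RL methods.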

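Proposed Method

Apply stochastic data augmentations to the input observations during RL training.
Apply each augmentation consistently across the frame stack for pixel inputs, and consistently across the time series for state inputs.
Plug RAD into the base RL algorithm (SAC for off-policy, PPO for on-policy) without modifying its core loss.
Examine 10 augmentations for images (crop, translate, window, grayscale, cutout, cutout-color, flip, rotate, random convolution, color jitter), and introduce random amplitude scaling for proprioceptive inputs.
Evaluate on DMControl (pixels), OpenAI ProcGen (generalization), and state-based OpenAI Gym tasks.
Provide an open-source implementation of RAD.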
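
As an illustration of the method above, the following is a minimal NumPy sketch of random crop and random translate applied consistently across a frame stack. It is not the authors' implementation. The function names, the (B, C, H, W) layout with stacked frames in the channel axis, and the image sizes (100, 84, 108) are assumptions chosen for the example.

```python
import numpy as np


def random_crop(obs, out_size, rng):
    """Crop every sample in a batch at its own random location.

    obs has shape (B, C, H, W). The C axis is assumed to hold the stacked
    frames, so slicing all channels with one offset keeps the crop
    consistent across the frame stack of each sample.
    """
    b, c, h, w = obs.shape
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    out = np.empty((b, c, out_size, out_size), dtype=obs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i] = obs[i, :, t:t + out_size, l:l + out_size]
    return out


def random_translate(obs, out_size, rng):
    """Place every sample at its own random offset inside a zero canvas."""
    b, c, h, w = obs.shape
    assert out_size >= h and out_size >= w, "canvas must contain the image"
    tops = rng.integers(0, out_size - h + 1, size=b)
    lefts = rng.integers(0, out_size - w + 1, size=b)
    out = np.zeros((b, c, out_size, out_size), dtype=obs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i, :, t:t + h, l:l + w] = obs[i]
    return out


rng = np.random.default_rng(0)
# Hypothetical batch: 4 samples, 3 stacked RGB frames (9 channels), 100x100.
batch = rng.integers(0, 256, size=(4, 9, 100, 100), dtype=np.uint8)
cropped = random_crop(batch, 84, rng)
translated = random_translate(batch, 108, rng)
```

In an off-policy setup, a function like this would typically be called on observations sampled from the replay buffer just before they reach the actor and critic, which is why the base algorithm's loss needs no change.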

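Experimental Results

Research Questions

RQ1: Can the data efficiency of pixel-based RL be improved without changing the underlying algorithm?
RQ2: Which augmentations most effectively improve RL performance and generalization across benchmarks?
RQ3: Do the benefits of augmentation extend beyond pixel-based inputs to proprioceptive (state-based) RL settings?
RQ4: How does augmentation affect representation learning and generalization to unseen environments?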

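Key Findings

RAD achieves state-of-the-art data efficiency and final performance on all evaluated pixel-based DMControl environments.
RAD improves the data efficiency of pixel-based SAC by roughly 4x in the tested settings, without any auxiliary loss.
On many DMControl environments, RAD matches or outperforms state-based baselines, which suggests its applicability also extends to proprioceptive inputs.
Random crop and random translate are among the most impactful augmentations for pixel inputs.
RAD substantially improves test-time generalization on OpenAI ProcGen benchmarks.
The new random amplitude scaling augmentation improves both the performance of state-based RL and its robustness to input noise.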
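
The random amplitude scaling mentioned above can be sketched as multiplying each state by a random factor drawn from a uniform range. This is a hedged illustration, not the paper's code. The function name, the bounds 0.6 and 1.2, and the choice to share one factor across all time steps and dimensions of a sample are assumptions made for the example.

```python
import numpy as np


def random_amplitude_scale(states, low, high, rng):
    """Scale each sample by one factor z drawn from Uniform[low, high).

    states has shape (B, ...). One z is drawn per sample and broadcast over
    the remaining axes, so a stacked time series of states is scaled
    consistently (an assumption of this sketch).
    """
    b = states.shape[0]
    z = rng.uniform(low, high, size=(b,) + (1,) * (states.ndim - 1))
    return states * z


rng = np.random.default_rng(0)
# Hypothetical batch: 5 samples, 3 time steps, 17-dimensional state.
states = np.ones((5, 3, 17))
scaled = random_amplitude_scale(states, 0.6, 1.2, rng)
```

Because the input here is all ones, each entry of `scaled` equals the factor drawn for its sample, which makes the per-sample consistency easy to inspect.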


This review was generated by AI and checked by a human editor.