QUICK REVIEW

[Paper Review] Reinforcement Learning with Augmented Data

Michael Laskin, Kimin Lee|arXiv (Cornell University)|Apr 30, 2020

Reinforcement Learning in Robotics | References: 54 | Cited by: 246

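One-sentence summary

RAD adds data augmentation to reinforcement learning training, improving data efficiency and generalization for both pixel-based and state-based inputs without changing the underlying RL algorithm.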

ABSTRACT

Learning from visual observations is a fundamental yet challenging problem in Reinforcement Learning (RL). Although algorithmic advances combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) data-efficiency of learning and (b) generalization to new environments. To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms. We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inputs, and introduce two new data augmentations - random translate and random amplitude scale. We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks. RAD sets a new state-of-the-art in terms of data-efficiency and final performance on the DeepMind Control Suite benchmark for pixel-based control as well as OpenAI Gym benchmark for state-based control. We further demonstrate that RAD significantly improves test-time generalization over existing methods on several OpenAI ProcGen benchmarks. Our RAD module and training code are available at https://www.github.com/MishaLaskin/rad.

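Motivation and goals

Enable data-efficient, generalizable reinforcement learning from visual observations.
Study how effective a diverse set of data augmentations is in RL without adding auxiliary losses.
Show that data augmentation improves performance on both pixel-based and state-based benchmarks.
Establish RAD as a simple plug-and-play module compatible with common RL methods.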

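Proposed method

Apply random data augmentations to the input observations during RL training.
Keep each augmentation consistent across the stacked frames of a pixel input, and consistent over time for state inputs.
Plug RAD into a base RL algorithm (SAC for off-policy learning, PPO for policy-gradient learning) without changing its core loss.
Explore ten image augmentations (crop, translate, window, grayscale, cutout, cutout-color, flip, rotate, random convolution, color jitter) and introduce random amplitude scaling for proprioceptive inputs.
Evaluate on DMControl (pixels), OpenAI ProcGen (generalization), and state-based OpenAI Gym tasks.
Provide an open-source implementation of RAD.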
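
The frame-stack consistency described in the list above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `random_crop` and `random_translate`, the (B, C, H, W) layout with stacked frames in the channel axis, and the 100, 84, and 108 pixel sizes are all chosen for the example.

```python
import numpy as np


def random_crop(batch, out_size, rng):
    """Crop each sample to out_size x out_size at a random location.

    batch: array of shape (B, C, H, W), where C holds the stacked frames.
    One offset is drawn per sample and reused for every channel, so all
    frames in a stack receive the same crop.
    """
    b, c, h, w = batch.shape
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    out = np.empty((b, c, out_size, out_size), dtype=batch.dtype)
    for i in range(b):
        t, l = tops[i], lefts[i]
        out[i] = batch[i, :, t:t + out_size, l:l + out_size]
    return out


def random_translate(batch, out_size, rng):
    """Place each sample at a random location inside a zero-filled frame.

    The output frame (out_size x out_size) must be at least as large as
    the input. As with the crop, one offset per sample is shared by all
    stacked frames.
    """
    b, c, h, w = batch.shape
    tops = rng.integers(0, out_size - h + 1, size=b)
    lefts = rng.integers(0, out_size - w + 1, size=b)
    out = np.zeros((b, c, out_size, out_size), dtype=batch.dtype)
    for i in range(b):
        t, l = tops[i], lefts[i]
        out[i, :, t:t + h, l:l + w] = batch[i]
    return out


# Hypothetical batch: 4 observations, each a stack of 3 RGB frames (9 channels).
rng = np.random.default_rng(0)
obs = rng.integers(0, 256, size=(4, 9, 100, 100), dtype=np.uint8)
cropped = random_crop(obs, 84, rng)
translated = random_translate(obs, 108, rng)
print(cropped.shape, translated.shape)
```

Because the augmentation only changes the observations sampled for an update, the augmented batch can be passed to an unmodified SAC or PPO loss, which is what makes the module plug-and-play.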

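Experimental results

Research questions

RQ1: Can data augmentation improve the data efficiency of pixel-based RL without changing the underlying algorithm?
RQ2: Which augmentations most effectively improve RL performance and generalization on the benchmarks?
RQ3: Do the benefits of data augmentation extend beyond pixel inputs to state-based (proprioceptive) RL settings?
RQ4: How does data augmentation affect representation learning and generalization to unseen environments?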

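Key findings

RAD achieves state-of-the-art data efficiency and final performance on all evaluated pixel-based DMControl environments.
RAD improves the data efficiency of pixel-based SAC by roughly 4x in the tested scenarios, without any auxiliary loss.
RAD matches or exceeds many state-based baselines on DMControl environments, suggesting broad applicability to proprioceptive inputs.
Random crop and random translate are among the most impactful augmentations for pixel inputs.
RAD significantly improves test-time generalization on the OpenAI ProcGen benchmarks.
A novel random amplitude scaling augmentation improves both the performance of state-based RL and its robustness to input noise.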
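
Random amplitude scaling for state inputs can be sketched as multiplying each state by a factor z drawn from Uniform[alpha, beta]. The code below is a hedged illustration rather than the released RAD code: the function name `random_amplitude_scale`, the (B, T, D) layout, the choice of a single scalar z per sample, and the example range 0.6 to 1.2 are assumptions made for this sketch.

```python
import numpy as np


def random_amplitude_scale(states, alpha, beta, rng):
    """Multiply each sample by one scalar z ~ Uniform[alpha, beta].

    states: array of shape (B, T, D): batch, time steps, state dimension.
    One z is drawn per sample and broadcast over T and D, so the scaling
    is random across the batch but constant over time. With alpha > 0 the
    sign of every state entry is preserved.
    """
    if not 0 < alpha <= beta:
        raise ValueError("expected 0 < alpha <= beta")
    z = rng.uniform(alpha, beta, size=(states.shape[0], 1, 1))
    return states * z


# Hypothetical batch: 8 samples, 3 time steps, 17-dimensional state.
rng = np.random.default_rng(0)
states = rng.standard_normal((8, 3, 17))
scaled = random_amplitude_scale(states, 0.6, 1.2, rng)
print(scaled.shape)
```

Drawing one factor per sample keeps the relative magnitudes within a state vector intact while varying its overall amplitude across the batch.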

This review was generated by AI and reviewed by a human editor.