QUICK REVIEW

[Paper Review] MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale

Dmitry Kalashnikov, Jacob Varley|arXiv (Cornell University)|Apr 16, 2021

Reinforcement Learning in Robotics | References: 74 | Cited by: 35

One-Sentence Summary

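MT-Opt is a scalable, offline-trained multi-task reinforcement learning system for real-world robots that learns a repertoire of manipulation tasks by sharing representations and data across tasks, using learned success detectors and task impersonation to acquire new skills efficiently.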

ABSTRACT

General-purpose robotic systems must master a large repertoire of diverse skills to be useful in a range of daily tasks. While reinforcement learning provides a powerful framework for acquiring individual behaviors, the time needed to acquire each skill makes the prospect of a generalist robot trained with RL daunting. In this paper, we study how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously, sharing exploration, experience, and representations across tasks. In this framework new tasks can be continuously instantiated from previously learned tasks improving overall performance and capabilities of the system. To instantiate this system, we develop a scalable and intuitive framework for specifying new tasks through user-provided examples of desired outcomes, devise a multi-robot collective learning system for data collection that simultaneously collects experience for multiple tasks, and develop a scalable and generalizable multi-task deep reinforcement learning method, which we call MT-Opt. We demonstrate how MT-Opt can learn a wide range of skills, including semantic picking (i.e., picking an object from a particular category), placing into various fixtures (e.g., placing a food item onto a plate), covering, aligning, and rearranging. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots, and demonstrate the performance of our system both in terms of its ability to generalize to structurally similar new tasks, and acquire distinct new tasks more quickly by leveraging past experience. We recommend viewing the videos at https://karolhausman.github.io/mt-opt/

Motivation and Objectives

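Work toward general-purpose robotic systems that acquire a broad repertoire of skills without learning each task in isolation.
Propose a scalable framework in which new tasks are continuously instantiated from existing ones, using success detectors and shared representations.
Develop data collection, task impersonation, and reinforcement learning strategies that amortize data and compute costs across tasks.
Show that shared learning speeds up the acquisition of new tasks and makes more complex skills tractable.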

Proposed Method

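Define a multi-task policy pi(a|s,Ti), learned with Q-learning, that handles multiple tasks Ti drawn from a categorical task distribution.
Extend QT-Opt to the multi-task setting with a shared Q-function Q_theta(s,a,Ti) and the multi-task loss L_multi = E_Ti[ L_i(theta) ], where L_i is the Q-learning loss for task Ti.
Introduce task impersonation (f_I) to reuse episodes across related tasks, including a skill-based variant, f_I_skill, that restricts reuse to tasks sharing a skill in order to avoid negative transfer and data dilution.
Apply data rebalancing at the batch level, both across tasks and between successes and failures within each task, to counter multi-task data imbalance.
Train a visual success detector SD, conditioned on the task ID, that provides a sparse reward based on the final image of an episode.
Train MT-Opt with offline RL on a large distributed dataset collected by 7 robots, then evaluate it on 12 tasks.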
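
The impersonation and rebalancing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the episode dictionary layout, the SKILL_GROUPS table, and the task names other than lift-any and place-any are all hypothetical, and the sampling scheme (uniform over tasks, then uniform over success/failure) is one plausible reading of "batch-level rebalancing".

```python
import random
from collections import defaultdict

# Hypothetical task-to-skill table; only lift-any and place-any appear in the summary.
SKILL_GROUPS = {
    "lift-any": "lift",
    "lift-carrot": "lift",
    "place-any": "place",
    "place-bottom": "place",
}


def impersonate_by_skill(episode, success_detector, skill_groups=SKILL_GROUPS):
    """Relabel one episode for every task that shares the source task's skill.

    `episode` is a dict with a "task" name and a "final_image".
    `success_detector(final_image, task)` returns True/False and stands in
    for the learned, task-conditioned success detector SD.
    """
    skill = skill_groups[episode["task"]]
    relabeled = []
    for task, task_skill in skill_groups.items():
        if task_skill != skill:
            continue  # skip unrelated skills to limit negative transfer
        # Sparse reward: 1 if the final image counts as a success for this task.
        reward = 1.0 if success_detector(episode["final_image"], task) else 0.0
        relabeled.append({**episode, "task": task, "reward": reward})
    return relabeled


def sample_balanced_batch(episodes, batch_size, rng=random):
    """Sample uniformly over tasks, then uniformly over outcome within a task."""
    buckets = defaultdict(lambda: {True: [], False: []})
    for ep in episodes:
        buckets[ep["task"]][ep["reward"] > 0].append(ep)
    tasks = sorted(buckets)
    batch = []
    for _ in range(batch_size):
        task = rng.choice(tasks)
        # Only consider outcomes that actually have data for this task.
        outcomes = [o for o in (True, False) if buckets[task][o]]
        outcome = rng.choice(outcomes)
        batch.append(rng.choice(buckets[task][outcome]))
    return batch
```

Sampling by task first keeps a rare task from being drowned out by a data-rich one, and sampling by outcome second keeps rare successes visible to the Q-function. Restricting relabeling to a skill group keeps episodes from unrelated tasks, which would be almost all failures, out of a task's data.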

Experimental Results

Research Questions

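RQ1: Can MT-Opt learn a wide range of robotic manipulation tasks from a shared policy and data pipeline?
RQ2: Does sharing data and representations across tasks improve learning efficiency and performance over single-task and naive multi-task baselines?
RQ3: Does task impersonation, combined with skill grouping and rebalancing, mitigate negative transfer and data imbalance?
RQ4: Can simpler tasks bootstrap harder, related tasks, and do learned skills transfer to new but structurally similar tasks?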

Key Findings

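Across 12 real-world tasks, MT-Opt achieves roughly a 3x average improvement over the baselines.
MT-Opt reaches an 89% success rate on lift-any and significantly outperforms the baselines on the seven semantic lifting tasks and the four placing/rearrangement tasks.
The 12-task MT-Opt policy outperforms a 2-task policy on the tasks they share, indicating that broader multi-task training improves performance through representation sharing.
Skill-based task impersonation combined with data rebalancing significantly improves performance across tasks, especially underrepresented ones, with gains of up to 10x in some cases.
Sharing representations across a large number of tasks improves not only the breadth of the multi-task policy but also its performance on specific tasks (e.g., lift-any and place-any).
Using simpler tasks to bootstrap harder ones is effective when MT-Opt draws on multi-task data and impersonation.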

This review was generated by AI and checked by a human editor.