[Paper Review] MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
MT-Opt presents a scalable, off-policy multi-task reinforcement learning system for real-world robots that learns a repertoire of manipulation tasks by sharing representations and data across tasks, using learned success detectors and task impersonation to efficiently acquire new skills.
General-purpose robotic systems must master a large repertoire of diverse skills to be useful in a range of daily tasks. While reinforcement learning provides a powerful framework for acquiring individual behaviors, the time needed to acquire each skill makes the prospect of a generalist robot trained with RL daunting. In this paper, we study how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously, sharing exploration, experience, and representations across tasks. In this framework new tasks can be continuously instantiated from previously learned tasks improving overall performance and capabilities of the system. To instantiate this system, we develop a scalable and intuitive framework for specifying new tasks through user-provided examples of desired outcomes, devise a multi-robot collective learning system for data collection that simultaneously collects experience for multiple tasks, and develop a scalable and generalizable multi-task deep reinforcement learning method, which we call MT-Opt. We demonstrate how MT-Opt can learn a wide range of skills, including semantic picking (i.e., picking an object from a particular category), placing into various fixtures (e.g., placing a food item onto a plate), covering, aligning, and rearranging. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots, and demonstrate the performance of our system both in terms of its ability to generalize to structurally similar new tasks, and acquire distinct new tasks more quickly by leveraging past experience. We recommend viewing the videos at https://karolhausman.github.io/mt-opt/
Motivation & Objective
- Motivate building general-purpose robotic systems that acquire a broad skill repertoire without learning each task in isolation.
- Propose a scalable framework for continuously instantiating new tasks from existing ones via success detectors and shared representations.
- Develop data collection, task impersonation, and reinforcement learning strategies to amortize data and computation across tasks.
- Demonstrate that shared learning accelerates acquisition of new tasks and enables handling more complex skills.
Proposed method
- Define a multi-task Q-learning policy pi(a|s,Ti) that handles multiple tasks Ti drawn from a categorical distribution.
- Extend QT-Opt to a multi-task setting with a common Q-function Q_theta(s,a,Ti) and a multi-task loss L_multi = E_Ti[ L_i(theta) ].
- Introduce task impersonation (f_I) to reuse episodes across related tasks, including the skill-based impersonation f_I_skill to avoid negative transfer and data dilution.
- Apply data re-balancing at batch level (between tasks and within task successes/failures) to address imbalanced multi-task data.
- Train a visual success detector SD conditioned on task IDs to provide sparse rewards based on final outcome images.
- Use offline RL with a large, distributed dataset collected by 7 robots to train MT-Opt, followed by evaluation on 12 tasks.
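The impersonation and re-balancing steps above can be illustrated with a small sketch. It is not the authors' implementation: the `Episode` type, the `SKILL_OF_TASK` grouping, the `toy_success_detector` rule, and the sampling ratios (uniform over tasks, then an even choice between successes and failures) are all assumptions made for illustration. In the real system the success detector is a learned image classifier and the training loop is distributed.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    episode_id: int
    collected_for: str  # task the robot was attempting when the data was logged


# Hypothetical grouping of tasks by skill; the task names are illustrative.
SKILL_OF_TASK = {
    "lift-any": "lift",
    "lift-carrot": "lift",
    "place-any": "place",
    "place-top-left": "place",
}


def toy_success_detector(episode, task):
    """Stand-in for the learned visual success detector SD (assumption)."""
    return (episode.episode_id + len(task)) % 3 == 0


def impersonate_by_skill(episodes, success_detector):
    """Relabel each episode as every task that shares its skill.

    The reward for each (episode, task) pair comes from the success
    detector, so one episode can be a success for one task and a
    failure for another.
    """
    per_task = defaultdict(lambda: {"success": [], "failure": []})
    for ep in episodes:
        skill = SKILL_OF_TASK[ep.collected_for]
        for task, task_skill in SKILL_OF_TASK.items():
            if task_skill != skill:
                continue  # skill-based impersonation: never cross skills
            reward = 1.0 if success_detector(ep, task) else 0.0
            bucket = "success" if reward == 1.0 else "failure"
            per_task[task][bucket].append((ep, task, reward))
    return per_task


def sample_balanced_batch(per_task, batch_size, rng):
    """Sample uniformly over tasks, then evenly over outcome buckets."""
    tasks = sorted(per_task)
    batch = []
    for _ in range(batch_size):
        task = rng.choice(tasks)
        # Fall back to the only non-empty bucket if a task lacks one outcome.
        buckets = [b for b in ("success", "failure") if per_task[task][b]]
        bucket = rng.choice(buckets)
        batch.append(rng.choice(per_task[task][bucket]))
    return batch
```

Calling `impersonate_by_skill(episodes, toy_success_detector)` and passing the result to `sample_balanced_batch(per_task, 64, random.Random(0))` yields a batch in which a rarely collected task is drawn as often as a heavily collected one, which is the intent of the batch-level re-balancing described above.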
Experimental results
Research questions
- RQ1: Can MT-Opt learn a wide range of robotic manipulation tasks from a shared policy and data pipeline?
- RQ2: Does data and representation sharing across tasks improve learning efficiency and performance compared to single-task and naive multi-task baselines?
- RQ3: Does task impersonation with skill-based grouping and re-balancing mitigate negative transfer and data imbalance?
- RQ4: Can easier tasks bootstrap learning of harder, related tasks, and can learned skills transfer to new but structurally similar tasks?
Key findings
- MT-Opt achieves roughly a 3x average improvement over baselines across the 12 real-world tasks.
- MT-Opt reaches 89% success on lift-any and significantly outperforms baselines on seven semantic lifting tasks and four placing/rearrangement tasks.
- A 12-task MT-Opt policy outperforms a 2-task policy on shared tasks, indicating that broader multi-task training enhances performance through representation sharing.
- Skill-based task impersonation plus data re-balancing substantially improves performance across tasks, especially for underrepresented ones, with up to 10x gains in some cases.
- Sharing representations across many tasks improves not only the multi-task policy’s broader capabilities but also performance on specific tasks (e.g., lift-any and place-any).
- Using easier tasks to bootstrap harder tasks is effective when MT-Opt leverages multi-task data and impersonation.
This review was created by AI and reviewed by human editors.