QUICK REVIEW

[Paper Review] Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration, O'Neill, Abby|arXiv (Cornell University)|Oct 13, 2023
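Reinforcement Learning in Robotics | Cited by 101
One-Sentence Summary

This work introduces Open X-Embodiment, a robot dataset of more than 1M trajectories spanning 22 embodiments, together with RT-X models that transfer knowledge across robots, achieving positive transfer and improved generalization.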

ABSTRACT

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

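Motivation and Goals

  • Motivate the need for X-embodiment data so that generalist robot policies can play a role analogous to general-purpose NLP and vision models.
  • Provide a standardized, large-scale multi-robot dataset covering many embodiments and tasks.
  • Evaluate the transfer and generalization of RT-1-X and RT-2-X policies trained on multi-robot data.
  • Release open data formats, baselines, and pretrained RT-X checkpoints to support the community.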

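Proposed Method

  • Consolidate data collected by 21 institutions on 22 robot embodiments into a unified Open X-Embodiment Dataset containing more than 1M trajectories.
  • Coarsely align the observation and action spaces to a common 7-DoF end-effector action representation.
  • Evaluate two Transformer-based policy networks (RT-1-X and RT-2-X) on the multi-embodiment data.
  • Train RT-1-X on robot data only; train RT-2-X by co-fine-tuning on robot data together with web-scale vision-language data.
  • Train both RT-1-X and RT-2-X with a cross-entropy objective over discrete action tokens.
  • Evaluate performance in in-distribution and out-of-distribution settings, and ablate history length and web pretraining.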
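
The action tokenization and cross-entropy objective listed above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the bin count of 256, the per-dimension bounds `low` and `high`, and the function names are assumptions introduced here, and the summary itself only states that actions are 7-DoF and discretized.

```python
import numpy as np

NUM_BINS = 256   # assumption: the bin count is not stated in this summary
ACTION_DIM = 7   # 7-DoF end-effector action (e.g. position, rotation, gripper)


def discretize(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action dimension to an integer token in [0, num_bins)."""
    action = np.clip(action, low, high)
    scaled = (action - low) / (high - low)  # normalized to [0, 1]
    # The upper bound would land on index num_bins, so cap it at the last bin.
    return np.minimum((scaled * num_bins).astype(np.int64), num_bins - 1)


def undiscretize(tokens, low, high, num_bins=NUM_BINS):
    """Recover the bin-center continuous value for each token."""
    return low + (tokens + 0.5) / num_bins * (high - low)


def cross_entropy(logits, tokens):
    """Mean cross-entropy over action dimensions.

    logits: array of shape (ACTION_DIM, num_bins), one distribution per dimension.
    tokens: integer array of shape (ACTION_DIM,), the target bins.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(tokens)), tokens].mean())
```

With uniform logits the loss equals `log(NUM_BINS)`, which is a convenient sanity check for an untrained policy head. Uniform binning is only one possible choice; a real pipeline would also need per-dataset normalization before the coarse alignment step.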
Figure 0 : The Open X-Embodiment Dataset. (a) : the dataset consists of 60 individual datasets across $22$ embodiments. (b) : the Franka robot has the largest diversity in visually distinct scenes due to the large number of Franka datasets, (c) : xArm and Google Robot contribute the most number of t

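Experimental Results

Research Questions

  • RQ1: Does training on multi-embodiment data yield transfer that benefits an individual robot?
  • RQ2: Does exposure to multiple robots improve generalization to unseen tasks, objects, and environments?
  • RQ3: How do model size, history length, and web pretraining affect cross-embodiment transfer and emergent skills?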

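Key Findings

  • On targeted in-distribution tasks, RT-1-X achieves a mean success rate up to 50% higher than either the Original Method or RT-1.
  • RT-2-X (55B) improves generalization by roughly 3x over a model trained only on the evaluation embodiment.
  • Co-training on multi-robot data yields emergent skills that transfer across robots (for example, the Google Robot improves from the WidowX Bridge data).
  • Larger model capacity (55B RT-2-X) and web-based pretraining are critical for strong performance and generalization in data-rich domains.
  • A shorter history weakens generalization, whereas including a short image history and web pretraining significantly improves results.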
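
To make the "short image history" in the last finding concrete, here is a minimal sketch of a fixed-length frame buffer that a policy could consume at each step. The class name `ImageHistory`, the padding strategy (repeating the first frame), and the history length are hypothetical choices for illustration; the summary does not specify how the RT-X models build their history.

```python
from collections import deque

import numpy as np


class ImageHistory:
    """Keeps the most recent `length` camera frames (hypothetical helper)."""

    def __init__(self, length):
        self.length = length
        self.frames = deque(maxlen=length)

    def add(self, frame):
        if not self.frames:
            # Assumption: pad the start of an episode by repeating the first frame.
            self.frames.extend([frame] * self.length)
        else:
            # deque(maxlen=...) drops the oldest frame automatically.
            self.frames.append(frame)

    def stacked(self):
        """Return the frames as one array of shape (length, H, W, C), oldest first."""
        return np.stack(list(self.frames), axis=0)
```

Setting `length=1` reduces this to a no-history policy input, which is the kind of setting a history-length ablation would compare against.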
Figure 1 : RT-1-X and RT-2-X both take images and a text instruction as input and output discretized end-effector actions. RT-1-X is an architecture designed for robotics, with a FiLM [ 116 ] conditioned EfficientNet [ 117 ] and a Transformer [ 118 ] . RT-2-X builds on a VLM backbone by representing

This review was generated by AI and reviewed by human editors.