QUICK REVIEW

[Paper Review] Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration, O'Neill, Abby|arXiv (Cornell University)|Oct 13, 2023

Reinforcement Learning in Robotics | Cited by 101

One-Line Summary

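This work introduces Open X-Embodiment. It provides a dataset of 1M+ trajectories spanning 22 embodiments, together with RT-X models that transfer knowledge across robots, enabling positive transfer and improved generalization.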

ABSTRACT

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Motivation and Objectives

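Motivates the need for cross-embodiment (X-embodiment) data to enable general-purpose robot policies, analogous to pretrained NLP and vision models.
Provides a standardized, large-scale multi-robot dataset spanning many embodiments and tasks.
Evaluates RT-1-X and RT-2-X policies trained on multi-robot data for transfer and generalization.
Releases open-source data formats, baselines, and pretrained RT-X checkpoints to support the community.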

Proposed Method

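Builds the Open X-Embodiment Dataset by pooling datasets collected by 21 institutions on 22 robot embodiments, for a total of 1M+ trajectories.
Coarsely aligns the observation and action spaces by mapping actions to a common 7-DoF end-effector action representation.
Evaluates two Transformer-based policy architectures (RT-1-X and RT-2-X) on the multi-embodiment data.
Trains RT-1-X on robotics data only, while RT-2-X is trained by co-fine-tuning on robotics data together with web-scale vision-language data.
Uses a cross-entropy objective over discrete action tokens for both RT-1-X and RT-2-X.
Evaluates performance in in-distribution and out-of-distribution settings, with ablations on history length and web pretraining.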
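The action-alignment and training-objective steps above can be made concrete with a small sketch. It assumes uniform per-dimension binning into 256 bins (the scheme used by RT-1). It also assumes hypothetical bounds of [-1, 1] for each of the seven dimensions (x, y, z, roll, pitch, yaw, gripper). The helper names `tokenize_action`, `detokenize_action`, and `cross_entropy` are illustrative and not part of any released RT-X code. As far as we understand, RT-2-X expresses the same bins as tokens in the VLM's output vocabulary, but the loss has the same form.

```python
import math
from typing import List, Sequence

NUM_BINS = 256  # assumption: RT-1-style uniform binning per action dimension
# Hypothetical bounds for a 7-DoF end-effector action:
# (x, y, z, roll, pitch, yaw, gripper)
LOW = [-1.0] * 7
HIGH = [1.0] * 7


def tokenize_action(action: Sequence[float], low=LOW, high=HIGH,
                    num_bins: int = NUM_BINS) -> List[int]:
    """Map each continuous action dimension to an integer bin in [0, num_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)             # clip out-of-range values to the bounds
        frac = (a - lo) / (hi - lo)         # position within [lo, hi] as a fraction
        tokens.append(min(int(frac * num_bins), num_bins - 1))  # a == hi goes to top bin
    return tokens


def detokenize_action(tokens: Sequence[int], low=LOW, high=HIGH,
                      num_bins: int = NUM_BINS) -> List[float]:
    """Map bin indices back to the continuous value at each bin's center."""
    return [lo + (t + 0.5) * (hi - lo) / num_bins
            for t, lo, hi in zip(tokens, low, high)]


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean cross-entropy over action dimensions; logits[d] holds num_bins scores."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)                                            # for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # log-sum-exp
        total += log_z - row[t]                                 # -log softmax(row)[t]
    return total / len(targets)
```

For example, `tokenize_action([0.0] * 6 + [1.0])` returns `[128, 128, 128, 128, 128, 128, 255]`. The six zero-valued dimensions fall in the middle bin, and the fully open gripper is clipped into the top bin. Discretizing lets the policy predict a categorical distribution per dimension, which is what makes the cross-entropy objective applicable.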

Figure 0 : The Open X-Embodiment Dataset. (a) : the dataset consists of 60 individual datasets across $22$ embodiments. (b) : the Franka robot has the largest diversity in visually distinct scenes due to the large number of Franka datasets, (c) : xArm and Google Robot contribute the most number of t

Experimental Results

Research Questions

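RQ1: Does training on multi-embodiment data yield positive transfer to individual robots?
RQ2: Does exposure to multiple robots improve generalization to unseen tasks, objects, and environments?
RQ3: How do model size, history, and web pretraining affect cross-robot transfer and emergent skills across embodiments?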

Key Findings

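On in-distribution tasks for the target robots, RT-1-X achieves an average success rate up to 50% higher than the Original Method and RT-1.
RT-2-X (55B) achieves roughly a 3× improvement in generalization over a model trained only on data from the evaluation embodiment.
Co-training on multi-robot data gives rise to new skills that transfer across robots (e.g., the Google Robot improves using Bridge data collected on the WidowX).
Larger model capacity (55B RT-2-X) and web-based pretraining are essential for strong performance and generalization in data-rich domains.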
Omitting history hurts generalization, whereas including a short image history together with web pretraining improves results substantially.
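The finding above about image history can be illustrated with a fixed-length frame buffer that feeds the last few frames to the policy at each step. This is a hypothetical helper, not the RT-X input pipeline. The class name `ImageHistory`, the window length, and the choice to pad with the first frame at episode start are all assumptions. A window of length 1 corresponds to the no-history setting.

```python
from collections import deque
from typing import Any, Deque, List


class ImageHistory:
    """Hypothetical fixed-length buffer of the most recent camera frames."""

    def __init__(self, length: int):
        if length < 1:
            raise ValueError("length must be >= 1")
        self.length = length
        # deque with maxlen silently drops the oldest frame when full
        self.frames: Deque[Any] = deque(maxlen=length)

    def reset(self, first_frame: Any) -> None:
        # Assumption: pad with the first frame so the policy always sees `length` inputs.
        self.frames.clear()
        self.frames.extend([first_frame] * self.length)

    def push(self, frame: Any) -> List[Any]:
        """Append the newest frame and return the window, oldest first."""
        self.frames.append(frame)
        return list(self.frames)
```

With `ImageHistory(3)` reset on `"f0"`, successive pushes of `"f1"`, `"f2"`, `"f3"` yield windows `["f0", "f0", "f1"]`, `["f0", "f1", "f2"]`, and `["f1", "f2", "f3"]`. The fixed window keeps the model's input size constant while still exposing short-term motion.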

Figure 1 : RT-1-X and RT-2-X both take images and a text instruction as input and output discretized end-effector actions. RT-1-X is an architecture designed for robotics, with a FiLM [ 116 ] conditioned EfficientNet [ 117 ] and a Transformer [ 118 ] . RT-2-X builds on a VLM backbone by representing


This review was generated by AI and checked by a human editor.