QUICK REVIEW

[Paper Review] Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration, O'Neill, Abby|arXiv (Cornell University)|2023. 10. 13.

Reinforcement Learning in Robotics | Citations: 101

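One-Line Summary

This work introduces a robot dataset of 1M+ trajectories spanning 22 embodiments, together with RT-X models that transfer knowledge across robots, enabling positive transfer and improving generalization.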

ABSTRACT

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

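Motivation and Goals

Argues that X-embodiment data is needed to enable general-purpose robot policies comparable to pretrained NLP and vision models.
Provides a standardized, large-scale multi-robot dataset covering many embodiments and tasks.
Evaluates the transfer and generalization of RT-1-X and RT-2-X policies trained on multi-robot data.
Releases open-source data formats, benchmarks, and pretrained RT-X checkpoints to foster an active development community.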

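Proposed Method

Combines 22 robot embodiments collected across 21 institutions into a single unified Open X-Embodiment dataset of 1M+ trajectories.
Coarsely aligns observation and action spaces across datasets, mapping actions to a common 7-DoF end-effector representation.
Evaluates two Transformer-based policy architectures, RT-1-X and RT-2-X, on the multi-embodiment data.
Trains RT-1-X on robot data only, and trains RT-2-X by co-fine-tuning on robot data together with web-scale vision-language data.
Uses a cross-entropy loss over discrete action tokens for both RT-1-X and RT-2-X.
Evaluates performance in in-distribution and out-of-distribution settings, and runs ablations on history length and web pretraining.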
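
The discrete-action objective listed above can be illustrated with a minimal sketch. It assumes a 7-dimensional action normalized to [-1, 1] and 256 uniform bins per dimension; the bin count, the value range, and all function names (`discretize`, `undiscretize`, `action_loss`) are illustrative assumptions, not details taken from this review or from the released RT-X code.

```python
import math

NUM_BINS = 256   # assumed number of bins per action dimension
ACTION_DIM = 7   # assumed layout: x, y, z, roll, pitch, yaw, gripper


def discretize(action, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map each continuous value to an integer bin index in [0, num_bins - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)
        fraction = (clipped - low) / (high - low)
        # The upper edge (fraction == 1.0) falls into the last bin.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def undiscretize(tokens, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map bin indices back to the centre of each bin."""
    width = (high - low) / num_bins
    return [low + (token + 0.5) * width for token in tokens]


def cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against one integer target index."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[target]


def action_loss(logits_per_dim, target_tokens):
    """Mean cross-entropy over the action dimensions."""
    losses = [cross_entropy(l, t) for l, t in zip(logits_per_dim, target_tokens)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    action = [0.0, 0.5, -1.0, 1.0, 0.25, -0.25, 1.0]  # hypothetical action
    tokens = discretize(action)
    # A policy with no preference (all-zero logits) as a stand-in for model output.
    uniform_logits = [[0.0] * NUM_BINS for _ in range(ACTION_DIM)]
    print(tokens, action_loss(uniform_logits, tokens))
```

Treating each action dimension as a classification target is what allows a cross-entropy loss to be used; the cost is a quantization error of at most half a bin width when tokens are mapped back to continuous commands.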

Figure 0: The Open X-Embodiment Dataset. (a): the dataset consists of 60 individual datasets across 22 embodiments. (b): the Franka robot has the largest diversity in visually distinct scenes due to the large number of Franka datasets, (c): xArm and Google Robot contribute the most number of t

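Experimental Results

Research Questions

RQ1: Does training on multi-embodiment data produce positive transfer for individual robots?
RQ2: Does exposure to multiple robots improve generalization to unseen tasks, objects, and environments?
RQ3: How do model size, history, and web pretraining affect cross-embodiment transfer and emergent skills?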

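Key Results

On in-distribution tasks, RT-1-X achieves a mean success rate up to 50% higher than the Original Method or RT-1.
RT-2-X (55B) improves generalization by roughly 3x over a model trained only on the evaluation embodiment.
Co-training on multi-robot data enables emergent skills that transfer across robots (for example, the Google Robot improves by leveraging the WidowX Bridge data).
Larger model capacity (55B RT-2-X) and web-based pretraining drive strong performance and generalization in data-rich domains.
Removing the image history hurts generalization; including a short history of images together with web pretraining substantially improves results.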

Figure 1: RT-1-X and RT-2-X both take images and a text instruction as input and output discretized end-effector actions. RT-1-X is an architecture designed for robotics, with a FiLM [116] conditioned EfficientNet [117] and a Transformer [118]. RT-2-X builds on a VLM backbone by representing

This review was generated by AI and reviewed by a human editor.