QUICK REVIEW

[Paper Review] Scaling Single Human Demonstrations for Imitation Learning using Generative Foundational Models

Nick Heppert, Minh Quang Nguyen|arXiv (Cornell University)|2026. 02. 13.

Robot Manipulation and Learning | Citations: 0

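One-Line Summary

Real2Gen is a reality-to-simulation method that uses 3D generative models to turn a single human demonstration into a scalable simulation data pipeline. It achieves a markedly higher success rate than the DITTO baseline, and its simulation-trained policy can be deployed zero-shot in the real world.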

ABSTRACT

Imitation learning is a popular paradigm to teach robots new tasks, but collecting robot demonstrations through teleoperation or kinesthetic teaching is tedious and time-consuming. In contrast, directly demonstrating a task using our human embodiment is much easier and data is available in abundance, yet transfer to the robot can be non-trivial. In this work, we propose Real2Gen to train a manipulation policy from a single human demonstration. Real2Gen extracts required information from the demonstration and transfers it to a simulation environment, where a programmable expert agent can demonstrate the task arbitrarily many times, generating an unlimited amount of data to train a flow matching policy. We evaluate Real2Gen on human demonstrations from three different real-world tasks and compare it to a recent baseline. Real2Gen shows an average increase in the success rate of 26.6% and better generalization of the trained policy due to the abundance and diversity of training data. We further deploy our purely simulation-trained policy zero-shot in the real world. We make the data, code, and trained models publicly available at real2gen.cs.uni-freiburg.de.

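Research Motivation and Goals

Reduce the data-collection effort of imitation learning by exploiting abundant, easy-to-record human demonstrations.
Propose the Real2Gen pipeline, which converts a single human demonstration into a scalable simulation dataset.
Train a flow matching policy on the generated robot demonstrations.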

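Proposed Method

Extract object-centric information from a single human demonstration using DITTO or a similar method.
Generate 3D object assets with a generative model (Point-E) and align them to the human demonstration, recovering scale and pose with Zero-Shot-Pose.
Build a simulation environment in SAPIEN from the generated meshes, where a scripted expert agent produces a large dataset of robot demonstrations.
Train a flow matching policy (PointFlowMatch), conditioned on the current observation, to imitate the expert's actions.
Evaluate transfer to unseen real object instances and zero-shot real-world deployment of the trained policy.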
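The policy-learning step can be illustrated with a generic flow matching objective. The sketch below is not the PointFlowMatch implementation; it assumes the common linear interpolation path between a noise sample and an expert action, and every name in it (`flow_matching_targets`, `flow_matching_loss`, `sample_action`, `velocity_fn`) is hypothetical. The review does not describe the network architecture, the observation encoding, or the integration scheme, so those are left abstract here.

```python
import numpy as np

def flow_matching_targets(noise, expert_action, t):
    """Build one training pair under an assumed linear path.

    x_t interpolates from noise (t=0) to the expert action (t=1);
    the regression target is the constant velocity along that path.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    x_t = (1.0 - t) * noise + t * expert_action
    target_velocity = expert_action - noise
    return x_t, target_velocity

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

def sample_action(velocity_fn, noise, observation, num_steps=10):
    """Euler-integrate a velocity field from noise (t=0) to an action (t=1).

    velocity_fn(x, t, observation) stands in for the trained,
    observation-conditioned network (hypothetical interface).
    """
    x = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt, observation)
    return x
```

During training, a network would be fitted so that its output at `(x_t, t, observation)` minimizes `flow_matching_loss` against `target_velocity`; at deployment, `sample_action` turns a noise sample into an action given the current observation.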

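Experimental Results

Research Questions

RQ1: Can a single human demonstration be converted into a scalable simulation dataset suitable for robot learning?
RQ2: How does Real2Gen compare with prior methods such as DITTO in success rate and generalization across tasks?
RQ3: How do the number of generated meshes and the number of demonstrations affect policy performance?
RQ4: Does the resulting policy transfer zero-shot to a real robot?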

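Key Results

Real2Gen achieves a higher average success rate than the DITTO baseline across three tasks: Sponge on Tray, Coke on Tray, and Paperroll upright.
Average overall success rate: 37.5% for Real2Gen vs. 10.9% for DITTO and 8.2% for DITTO with ZSP, a gain of 26.6 percentage points over DITTO.
Using generated assets provides more task-relevant mesh options and increases the diversity of the training data.
Ablations show diminishing returns beyond a certain number of meshes and demonstrations, suggesting that data quantity must be balanced against data quality.
The simulation-trained policy transfers zero-shot to a physical robot system with practical success.
The data, code, and trained models are publicly available on the authors' project page (real2gen.cs.uni-freiburg.de).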
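As a quick consistency check, the averages reported above can be compared with the 26.6% average increase quoted in the abstract. The snippet below only restates the three figures from this review; the dictionary and the helper name `gain_over` are illustrative.

```python
# Average overall success rates (%) as reported in this review.
success_rates = {"Real2Gen": 37.5, "DITTO": 10.9, "DITTO with ZSP": 8.2}

def gain_over(baseline, method="Real2Gen", rates=success_rates):
    """Absolute gain in percentage points of `method` over `baseline`."""
    return round(rates[method] - rates[baseline], 1)

print(gain_over("DITTO"))           # 26.6
print(gain_over("DITTO with ZSP"))  # 29.3
```

The 26.6-point gap to plain DITTO matches the abstract, which indicates that the quoted increase is an absolute difference in success rate rather than a relative improvement.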


This review was generated by AI and checked by a human editor.