QUICK REVIEW

[Paper Review] RoboForge: Physically Optimized Text-guided Whole-Body Locomotion for Humanoids

Xichen Yuan, Zhe Li|arXiv (Cornell University)|2026. 03. 18.

Human Motion and Animation · Citations: 0

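One-line summary

RoboForge proposes a bidirectional, latent-driven framework that couples text-to-motion generation with physics-based optimization. It produces physically plausible, retarget-free humanoid locomotion and improves both generation quality and tracking stability in simulation and on real hardware.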

ABSTRACT

While generative models have become effective at producing human-like motions from text, transferring these motions to humanoid robots for physical execution remains challenging. Existing pipelines are often limited by retargeting, where kinematic quality is undermined by physical infeasibility, contact-transition errors, and the high cost of real-world dynamical data. We present a unified latent-driven framework that bridges natural language and whole-body humanoid locomotion through a retarget-free, physics-optimized pipeline. Rather than treating generation and control as separate stages, our key insight is to couple them bidirectionally under physical constraints.

We introduce a Physical Plausibility Optimization (PP-Opt) module as the coupling interface. In the forward direction, PP-Opt refines a teacher-student distillation policy with a plausibility-centric reward to suppress artifacts such as floating, skating, and penetration. In the backward direction, it converts reward-optimized simulation rollouts into high-quality explicit motion data, which is used to fine-tune the motion generator toward a more physically plausible latent distribution. This bidirectional design forms a self-improving cycle: the generator learns a physically grounded latent space, while the controller learns to execute latent-conditioned behaviors with dynamical integrity.

Extensive experiments on the Unitree G1 humanoid show that our bidirectional optimization improves tracking accuracy and success rates. Across IsaacLab and MuJoCo, the implicit latent-driven pipeline consistently outperforms conventional explicit retargeting baselines in both precision and stability. By coupling diffusion-based motion generation with physical plausibility optimization, our framework provides a practical path toward deployable text-guided humanoid intelligence.
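The forward direction of PP-Opt scores simulation rollouts with a plausibility-centric reward that suppresses floating, skating, and penetration. The review does not give the reward's formula, so the following is only a minimal sketch, assuming the simulator exposes per-foot heights, contact flags, and horizontal foot velocities. The penalty definitions, the exponential shaping, and the parameters `w_pen`, `w_float`, `w_skate`, and `contact_tol` are all hypothetical choices for illustration, not the paper's.

```python
import numpy as np


def plausibility_reward(foot_heights, foot_contacts, foot_xy_vel,
                        w_pen=10.0, w_float=5.0, w_skate=2.0, contact_tol=0.02):
    """Hypothetical plausibility-centric reward for one simulation step.

    foot_heights:  (F,) height of each foot above the ground plane, in meters
    foot_contacts: (F,) bool, True where a foot is flagged as in ground contact
    foot_xy_vel:   (F, 2) horizontal velocity of each foot, in m/s
    Returns a scalar reward in (0, 1] and the individual penalty terms.
    """
    heights = np.asarray(foot_heights, dtype=float)
    contacts = np.asarray(foot_contacts, dtype=bool)
    xy_vel = np.asarray(foot_xy_vel, dtype=float)

    lowest = float(heights.min())
    # Penetration: depth by which the lowest foot sinks below the ground.
    penetration = max(0.0, -lowest)
    # Floating: even the lowest foot hovers above the ground beyond a
    # tolerance, i.e. nothing is actually supporting the body.
    floating = max(0.0, lowest - contact_tol)
    # Skating: horizontal speed of feet that are supposed to be planted.
    speeds = np.linalg.norm(xy_vel, axis=1)
    skating = float(speeds[contacts].sum())

    # Assumed shaping: exponentiated negative weighted penalty.
    penalty = w_pen * penetration + w_float * floating + w_skate * skating
    terms = {"penetration": penetration, "floating": floating, "skating": skating}
    return float(np.exp(-penalty)), terms


# A planted stance foot plus a fast-moving swing foot is not penalized.
reward, terms = plausibility_reward([0.0, 0.12], [True, False],
                                    [[0.0, 0.0], [0.8, 0.0]])
```

Here the swing foot's speed is ignored because only feet flagged as in contact contribute to the skating term. Any real implementation would need contact flags from the simulator and weights tuned against tracking rewards.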

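Motivation and Goals

Close the gap between text-to-motion generation and physical execution on humanoid robots.
Eliminate explicit retargeting failures by using a latent, retarget-free control interface.
Introduce the PP-Opt module, which jointly optimizes motion generation and tracking under physical constraints.
Demonstrate improved stability and physical plausibility in simulation and on Unitree G1 hardware.
Show that iterative PP-Opt refinement yields cumulative gains in generation quality and executability.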

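Proposed Method

Generate motion latents with a latent diffusion motion generator conditioned on text prompts.
Introduce the Physical Plausibility Optimization (PP-Opt) module as a bidirectional interface: forward optimization improves the tracker with physics-based rewards, and backward refinement updates the motion generator with high-quality refined data.
Train a teacher policy in simulation and distill latent-conditioned control into a deployable student policy via DAgger.
Apply motion quality control to build a high-quality refined dataset, then fine-tune the motion generator on it.
Run as a closed loop (generate → execute → filter → regenerate), which steers the generator toward a physically plausible latent distribution.
Evaluate sim-to-real deployment on Unitree G1 hardware and in the IsaacLab and MuJoCo simulators.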
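The closed loop described above can be sketched as a short driver. This is a minimal sketch under stated assumptions: `generate`, `execute`, and `fine_tune` are hypothetical placeholder callables standing in for the latent diffusion generator, the DAgger-distilled tracker running in simulation, and generator fine-tuning; the `Rollout` record and the filter thresholds in `quality_filter` are likewise illustrative, not the paper's actual quality-control criteria.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rollout:
    """One simulated execution of a generated motion (hypothetical record)."""
    prompt: str
    success: bool        # tracker finished without falling or diverging
    penetration: float   # plausibility metrics measured on the rollout
    floating: float
    skating: float
    motion: list = field(default_factory=list)  # explicit motion recovered from simulation


def quality_filter(rollouts: List[Rollout], max_pen: float = 0.0,
                   max_float: float = 1.0, max_skate: float = 0.1) -> List[Rollout]:
    """Keep only successful rollouts whose artifacts stay under (hypothetical) thresholds."""
    return [r for r in rollouts
            if r.success
            and r.penetration <= max_pen
            and r.floating <= max_float
            and r.skating <= max_skate]


def pp_opt_cycle(prompts: List[str],
                 generate: Callable[[str], object],
                 execute: Callable[[object, str], Rollout],
                 fine_tune: Callable[[List[Rollout]], None],
                 rounds: int = 3) -> List[int]:
    """Generate -> execute -> filter -> regenerate; returns the rollouts kept per round."""
    kept_per_round = []
    for _ in range(rounds):
        # Forward: propose a latent per prompt and execute it with the tracker.
        rollouts = [execute(generate(p), p) for p in prompts]
        # Quality control: discard failed or physically implausible rollouts.
        kept = quality_filter(rollouts)
        # Backward: fine-tune the generator on the surviving explicit motion data.
        if kept:
            fine_tune(kept)
        kept_per_round.append(len(kept))
    return kept_per_round


# Toy stand-ins: "jump" floats too much until the generator has been tuned twice.
state = {"round": 0}

def generate(prompt):
    return state["round"]  # toy "latent": number of fine-tuning rounds so far

def execute(latent, prompt):
    floating = 1.5 - 0.4 * latent if prompt == "jump" else 0.7
    return Rollout(prompt, True, 0.0, floating, 0.05)

def fine_tune(dataset):
    state["round"] += 1

print(pp_opt_cycle(["walk forward", "turn left", "jump"], generate, execute, fine_tune))
# prints [2, 2, 3]
```

The toy run shows the intended dynamic: only filtered rollouts reach fine-tuning, and as the generator improves, more prompts pass the filter in later rounds.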

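Experimental Results

Research Questions

RQ1: Can explicit retargeted references used during execution be fully replaced by a latent-driven inference pipeline?
RQ2: Does the physics-based optimization in PP-Opt improve both motion generation and tracking under dynamics and contact constraints?
RQ3: How many rounds of PP-Opt refinement keep improving performance before returns diminish?
RQ4: Is implicit latent conditioning better than explicit retargeting for achieving stable, physically plausible locomotion?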

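Key Findings

PP-Opt reduces non-physical artifacts in generated motion: penetration drops from 0.042 to 0.000, floating from 1.744 to 0.713, and skating from 0.064 to 0.061.
Tracking with MLD+PP-Opt yields higher success rates and lower errors than the baseline, both in IsaacLab (Succ 0.96 vs 0.94; E_mpJPE 0.11 vs 0.14; E_mpKPE 0.09 vs 0.11) and in MuJoCo (Succ 0.71 vs 0.63; E_mpJPE 0.21 vs 0.26; E_mpKPE 0.20 vs 0.24).
Iterative PP-Opt rounds yield cumulative gains: going from one to three rounds, R-Precision Top-1 rises from 0.531 to 0.537, FID drops from 0.462 to 0.454, penetration stays at 0.000, and floating and skating improve gradually.
Implicit latent-driven control outperforms explicit retargeting in both IsaacLab and MuJoCo simulation (IsaacLab/MuJoCo, implicit vs explicit: Succ 0.96/0.71 vs 0.91/0.62; E_mpJPE 0.11/0.21 vs 0.23/0.26).
The closed-loop generate → execute → filter → regenerate paradigm built on PP-Opt provides a robust, deployable path toward text-guided humanoid locomotion.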
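The tracking results above are reported as success rate and E_mpJPE. As a minimal sketch of how such metrics are commonly computed, assuming joint positions are stored as `(T, J, 3)` arrays, MPJPE is the Euclidean joint error averaged over frames and joints. The success criterion used here (an episode fails once any frame's mean joint error exceeds a threshold of 0.5) and averaging MPJPE over all episodes are assumptions. The review states neither the paper's exact definitions nor that of E_mpKPE.

```python
import numpy as np


def mpjpe(pred: np.ndarray, ref: np.ndarray) -> float:
    """Mean per-joint position error over all frames and joints.

    pred, ref: arrays of shape (T, J, 3) holding joint positions.
    """
    return float(np.linalg.norm(pred - ref, axis=-1).mean())


def tracking_success(pred: np.ndarray, ref: np.ndarray, fail_threshold: float = 0.5) -> bool:
    """Hypothetical criterion: fail if any frame's mean joint error exceeds fail_threshold."""
    per_frame = np.linalg.norm(pred - ref, axis=-1).mean(axis=1)  # shape (T,)
    return bool((per_frame <= fail_threshold).all())


def evaluate(episodes, fail_threshold: float = 0.5):
    """Return (success rate, MPJPE averaged over all episodes) for (pred, ref) pairs."""
    succ = [tracking_success(p, r, fail_threshold) for p, r in episodes]
    errs = [mpjpe(p, r) for p, r in episodes]
    return float(np.mean(succ)), float(np.mean(errs))


# Toy check: two 10-frame, 4-joint episodes offset by 0.1 and 0.6 along x.
ref = np.zeros((10, 4, 3))
good = ref + np.array([0.1, 0.0, 0.0])
bad = ref + np.array([0.6, 0.0, 0.0])
succ_rate, err = evaluate([(good, ref), (bad, ref)])
```

In the toy check, the second episode exceeds the assumed threshold, so the success rate is 0.5 and the averaged error is 0.35. For scale, the reported IsaacLab E_mpJPE drop from 0.14 to 0.11 is a relative reduction of (0.14 - 0.11) / 0.14, or about 21%.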

This review was generated by AI and reviewed by a human editor.