QUICK REVIEW

[Paper Review] RoboForge: Physically Optimized Text-guided Whole-Body Locomotion for Humanoids

Xichen Yuan, Zhe Li|arXiv (Cornell University)|Mar 18, 2026

Human Motion and Animation|Cited by 0

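One-Sentence Summary

RoboForge proposes a bidirectional, latent-driven framework that couples text-to-motion generation with physics-based optimization to produce physically plausible, retarget-free humanoid locomotion, improving generation quality and tracking stability both in simulation and on real hardware.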

ABSTRACT

While generative models have become effective at producing human-like motions from text, transferring these motions to humanoid robots for physical execution remains challenging. Existing pipelines are often limited by retargeting, where kinematic quality is undermined by physical infeasibility, contact-transition errors, and the high cost of real-world dynamical data. We present a unified latent-driven framework that bridges natural language and whole-body humanoid locomotion through a retarget-free, physics-optimized pipeline. Rather than treating generation and control as separate stages, our key insight is to couple them bidirectionally under physical constraints. We introduce a Physical Plausibility Optimization (PP-Opt) module as the coupling interface. In the forward direction, PP-Opt refines a teacher-student distillation policy with a plausibility-centric reward to suppress artifacts such as floating, skating, and penetration. In the backward direction, it converts reward-optimized simulation rollouts into high-quality explicit motion data, which is used to fine-tune the motion generator toward a more physically plausible latent distribution. This bidirectional design forms a self-improving cycle: the generator learns a physically grounded latent space, while the controller learns to execute latent-conditioned behaviors with dynamical integrity. Extensive experiments on the Unitree G1 humanoid show that our bidirectional optimization improves tracking accuracy and success rates. Across IsaacLab and MuJoCo, the implicit latent-driven pipeline consistently outperforms conventional explicit retargeting baselines in both precision and stability. By coupling diffusion-based motion generation with physical plausibility optimization, our framework provides a practical path toward deployable text-guided humanoid intelligence.

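Motivation and Objectives

Bridge the gap between text-to-motion generation and physical execution on humanoid robots.
Eliminate explicit retargeting failures by using a latent, retarget-free control interface.
Introduce PP-Opt to jointly optimize motion generation and tracking under physical constraints.
Demonstrate improved stability and physical plausibility in simulation and on Unitree G1 hardware.
Show the cumulative gains in generation quality and executability from iterative PP-Opt refinement.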

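Proposed Method

Use a latent-space, diffusion-based motion generator conditioned on text prompts to produce motion latents.
Introduce the Physical Plausibility Optimization (PP-Opt) module as a bidirectional interface: the forward direction refines the tracker under a physics-based reward, and the backward direction uses high-quality refined data to update the motion generator.
Train a teacher policy in simulation and distill it via DAgger into a deployable student policy driven by latents.
Apply motion quality control to filter a high-quality refinement dataset, then fine-tune the motion generator on it.
Run in a closed loop (generate → execute → filter → re-generate) so that the generator settles on a physically feasible latent distribution.
Evaluate from simulation to real-world deployment using the Unitree G1 with the IsaacLab and MuJoCo simulators.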
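
The closed loop described above can be illustrated with a toy sketch. This is not the authors' implementation: `ToyGenerator`, `execute`, `pp_opt_round`, the 1-D latent, the plausibility score, and the 0.6 threshold are all hypothetical stand-ins, chosen so the loop runs with only the Python standard library. It shows only the control flow of generate, execute, filter, and fine-tune.

```python
import random
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt: str
    latent: float        # toy 1-D stand-in for a motion latent (assumption)
    plausibility: float  # toy physics score in [0, 1]; higher is better


class ToyGenerator:
    """Hypothetical stand-in for a text-conditioned latent motion generator."""

    def __init__(self, mean=0.0, std=1.0, seed=0):
        self.mean, self.std = mean, std
        self.rng = random.Random(seed)

    def sample(self, prompt):
        # A real generator would condition on the prompt; this toy ignores it.
        return self.rng.gauss(self.mean, self.std)

    def finetune(self, latents, lr=0.5):
        # Move the latent distribution toward the mean of the kept samples.
        target = sum(latents) / len(latents)
        self.mean += lr * (target - self.mean)


def execute(prompt, latent):
    """Hypothetical simulator rollout: latents near 2.0 score as plausible."""
    score = max(0.0, 1.0 - abs(latent - 2.0) / 4.0)
    return Rollout(prompt, latent, score)


def pp_opt_round(gen, prompts, samples_per_prompt=32, keep_threshold=0.6):
    """One generate -> execute -> filter -> fine-tune pass."""
    rollouts = [
        execute(p, gen.sample(p))
        for p in prompts
        for _ in range(samples_per_prompt)
    ]
    kept = [r for r in rollouts if r.plausibility >= keep_threshold]
    if kept:  # skip the update if the filter rejected everything
        gen.finetune([r.latent for r in kept])
    mean_score = sum(r.plausibility for r in rollouts) / len(rollouts)
    return mean_score, len(kept)
```

Calling `pp_opt_round` repeatedly on the same `ToyGenerator` shifts its latent mean toward the region the toy simulator scores as plausible. In a very simplified way, this mirrors how filtered rollouts are meant to pull the generator toward physically feasible latents.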

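Experimental Results

Research Questions

RQ1: Can a purely implicit, latent-driven inference pipeline replace explicitly retargeted references at deployment time?
RQ2: Does the physical optimization in PP-Opt improve both motion generation and tracking under dynamics and contact constraints?
RQ3: How many rounds of PP-Opt refinement improve performance before returns diminish?
RQ4: Does implicit latent conditioning outperform explicit retargeting in achieving stable, physically plausible gaits?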

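Key Findings

PP-Opt reduces non-physical artifacts in generated motion: penetration falls from 0.042 to 0.000, floating from 1.744 to 0.713, and skating from 0.064 to 0.061.
Training the tracker with MLD+PP-Opt yields higher success rates and lower errors. In IsaacLab: Succ 0.96 vs 0.94, E_mpJPE 0.11 vs 0.14, E_mpKPE 0.09 vs 0.11. In MuJoCo: Succ 0.71 vs 0.63, E_mpJPE 0.21 vs 0.26, E_mpKPE 0.20 vs 0.24.
Iterative PP-Opt refinement rounds bring cumulative gains. From One-Round to Three-Round, Top-1 (RTOP-1) rises from 0.531 to 0.537, FID improves from 0.462 to 0.454 (lower is better), penetration stays at 0.000, and floating and skating improve gradually.
Implicit latent-driven control outperforms explicit retargeting in both IsaacLab and MuJoCo. Succ (IsaacLab/MuJoCo): 0.96/0.71 implicit vs 0.91/0.62 explicit. E_mpJPE (IsaacLab/MuJoCo): 0.11/0.21 implicit vs 0.23/0.26 explicit.
PP-Opt's closed-loop generate→execute→filter→re-generate paradigm offers a robust path toward deployable, text-guided humanoid locomotion.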
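
For readers unfamiliar with the kinds of metrics quoted above, the following sketch shows one plausible way such quantities could be computed. The definitions are assumptions for illustration and may differ from the paper's: `mpjpe` averages Euclidean joint errors over frames and joints, and `foot_artifacts` assumes a hypothetical ground plane at z = 0 and a 2 cm contact band.

```python
import math


def mpjpe(pred, ref):
    """Mean per-joint position error.

    pred, ref: lists of frames; each frame is a list of (x, y, z) joints.
    """
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(pred, ref):
        for p, r in zip(pred_frame, ref_frame):
            total += math.dist(p, r)
            count += 1
    return total / count


def foot_artifacts(left, right, dt, contact_height=0.02):
    """Toy penetration / floating / skating measures (assumed definitions).

    left, right: per-frame (x, y, z) positions of each foot, ground at z = 0.
    dt: time step between frames in seconds.
    """
    n = len(left)
    penetration = floating = skate = 0.0
    contact_steps = 0
    for t in range(n):
        lowest = min(left[t][2], right[t][2])
        # Penetration: depth of the lowest foot below the ground plane.
        penetration += max(0.0, -lowest)
        # Floating: height of the lowest foot above the contact band.
        floating += max(0.0, lowest - contact_height)
        if t > 0:
            for foot in (left, right):
                in_contact = (foot[t][2] <= contact_height
                              and foot[t - 1][2] <= contact_height)
                if in_contact:
                    # Skating: horizontal speed of a foot that stays in contact.
                    dx = foot[t][0] - foot[t - 1][0]
                    dy = foot[t][1] - foot[t - 1][1]
                    skate += math.hypot(dx, dy) / dt
                    contact_steps += 1
    return {
        "penetration": penetration / n,
        "floating": floating / n,
        "skating": skate / contact_steps if contact_steps else 0.0,
    }
```

Lower values are better for all four quantities. Under these assumed definitions, a motion whose stance foot slides along the ground raises `skating`, while one where neither foot touches the ground raises `floating`.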

This review was generated by AI and reviewed by a human editor.