QUICK REVIEW

[Paper Review] LessMimic: Long-Horizon Humanoid Interaction with Unified Distance Field Representations

Yutang Lin, Jieming Cui | arXiv (Cornell University) | Feb 25, 2026

Robot Manipulation and Learning | Cited by 0

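One-sentence summary

LessMimic is a Distance Field (DF)-based, reference-free framework that lets a single policy learn long-horizon humanoid interaction across diverse object geometries: DF-derived geometric cues are encoded by a VAE, and the policy is trained through behavior cloning, Adversarial Interaction Priors, and visual distillation. It needs no motion references or MoCap at inference time and achieves robust generalization and skill composition.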

ABSTRACT

Humanoid robots that autonomously interact with physical environments over extended horizons represent a central goal of embodied intelligence. Existing approaches rely on reference motions or task-specific rewards, tightly coupling policies to particular object geometries and precluding multi-skill generalization within a single framework. A unified interaction representation enabling reference-free inference, geometric generalization, and long-horizon skill composition within one policy remains an open challenge. Here we show that Distance Field (DF) provides such a representation: LessMimic conditions a single whole-body policy on DF-derived geometric cues (surface distances, gradients, and velocity decompositions), removing the need for motion references, with interaction latents encoded via a Variational Auto-Encoder (VAE) and post-trained using Adversarial Interaction Priors (AIP) under Reinforcement Learning (RL). Through DAgger-style distillation that aligns DF latents with egocentric depth features, LessMimic further transfers seamlessly to vision-only deployment without motion capture (MoCap) infrastructure. A single LessMimic policy achieves 80-100% success across object scales from 0.4x to 1.6x on PickUp and SitStand where baselines degrade sharply, attains 62.1% success on trajectories of 5 task instances, and remains viable up to 40 sequentially composed tasks. By grounding interaction in local geometry rather than demonstrations, LessMimic offers a scalable path toward humanoid robots that generalize, compose skills, and recover from failures in unstructured environments.

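Motivation and objectives

A unified interaction representation is needed to generalize across differing object geometries and to support long-horizon, multi-skill humanoid interaction.
Propose a DF-based interaction representation that supplies surface-distance, gradient, and velocity-decomposition signals for contact-aware control.
Develop a three-stage training pipeline (behavior cloning, AIP-guided RL, visual distillation) so that inference requires no motion references or MoCap.
Demonstrate generalization to unseen shapes and scales, failure recovery, and progressive composition of tasks within a single policy.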

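Proposed method

Represent local geometry and interaction dynamics with a Distance Field (DF) at the DoF level: encode each link's DF distance, gradient, and velocity components (normal and tangential), and assemble them over a time window into an interaction feature I_t.
Encode I_t with a Variational Auto-Encoder (VAE) into a compact latent vector z_t, yielding a geometry-aware interaction signal.
Train a single whole-body policy π_base by behavior cloning: it imitates a teacher that tracks retargeted motions, with DAgger used to mitigate covariate shift.
Fine-tune π_base with RL under Adversarial Interaction Priors (AIP), using a discriminator on z_t to regularize geometric validity under randomized object geometry.
Distill the full policy into a vision-based policy (π_vis) through DAgger-style visuomotor distillation using egocentric depth features, so that it can be deployed without MoCap.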
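
The per-link DF cues named in the first step (distance, gradient, and the normal/tangential velocity split) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the object is a sphere so that the distance field has a closed form, and the names `sphere_sdf` and `link_df_features` are hypothetical helpers introduced only for this example. A real system would presumably query a precomputed or learned distance field for arbitrary geometry.

```python
import math


def sphere_sdf(point, center, radius):
    """Signed distance from `point` to a sphere surface, plus the unit gradient.

    Assumption: a sphere stands in for the object so the field is analytic.
    """
    offset = [p - c for p, c in zip(point, center)]
    norm = math.sqrt(sum(o * o for o in offset))
    if norm == 0.0:
        # The gradient is undefined at the exact center; pick a fixed direction.
        return -radius, [1.0, 0.0, 0.0]
    gradient = [o / norm for o in offset]
    return norm - radius, gradient


def link_df_features(link_pos, link_vel, center, radius):
    """Distance, gradient, and normal/tangential velocity for one robot link."""
    distance, gradient = sphere_sdf(link_pos, center, radius)
    # Signed speed along the gradient (negative means approaching the surface).
    normal_speed = sum(v * g for v, g in zip(link_vel, gradient))
    v_normal = [normal_speed * g for g in gradient]
    # Whatever remains after removing the normal part slides along the surface.
    v_tangent = [v - n for v, n in zip(link_vel, v_normal)]
    return {
        "distance": distance,
        "gradient": gradient,
        "v_normal": v_normal,
        "v_tangent": v_tangent,
    }


# Hypothetical link 2 m above the center of a 0.5 m sphere, moving sideways and down.
features = link_df_features(
    link_pos=[0.0, 0.0, 2.0],
    link_vel=[1.0, 0.0, -1.0],
    center=[0.0, 0.0, 0.0],
    radius=0.5,
)
print(features)
```

In this sketch, stacking such dictionaries over all links and over a short time window would give something shaped like the interaction feature I_t described above. Because every quantity is expressed relative to the nearest surface rather than to a specific object pose or mesh, the same features remain meaningful when the object is rescaled, which is the intuition behind the geometric generalization claim.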

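Experimental results

Research questions

RQ1: Can a unified DF-based representation provide geometry-agnostic cues that enable long-horizon humanoid interaction across different object shapes and scales?
RQ2: Does the three-stage training pipeline (behavior cloning, AIP-guided RL, visual distillation) produce a single policy capable of reference-free inference and seamless skill composition?
RQ3: Under DF conditioning, how robust are failure recovery and generalization to unseen geometries and heterogeneous task sequences?
RQ4: How well does the method scale to long horizons (e.g., 40 consecutive task instances) without resets or a planner?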

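Key findings

A single DF-conditioned policy achieves 80-100% success on PickUp and SitStand across object scales from 0.4× to 1.6×, outperforming the baselines.
On long-horizon trajectories, the method reaches 62.1% success on sequences of 5 tasks and remains viable for up to 40 consecutive task instances.
DF-based local geometric cues (distance, gradient, and velocity decomposition) generalize robustly to unseen shapes and scales without retraining.
The three-stage pipeline enables reference-free deployment: behavior cloning provides a robust initialization, AIP-guided RL provides geometric generalization, and visual distillation enables MoCap-free deployment.
The method supports online failure recovery by restarting the interaction from the updated object position after a perturbation.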


This review was generated by AI and reviewed by a human editor.