QUICK REVIEW

[Paper Review] Task-Specified Compliance Bounds for Humanoids via Lipschitz-Constrained Policies

Zewen He, Yoshihiko Nakamura | arXiv (Cornell University) | Mar 17, 2026
Robotic Locomotion and Control | Cited by 0
One-Sentence Summary

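The paper proposes an anisotropic Lipschitz-constrained policy (ALCP) that maps a task-space stiffness upper bound to a state-dependent, direction-aware constraint on the policy Jacobian and enforces it during reinforcement learning training, yielding task-specified compliance and controllable walking stability for humanoid robots.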

ABSTRACT

Reinforcement learning (RL) has demonstrated substantial potential for humanoid bipedal locomotion and the control of complex motions. To cope with oscillations and impacts induced by environmental interactions, compliant control is widely regarded as an effective remedy. However, the model-free nature of RL makes it difficult to impose task-specified and quantitatively verifiable compliance objectives, and classical model-based stiffness designs are not directly applicable. Lipschitz-Constrained Policies (LCP), which regularize the local sensitivity of a policy via gradient penalties, have recently been used to smooth humanoid motions. Nevertheless, existing LCP-based methods typically employ a single scalar Lipschitz budget and lack an explicit connection to physically meaningful compliance specifications in real-world systems. In this study, we propose an anisotropic Lipschitz-constrained policy (ALCP) that maps a task-space stiffness upper bound to a state-dependent Lipschitz-style constraint on the policy Jacobian. The resulting constraint is enforced during RL training via a hinge-squared spectral-norm penalty, preserving physical interpretability while enabling direction-dependent compliance. Experiments on humanoid robots show that ALCP improves locomotion stability and impact robustness, while reducing oscillations and energy usage.

Motivation and Objectives

  • Motivate compliant control for humanoids in reinforcement learning beyond ad hoc penalties.
  • Map a prescribed task-space stiffness upper bound to a state-dependent, anisotropic Lipschitz-style constraint on the policy Jacobian.
  • Provide a physically interpretable framework to observe and regulate the effective joint stiffness induced by an RL policy.
  • Demonstrate that ALCP yields tunable compliance and enhanced stability in simulation and real-robot experiments.

Proposed Method

  • Formulate an anisotropic Lipschitz constraint on the policy Jacobian using a budget matrix K_LCP.
  • Define a policy-induced equivalent joint stiffness K_eq(o) and derive its relation to the policy Jacobian via J_pi(o).
  • Map task-space stiffness upper bounds K_x^max to joint-space stiffness budgets K_q^max through kinematic relations and a stiffness-compliance framework.
  • Introduce a hinge-squared spectral-norm penalty that enforces the anisotropic LCP during RL training as a soft constraint.
  • Derive an ALCP training objective L_total = L_RL + lambda_aniso * R_aniso, where R_aniso penalizes violations of the stiffness constraint.
  • Use a finite-state machine to handle different contact phases during training and evaluation.
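
The penalty and objective listed above can be illustrated with a small sketch. The summary does not give the paper's exact formulas, so the form used here is an assumption: the budget matrix K_LCP is taken to be diagonal with one positive bound per observation direction, and the penalty is max(0, ||J_pi diag(1/k)||_2 - 1)^2. The function names `spectral_norm`, `aniso_penalty`, and `total_loss` are hypothetical, and the policy Jacobian is passed in as a plain list of rows rather than computed from a network.

```python
import math


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def spectral_norm(a, iters=200):
    """Largest singular value of `a`, by power iteration on a^T a.

    One run is started from each basis vector and the best estimate is
    kept, so at least one start overlaps the top singular direction.
    """
    ata = matmul(transpose(a), a)
    n = len(ata)
    best = 0.0
    for start in range(n):
        v = [1.0 if i == start else 0.0 for i in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(ata[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))
            if lam == 0.0:
                break  # this start lies in the null space of a^T a
            v = [x / lam for x in w]
        best = max(best, lam)
    return math.sqrt(best)


def aniso_penalty(jac_pi, budget):
    """Hinge-squared penalty max(0, ||J_pi diag(1/k)||_2 - 1)^2.

    ASSUMPTION: `budget` is a diagonal stand-in for K_LCP, holding one
    positive sensitivity bound k_j per observation direction.
    """
    scaled = [[row[j] / budget[j] for j in range(len(budget))]
              for row in jac_pi]
    excess = max(0.0, spectral_norm(scaled) - 1.0)
    return excess ** 2


def total_loss(rl_loss, jac_pi, budget, lam_aniso):
    """L_total = L_RL + lambda_aniso * R_aniso, as stated in the summary."""
    return rl_loss + lam_aniso * aniso_penalty(jac_pi, budget)
```

With a uniform budget (all k_j equal to k) this sketch reduces to a scalar bound ||J_pi||_2 <= k, which is one way to see how an anisotropic budget generalizes a single scalar Lipschitz budget. The hinge makes the penalty zero whenever the scaled Jacobian stays inside the budget, so the term only acts as a soft constraint on violations.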

Experimental Results

Research Questions

  • RQ1: How can task-space stiffness upper bounds be enforced within RL training to yield interpretable and verifiable compliance?
  • RQ2: Can anisotropic (direction-dependent) Lipschitz budgets provide more flexible and physically meaningful control than scalar LCP in humanoid policies?
  • RQ3: What are the stability, compliance, and energy trade-offs when enforcing ALCP in simulation and on a real humanoid robot?
  • RQ4: Does ALCP enable controllable, task-specified compliance in CoM and limb interactions during locomotion and manipulation?

Key Findings

  • ALCP enables task-specified compliance by translating task-space stiffness constraints into anisotropic policy Jacobian budgets, improving interpretability.
  • Compared with scalar LCP, ALCP provides direction-dependent control that can yield tunable CoM and hand compliance while maintaining balance.
  • In simulation, ALCP bounds the directional quadratic-form budget under stepping and standing tasks, reducing high-frequency activity and enabling controlled energy/effort trade-offs.
  • Hardware experiments on a humanoid platform show reduced hand oscillations and faster settling under external loads when using ALCP-based policies derived from task stiffness constraints.
  • Across walking scenarios, ALCP maintains stability with bounded policy sensitivity, achieving compliant yet robust locomotion.
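
To make the "directional quadratic-form budget" in the findings above concrete, the following sketch checks a stiffness bound along a single joint-space direction. It assumes the standard congruence mapping K_q^max = J^T K_x^max J between task-space and joint-space stiffness (valid when the load-dependent term from the Jacobian's configuration dependence is neglected). The summary does not state the paper's exact mapping, so this choice, the function names, and the numeric values are all illustrative assumptions.

```python
def matvec(m, v):
    """Return the matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def quad_form(m, v):
    """Return the quadratic form v^T M v."""
    return sum(v_i * w_i for v_i, w_i in zip(v, matvec(m, v)))


def joint_budget(jac, k_x_max):
    """K_q^max = J^T K_x^max J for a task Jacobian J (rows: task axes).

    ASSUMPTION: standard congruence mapping, external-load term ignored.
    """
    m, n = len(jac), len(jac[0])
    return [[sum(jac[a][i] * k_x_max[a][b] * jac[b][j]
                 for a in range(m) for b in range(m))
             for j in range(n)] for i in range(n)]


def within_budget(k_eq, k_q_max, direction, tol=1e-9):
    """Check v^T K_eq v <= v^T K_q^max v along one joint-space direction."""
    return quad_form(k_eq, direction) <= quad_form(k_q_max, direction) + tol


# Hypothetical 2-joint example: task stiffness caps of 100 and 400 N/m.
J = [[1.0, 0.5], [0.0, 1.0]]
K_X_MAX = [[100.0, 0.0], [0.0, 400.0]]
K_Q_MAX = joint_budget(J, K_X_MAX)

# Hypothetical policy-induced joint stiffness K_eq at one state.
K_EQ = [[80.0, 0.0], [0.0, 500.0]]
ok_joint_1 = within_budget(K_EQ, K_Q_MAX, [1.0, 0.0])
ok_joint_2 = within_budget(K_EQ, K_Q_MAX, [0.0, 1.0])
```

In this example the first joint direction stays inside the budget while the second exceeds it, which is the kind of direction-by-direction verdict a single scalar bound cannot express. The check also reflects the identity v^T (J^T K_x J) v = (J v)^T K_x (J v): the joint-space budget along v equals the task-space budget along the task displacement J v.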

This review was generated by AI and reviewed by human editors.