QUICK REVIEW

[Paper Review] Residual Force Control for Agile Human Behavior Imitation and Extended Motion Synthesis

Ye Yuan, Kris Kitani|arXiv (Cornell University)|Jun 12, 2020

Human Pose and Action Recognition | References: 58 | Cited by: 34

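One-Sentence Summary

RFC augments humanoid control with learnable residual forces to overcome the dynamics mismatch, enabling agile motion imitation (e.g., ballet) and, through a dual-policy framework, multi-modal long-term motion synthesis.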

ABSTRACT

Reinforcement learning has shown great promise for synthesizing realistic human behaviors by learning humanoid control policies from motion capture data. However, it is still very challenging to reproduce sophisticated human skills like ballet dance, or to stably imitate long-term human behaviors with complex transitions. The main difficulty lies in the dynamics mismatch between the humanoid model and real humans. That is, motions of real humans may not be physically possible for the humanoid model. To overcome the dynamics mismatch, we propose a novel approach, residual force control (RFC), that augments a humanoid control policy by adding external residual forces into the action space. During training, the RFC-based policy learns to apply residual forces to the humanoid to compensate for the dynamics mismatch and better imitate the reference motion. Experiments on a wide range of dynamic motions demonstrate that our approach outperforms state-of-the-art methods in terms of convergence speed and the quality of learned motions. Notably, we showcase a physics-based virtual character empowered by RFC that can perform highly agile ballet dance moves such as pirouette, arabesque and jeté. Furthermore, we propose a dual-policy control framework, where a kinematic policy and an RFC-based policy work in tandem to synthesize multi-modal infinite-horizon human motions without any task guidance or user input. Our approach is the first humanoid control method that successfully learns from a large-scale human motion dataset (Human3.6M) and generates diverse long-term motions. Code and videos are available at https://www.ye-yuan.com/rfc.

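Motivation and Objectives

Address the dynamics mismatch between the humanoid model and real humans to improve motion imitation.
Reproduce highly agile motions (e.g., ballet dance) that previous physics-based methods struggle to achieve.
Develop a dual-policy framework for multi-modal, long-term motion synthesis that requires no task guidance or user input.
Learn diverse long-term motions from a large-scale motion dataset (Human3.6M).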

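Proposed Method

Introduce residual force control (RFC), which augments the humanoid policy by adding external residual forces to the action space.
Formulate RFC in two variants, RFC-Explicit (explicit residual forces applied at contact points) and RFC-Implicit (total residual joint torques), each coupled with a composite policy.
Model the dynamics with an extended equation of motion that includes residual terms (Eq. 2 for the explicit variant, Eq. 4 for the implicit variant).
Regularize the residual forces through a reward term so that the modified dynamics stay close to the original physical constraints (Eq. 3, Eq. 5).
Adopt a dual-policy control framework: a kinematic policy (a CVAE) predicts future motion, and an RFC-based policy imitates that predicted motion to produce physically plausible movement.
Train with PPO in a physics simulator (MuJoCo), using PD controllers and reference motions from motion capture.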
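
The RFC-Implicit idea described above can be illustrated with a minimal sketch. It is not the authors' code: the action layout (PD targets first, residual forces after), the helper names `split_action`, `pd_torque`, and `residual_reg_reward`, the exponential form of the penalty, and the weight `k_r` are all assumptions made for illustration. The summary only states that the policy outputs residual forces alongside the usual control signal and that a reward term keeps those forces small.

```python
import numpy as np


def split_action(action, num_joints):
    """Split one policy action into PD targets and residual forces.

    Assumed (hypothetical) layout: the first `num_joints` entries are PD
    target angles; everything after that is the residual force vector.
    """
    action = np.asarray(action, dtype=float)
    pd_targets = action[:num_joints]
    residual = action[num_joints:]
    return pd_targets, residual


def pd_torque(q, qd, target, kp, kd):
    """Standard PD control law: pull joint angle q toward target, damp velocity qd."""
    return kp * (target - q) - kd * qd


def residual_reg_reward(residual, k_r=0.1):
    """Assumed regularizer: equals 1 for zero residual force and decays
    toward 0 as the squared norm of the residual grows."""
    residual = np.asarray(residual, dtype=float)
    return float(np.exp(-k_r * np.sum(residual ** 2)))


# Usage: a 3-joint toy humanoid with a 2-D residual force.
targets, eta = split_action([0.1, 0.2, 0.3, 1.0, 2.0], num_joints=3)
tau = pd_torque(q=np.zeros(3), qd=np.zeros(3), target=targets, kp=2.0, kd=1.0)
reward_penalty = residual_reg_reward(eta)
```

In a simulator, `tau` would drive the actuated joints while `eta` would be added to the generalized forces of the model; the regularization reward would then be combined with the imitation reward so that the policy uses residual forces only when the reference motion demands it.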

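Experimental Results

Research Questions

RQ1: Can residual forces compensate for the dynamics mismatch and enable imitation of highly agile motions?
RQ2: How do RFC-Explicit and RFC-Implicit differ in learning efficiency and motion quality?
RQ3: Can a dual-policy framework generate multi-modal long-term motions without task guidance or user input?
RQ4: Does learning from a large motion dataset (such as Human3.6M) enable diverse long-term motion synthesis?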

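Key Findings

RFC converges faster than the state-of-the-art DeepMimic on agile motions and produces higher-quality motion.
RFC achieves highly agile ballet moves in simulation, such as pirouette, arabesque, and jeté.
RFC-Explicit and RFC-Implicit perform comparably in imitation; RFC-Implicit offers a computational-efficiency advantage.
The dual-policy framework synthesizes stable, multi-modal long-term motions without task guidance or user input.
Trained on Human3.6M, the method generates diverse long-term motions that extend beyond short reference clips.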
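
The dual-policy loop behind the long-term synthesis results can be sketched as a toy rollout. Everything here is a stand-in: `kinematic_policy` replaces the CVAE with a random linear extrapolation, `control_policy` replaces the RFC-based policy and physics simulation with a simple proportional step, and poses are plain vectors. The only structure taken from the summary is the alternation: the kinematic policy proposes a future motion segment, the control policy imitates it, and the result feeds the next prediction.

```python
import numpy as np


def kinematic_policy(past_poses, horizon, rng):
    """Hypothetical stand-in for the CVAE: sample a latent direction and
    extrapolate `horizon` future poses from the last observed pose."""
    z = rng.standard_normal(past_poses.shape[1])  # sampled latent code
    last = past_poses[-1]
    steps = np.arange(1, horizon + 1)[:, None]
    return last + 0.01 * steps * z


def control_policy(sim_pose, ref_pose, gain=0.5):
    """Hypothetical stand-in for the RFC-based policy plus simulator:
    move the simulated pose part of the way toward the reference pose."""
    return sim_pose + gain * (ref_pose - sim_pose)


def synthesize(init_poses, num_segments, horizon, seed=0):
    """Alternate between predicting a future segment and imitating it."""
    rng = np.random.default_rng(seed)
    history = [np.asarray(p, dtype=float) for p in init_poses]
    context_len = len(history)
    sim_pose = history[-1].copy()
    for _ in range(num_segments):
        # Condition the prediction on the most recent simulated poses.
        context = np.stack(history[-context_len:])
        future = kinematic_policy(context, horizon, rng)
        for ref in future:
            sim_pose = control_policy(sim_pose, ref)
            history.append(sim_pose.copy())
    return np.stack(history)


# Usage: two initial 3-D poses, four predicted segments of five steps each.
motion = synthesize(np.zeros((2, 3)), num_segments=4, horizon=5)
```

Because each segment is sampled from a fresh latent code, different seeds give different motions from the same starting poses, which is the toy analogue of multi-modal synthesis; because the loop can repeat indefinitely, the rollout length is not tied to the length of any reference clip.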

This review was generated by AI and reviewed by human editors.