QUICK REVIEW

[Paper Review] SFV: Reinforcement Learning of Physical Skills from Videos

Xue Bin Peng, Angjoo Kanazawa|arXiv (Cornell University)|Oct 8, 2018

Human Motion and Animation | Cited by 40

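One-Sentence Summary

SFV learns dynamic, physically plausible skills for simulated characters directly from monocular video by combining pose estimation, motion reconstruction, and reinforcement learning; the learned skills can be retargeted to different morphologies and environments.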

ABSTRACT

Data-driven character animation based on motion capture can produce highly naturalistic behaviors and, when combined with physics simulation, can provide for natural procedural responses to physical perturbations, environmental changes, and morphological discrepancies. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented environments and actors. In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV). Our approach, based on deep pose estimation and deep reinforcement learning, allows data-driven animation to leverage the abundance of publicly available video clips from the web, such as those from YouTube. This has the potential to enable fast and easy design of character controllers simply by querying for video recordings of the desired behavior. The resulting controllers are robust to perturbations, can be adapted to new settings, can perform basic object interactions, and can be retargeted to new morphologies via reinforcement learning. We further demonstrate that our method can predict potential human motions from still images, by forward simulation of learned controllers initialized from the observed pose. Our framework is able to learn a broad range of dynamic skills, including locomotion, acrobatics, and martial arts.

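Research Motivation and Goals

Drive data-driven character animation with abundant video data instead of costly motion capture data.
Develop a pipeline that converts video demonstrations into physically plausible reference motions for simulated characters.
Use reinforcement learning to train robust policies that imitate these reference motions in a physics-based environment.
Introduce adaptive state initialization to improve long-horizon imitation of low-fidelity, video-derived reference motions.
Demonstrate retargeting to different morphologies and the potential for motion completion from still images.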

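Proposed Method

Combine 2D and 3D pose estimation (OpenPose and HMR) with an adaptive motion reconstruction stage that optimizes a trajectory in latent space to produce a coherent 3D reference motion.
Reconstruct the reference motion by optimizing over the latent variables z_t to minimize a weighted sum of 2D reprojection, 3D consistency, and temporal smoothness losses.
Train a policy π(a|s) with reinforcement learning (a PPO variant) to imitate the reconstructed reference motion in a physics-based simulator.
Introduce adaptive state initialization (ASI), in which a second agent proposes initial states to improve exploration and curriculum design during long-horizon imitation.
Use a composite reward with pose, velocity, end-effector, and center-of-mass terms to align the simulated motion with the reference while maintaining stability.
Demonstrate motion completion by selecting the best-matching reference motion for a still image and forward-simulating with the corresponding policy.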
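
The two scalar objectives named in the method list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the squared-error form of each loss term, the exponentiated-error form of each reward term, and every weight and scale value are hypothetical placeholders. The reconstruction loss is evaluated here on already-decoded joint positions; in the method described above the optimization variable is the latent trajectory z_t, which would require the pose decoder that this sketch leaves out.

```python
import numpy as np


def reconstruction_loss(pred_2d, obs_2d, conf, pred_3d, init_3d,
                        w_2d=1.0, w_3d=1.0, w_sm=1.0):
    """Weighted sum of 2D reprojection, 3D consistency and smoothness terms.

    pred_2d: (T, J, 2) joints of the decoded poses projected into the image
    obs_2d:  (T, J, 2) 2D keypoint detections
    conf:    (T, J)    detection confidences used to weight the 2D term
    pred_3d: (T, J, 3) decoded 3D joint positions
    init_3d: (T, J, 3) per-frame initial 3D estimates
    All weights are assumed placeholder values.
    """
    # 2D term: confidence-weighted squared distance to the detections.
    l_2d = np.sum(conf[..., None] * (pred_2d - obs_2d) ** 2)
    # 3D term: stay close to the per-frame initial 3D estimates.
    l_3d = np.sum((pred_3d - init_3d) ** 2)
    # Smoothness term: penalize joint displacement between adjacent frames.
    l_sm = np.sum((pred_3d[1:] - pred_3d[:-1]) ** 2)
    return float(w_2d * l_2d + w_3d * l_3d + w_sm * l_sm)


def imitation_reward(pose_err, vel_err, ee_err, com_err,
                     weights=(0.65, 0.1, 0.15, 0.1),
                     scales=(2.0, 0.1, 40.0, 10.0)):
    """Composite reward over pose, velocity, end-effector and center-of-mass.

    Each *_err is a non-negative tracking error between the simulated
    character and the reference motion. Weights and scales are assumed
    illustrative values, not taken from the summary above.
    """
    errs = np.array([pose_err, vel_err, ee_err, com_err], dtype=float)
    # Each term lies in (0, 1] and equals 1 when its error is zero.
    terms = np.exp(-np.array(scales) * errs)
    return float(np.dot(weights, terms))
```

Because the placeholder weights sum to one, perfect tracking yields a reward of about 1.0, and any tracking error lowers it. Raising `w_sm` in the loss trades fidelity to the per-frame estimates for a smoother reference trajectory.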

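Experimental Results

Research Questions

RQ1: Can monocular video provide enough motion data to learn diverse, dynamic skills for physically simulated characters?
RQ2: How can pose estimation errors and non-physical artifacts be mitigated to achieve reliable imitation in a physics engine?
RQ3: Does adaptive state initialization improve learning efficiency and quality when imitating low-fidelity, video-derived reference motions?
RQ4: To what extent can learned controllers be retargeted to different morphologies and environments while preserving skill fidelity?
RQ5: Can a library of learned controllers enable motion completion from a single still image?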

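Key Findings

The framework reproduces a broad range of dynamic skills from video, including locomotion, acrobatics, and martial arts.
Motion reconstruction in the latent pose space improves reference quality and imitation performance compared with using the raw per-frame pose sequence directly.
ASI improves long-horizon imitation by adapting the initial state distribution, yielding better learning curves for complex motions.
Policies learned with SFV are robust to perturbations and can be retargeted to different morphologies and environments.
The method introduces a new physics-based motion completion application that predicts future motion from a single still image.
The system is shown to reliably convert video-derived references into high-fidelity, physically plausible motions in simulation.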
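
The idea of adapting the initial state distribution can be illustrated with a deliberately simplified sketch. It assumes a softmax distribution over a fixed set of candidate start states (for example, frames of the reference motion) updated with a REINFORCE-style rule and a running-average baseline. The class name, the categorical parameterization, and the learning rates are all hypothetical; the summary above does not specify how the initial state distribution is parameterized or trained.

```python
import numpy as np


class AdaptiveStateInit:
    """Softmax distribution over K candidate start states (assumed form)."""

    def __init__(self, num_candidates, lr=0.1, baseline_lr=0.1, seed=0):
        self.logits = np.zeros(num_candidates)  # start from a uniform distribution
        self.lr = lr
        self.baseline_lr = baseline_lr
        self.baseline = 0.0  # running average of episode returns
        self.rng = np.random.default_rng(seed)

    def probs(self):
        # Numerically stable softmax over the logits.
        z = self.logits - self.logits.max()
        p = np.exp(z)
        return p / p.sum()

    def sample(self):
        # Index of the candidate state used to start the next episode.
        return int(self.rng.choice(len(self.logits), p=self.probs()))

    def update(self, idx, episode_return):
        # REINFORCE: gradient of log p[idx] w.r.t. the logits is onehot(idx) - p.
        p = self.probs()
        grad = -p
        grad[idx] += 1.0
        advantage = episode_return - self.baseline
        self.logits += self.lr * advantage * grad
        # Move the baseline toward the observed return.
        self.baseline += self.baseline_lr * (episode_return - self.baseline)
```

In a training loop, `sample()` would choose where each imitation episode starts and `update()` would receive the return the imitation policy achieved from that start, so start states that yield above-baseline returns are sampled more often over time.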


This review was generated by AI and reviewed by human editors.