QUICK REVIEW

[Paper Review] Perpetual Motion: Generating Unbounded Human Motion

Yan Zhang, Michael J. Black|arXiv (Cornell University)|Jul 27, 2020

Human Pose and Action Recognition|References: 30|Cited by: 19

One-Sentence Summary

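The paper proposes a two-stream variational RNN model in which the global trajectory and the body pose are cross-conditioned, and it introduces a novel KL-divergence term in the latent space, penalized with a heavy-tailed function, so that persistent, non-deterministic motion sequences can be generated from a single initial pose. The method produces realistic, diverse, and temporally coherent motion over long durations (e.g., 10 minutes) without posterior collapse, and it outperforms state-of-the-art baselines in evaluations of naturalness and diversity.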

ABSTRACT

The modeling of human motion using machine learning methods has been widely studied. In essence, it is a time-series modeling problem involving predicting how a person will move in the future given how they moved in the past. Existing methods, however, typically have a short time horizon, predicting only a few frames to a few seconds of human motion. Here we focus on long-term prediction; that is, generating long (potentially infinite) sequences of plausible human motion. Furthermore, we do not rely on a long sequence of input motion for conditioning, but rather can predict how someone will move from as little as a single pose. Such a model has many uses in graphics (video games and crowd animation) and vision (as a prior for human motion estimation or for dataset creation). To address this problem, we propose a model to generate non-deterministic, ever-changing, perpetual human motion, in which the global trajectory and the body pose are cross-conditioned. We introduce a novel KL-divergence term with an implicit, unknown prior. We train this using a heavy-tailed function of the KL divergence of a white-noise Gaussian process, allowing latent sequence temporal dependency. We perform systematic experiments to verify its effectiveness and find that it is superior to baseline methods.
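The heavy-tailed KL term can be made concrete with a short sketch. Assumptions (not taken from the paper's code): each frame's posterior is a diagonal Gaussian given by a mean `mu` and a log-variance `logvar`; the white-noise Gaussian process prior makes the latent at every frame an independent N(0, I), so the sequence KL decomposes into a sum of per-frame terms; and the Charbonnier function Ψ(s) = √(1 + s²) − 1 is applied once to that sum. Applying it per frame is an equally plausible reading. All function names are hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def kl_to_standard_normal(mu: Vec, logvar: Vec) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one frame's latent code."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def charbonnier(s: float) -> float:
    """Heavy-tailed penalty sqrt(1 + s^2) - 1: about s^2 / 2 near 0, about s for large s."""
    return math.sqrt(1.0 + s * s) - 1.0


def robust_sequence_kl(mus: Sequence[Vec], logvars: Sequence[Vec]) -> float:
    """Charbonnier-penalised KL of a latent sequence against a white-noise prior.

    Under a white-noise Gaussian process prior the latents of different frames
    are independent N(0, I), so the sequence KL is a sum of per-frame KLs.
    ASSUMPTION: the penalty is applied once to that sum, not per frame.
    """
    total = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, logvars))
    return charbonnier(total)


if __name__ == "__main__":
    # Two frames, 1-D latent, posterior N(1, 1) each: per-frame KL = 0.5, total = 1.0.
    print(robust_sequence_kl([[1.0], [1.0]], [[0.0], [0.0]]))  # ≈ 0.4142, i.e. sqrt(2) - 1
```

A rough intuition for why this can discourage posterior collapse (our reading, not a claim from the paper): the derivative Ψ′(s) = s / √(1 + s²) goes to 0 as the KL shrinks and approaches 1 as it grows, so large KL values are penalized roughly as in a standard VAE while small ones are barely penalized, which removes most of the pressure to push every posterior onto the prior. Since Ψ(s) ≤ s for s ≥ 0, the term is never larger than the plain KL.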

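Motivation and Objectives

Generate long-term, realistic, and diverse human motion sequences from a single static pose, without relying on long input sequences or external control.
Model the intrinsic stochasticity and temporal dependency of human motion during perpetual generation.
Develop a latent prior with implicit temporal structure that avoids posterior collapse during training.
Establish a systematic evaluation pipeline covering representation power, motion frequency, diversity, and perceived naturalness.
Support animation, vision, and synthetic dataset creation by generating ever-changing yet plausible motion.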

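Proposed Method

A two-stream variational autoencoder built on RNNs, in which the global displacement and the body pose are cross-conditioned on each other through a shared latent space.
At inference time the model generates autoregressively, sampling stochastic latent variables from the inference posterior.
A novel KL-divergence term applies a Charbonnier penalty to the KL divergence of a white-noise Gaussian process, implicitly modeling temporal dependency in the latent sequence.
The resulting implicit prior departs from the standard normal distribution, enabling richer temporal dynamics while maintaining a valid evidence lower bound (ELBO).
The model is trained end-to-end on motion capture data (e.g., MPI-Mosh, HumanEva), without action labels or user input.
At inference, the model samples a latent variable at every frame, producing perpetual, non-repeating motion.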
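The generation loop described in this section (a fresh latent sampled at every frame, two cross-conditioned streams, autoregressive feedback) can be sketched as follows. This is a toy with random, untrained weights and no claim to match the paper's architecture: the Elman-style recurrence stands in for the paper's RNNs, and the wiring in which the body stream sees the previous displacement while the trajectory stream sees the newly decoded pose is an assumption about how the cross-conditioning might work. The class, layer sizes, and seeds are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> List[Vec]:
    """Hypothetical random weights standing in for a trained network layer."""
    return [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_out)]


def dense_tanh(w: List[Vec], x: Vec) -> Vec:
    """y = tanh(W x), written out without external libraries."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


class ToyTwoStreamGenerator:
    """Untrained toy illustrating the generation loop only (all sizes hypothetical)."""

    def __init__(self, pose_dim: int = 6, trans_dim: int = 3, z_dim: int = 4,
                 hidden_dim: int = 8, weight_seed: int = 0, sample_seed: int = 1):
        rng = random.Random(weight_seed)
        frame_dim = pose_dim + trans_dim
        self.hidden_dim = hidden_dim
        self.trans_dim = trans_dim
        # Elman-style recurrence summarising the motion generated so far.
        self.rnn = random_layer(hidden_dim + frame_dim, hidden_dim, rng)
        # Posterior q(z_t | history): mean and log-variance heads.
        self.mu_head = random_layer(hidden_dim, z_dim, rng)
        self.logvar_head = random_layer(hidden_dim, z_dim, rng)
        # Body stream: next pose from (z_t, previous pose, previous displacement).
        self.body = random_layer(z_dim + frame_dim, pose_dim, rng)
        # Trajectory stream: next displacement from (z_t, NEW pose, previous displacement).
        self.traj = random_layer(z_dim + frame_dim, trans_dim, rng)
        self.sampler = random.Random(sample_seed)

    def generate(self, init_pose: Vec, n_frames: int) -> List[Tuple[Vec, Vec]]:
        """Return (pose, root_position) for each generated frame."""
        pose, disp = list(init_pose), [0.0] * self.trans_dim
        hidden = [0.0] * self.hidden_dim
        root = [0.0] * self.trans_dim
        frames = []
        for _ in range(n_frames):
            hidden = dense_tanh(self.rnn, hidden + pose + disp)
            mu = dense_tanh(self.mu_head, hidden)
            logvar = dense_tanh(self.logvar_head, hidden)
            # Reparameterised sample: a fresh latent at every frame keeps motion stochastic.
            z = [m + math.exp(0.5 * lv) * self.sampler.gauss(0.0, 1.0)
                 for m, lv in zip(mu, logvar)]
            new_pose = dense_tanh(self.body, z + pose + disp)
            disp = dense_tanh(self.traj, z + new_pose + disp)
            pose = new_pose
            root = [r + d for r, d in zip(root, disp)]  # integrate per-frame displacement
            frames.append((pose, root))
        return frames


if __name__ == "__main__":
    motion = ToyTwoStreamGenerator().generate([0.0] * 6, n_frames=1000)
    print(len(motion), motion[-1][1])
```

Because each step needs only the previous frame and the recurrent state, `n_frames` can be made arbitrarily large, and changing `sample_seed` while keeping `weight_seed` fixed yields a different motion from the same initial pose, which is the kind of non-determinism the method relies on.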

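Experimental Results

Research Questions

RQ1: Can a deep generative model produce persistent, non-repetitive, and realistic motion of arbitrary length from a single initial pose?
RQ2: How can temporal dependency be modeled effectively in the latent space of a VAE without an explicit prior?
RQ3: Compared with a standard VAE, how much does the proposed KL-divergence regularization mitigate posterior collapse and increase motion diversity?
RQ4: How does the model compare with state-of-the-art methods in naturalness, diversity, and frequency characteristics?
RQ5: Does the generated motion remain perceptually plausible after 10 minutes of continuous generation?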

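Key Findings

The method generated 72,000 frames (10 minutes) of motion in which limb poses keep changing while remaining plausible, demonstrating perpetual motion generation.
In perceptual evaluations on Amazon Mechanical Turk, the model achieved mean naturalness scores of 3.44–3.47 (out of 5) on HumanEva and 3.31 on MPI-Mosh, higher than all baseline models.
The model achieved the highest diversity: across three runs from the same initial condition, the standard deviation was 0.15–0.22, indicating strong stochasticity and non-repetitiveness.
The novel KL-divergence term effectively prevented posterior collapse, as reflected in stable training and high generation quality.
The model outperformed both SOTA baselines (VQ-α Res and S-Res) on all evaluation metrics, including naturalness, diversity, and frequency consistency.
The results suggest that model performance depends more on the quality of the training data (e.g., ACCAD and CMU) than on dataset size alone.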
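The diversity result is reported only as a standard deviation across three runs from the same initial condition; the summary does not say which quantity the spread is measured over. The sketch below shows one plausible reading, assuming the spread is taken per frame and per motion dimension across runs and then averaged. The function name and the data layout are hypothetical, and the paper's actual protocol may differ.

```python
import statistics
from typing import Sequence

Run = Sequence[Sequence[float]]  # run[t][d]: value of motion dimension d at frame t


def cross_run_std(runs: Sequence[Run]) -> float:
    """Average, over frames and dimensions, of the spread across runs.

    ASSUMPTION: population std (statistics.pstdev); with only three runs,
    the sample std (statistics.stdev) would be about 22% larger.
    """
    n_frames = min(len(run) for run in runs)
    n_dims = len(runs[0][0])
    per_entry = [
        statistics.pstdev([run[t][d] for run in runs])
        for t in range(n_frames)
        for d in range(n_dims)
    ]
    return statistics.mean(per_entry)


if __name__ == "__main__":
    # Identical runs have no diversity; two runs at 0.0 and 2.0 spread by 1.0.
    print(cross_run_std([[[1.0, 2.0]]] * 3))      # 0.0
    print(cross_run_std([[[0.0]], [[2.0]]]))      # 1.0
```

Under this reading, a deterministic model that replays the same motion from a given pose would score 0, so a larger value directly reflects how far repeated runs drift apart.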

This review was generated by AI and reviewed by human editors.