QUICK REVIEW

[Paper Review] Human Motion Diffusion as a Generative Prior

Yonatan Shafir, Guy Tevet|arXiv (Cornell University)|Mar 2, 2023
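Human Motion and Animation | Cited by 33
One-Sentence Summary

This paper proposes three motion composition methods based on diffusion priors (sequential, parallel, and model composition, the last via DiffusionBlending) that use a pretrained Motion Diffusion Model (MDM) to generate long-sequence, multi-person, and controllable human motion.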

ABSTRACT

Recent work has demonstrated the significant potential of denoising diffusion models for generating human motion, including text-to-motion capabilities. However, these methods are restricted by the paucity of annotated motion data, a focus on single-person motions, and a lack of detailed control. In this paper, we introduce three forms of composition based on diffusion priors: sequential, parallel, and model composition. Using sequential composition, we tackle the challenge of long sequence generation. We introduce DoubleTake, an inference-time method with which we generate long animations consisting of sequences of prompted intervals and their transitions, using a prior trained only for short clips. Using parallel composition, we show promising steps toward two-person generation. Beginning with two fixed priors as well as a few two-person training examples, we learn a slim communication block, ComMDM, to coordinate interaction between the two resulting motions. Lastly, using model composition, we first train individual priors to complete motions that realize a prescribed motion for a given joint. We then introduce DiffusionBlending, an interpolation mechanism to effectively blend several such models to enable flexible and efficient fine-grained joint and trajectory-level control and editing. We evaluate the composition methods using an off-the-shelf motion diffusion model, and further compare the results to dedicated models trained for these specific tasks.

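Motivation and Objectives

  • Address the limitations of human motion data on new composition tasks by leveraging a pretrained diffusion prior (MDM).
  • Generate long sequences through sequential composition (DoubleTake) without retraining on long data.
  • Achieve few-shot two-person motion generation by learning a slim communication block (ComMDM) between fixed priors.
  • Provide flexible, fine-grained control through model composition (DiffusionBlending) and targeted fine-tuning.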

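Proposed Method

  • Use a fixed, pretrained Motion Diffusion Model (MDM) as the prior for new tasks.
  • Propose DoubleTake for long sequences: a two-take inference procedure that couples adjacent intervals through handshakes and refines the transitions.
  • Introduce ComMDM, a slim communication block that coordinates two fixed priors in a few-shot setting to produce two-person motion.
  • Fine-tune MDM for trajectory and joint control, enforcing adherence by masking the control features during the diffusion process (single-control fine-tuning).
  • Introduce DiffusionBlending, which combines multiple conditioned models through generalized classifier-free guidance to enable cross-joint control.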
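
The generalized classifier-free guidance behind DiffusionBlending can be illustrated with a small sketch. This is not the authors' implementation: it assumes the blend is an extrapolation between two model outputs controlled by a scale `s`, by analogy with standard classifier-free guidance. The function name `blend_predictions` and the toy arrays are hypothetical.

```python
import numpy as np

def blend_predictions(pred_a, pred_b, s):
    """Blend two denoiser outputs as pred_a + s * (pred_b - pred_a).

    Assumed form, by analogy with classifier-free guidance, where pred_a
    would be the unconditional output and pred_b the conditional one.
    s = 0 returns pred_a, s = 1 returns pred_b, s = 0.5 is their average.
    """
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    if pred_a.shape != pred_b.shape:
        raise ValueError("both predictions must have the same shape")
    return pred_a + s * (pred_b - pred_a)

# Toy stand-ins (frames x features) for the outputs of a root-controlled
# model and a hand-controlled model at a single denoising step.
root_pred = np.zeros((4, 3))
hand_pred = np.ones((4, 3))
blended = blend_predictions(root_pred, hand_pred, s=0.5)
```

In a real sampler this blend would be applied at every denoising step, with each prediction coming from a model fine-tuned for one control signal.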
Figure 1. We suggest three novel motion composition methods, all based on the recent Motion Diffusion Model (MDM). (Left) Sequential composition generating an arbitrary long motion with text control over each time interval. (Middle) Parallel composition generating two-person motion from text. A diff
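
The sequential composition in the left panel depends on adjacent intervals agreeing over a shared transition, which DoubleTake calls a handshake. The sketch below shows one way such agreement could be enforced on two toy intervals. The function name `blend_handshake`, the overlap length `h`, and the linear blending weights are illustrative assumptions, not details given in this summary.

```python
import numpy as np

def blend_handshake(prev_interval, next_interval, h):
    """Make two adjacent intervals agree on an overlap of h frames.

    Assumption: the overlap is a per-frame linear blend that moves from the
    suffix of prev_interval to the prefix of next_interval. Both inputs have
    shape (frames, features); blended copies are returned.
    """
    prev_interval = np.array(prev_interval, dtype=float)  # np.array copies
    next_interval = np.array(next_interval, dtype=float)
    if not 0 < h <= min(len(prev_interval), len(next_interval)):
        raise ValueError("h must be positive and no longer than either interval")
    # Weights rise linearly from 0 to 1 across the overlap, one per frame.
    w = np.linspace(0.0, 1.0, h)[:, None]
    shared = (1.0 - w) * prev_interval[-h:] + w * next_interval[:h]
    prev_interval[-h:] = shared
    next_interval[:h] = shared
    return prev_interval, next_interval

a = np.zeros((6, 2))  # toy first interval: 6 frames, 2 features
b = np.ones((6, 2))   # toy second interval
a2, b2 = blend_handshake(a, b, h=3)
```

After the call, the last `h` frames of `a2` and the first `h` frames of `b2` are identical, so the two intervals can be stitched along the overlap without a jump.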

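Experimental Results

Research Questions

  • RQ1: Can a pretrained motion diffusion prior be reused to generate motions of arbitrary length without retraining on long data?
  • RQ2: Can adding a coordination module between fixed priors produce convincing two-person interactions from only a few training examples?
  • RQ3: How can diffusion-based controllers be blended or fine-tuned to achieve fine-grained, joint-level control over motion trajectories?
  • RQ4: Do model composition techniques match or outperform dedicated models on specific motion tasks?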

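Key Findings

  • DoubleTake composes a short-clip prior with per-interval control, making coherent motions on the order of 10 minutes possible.
  • ComMDM coordinates two fixed priors to produce two-person motion with few-shot training, outperforming baselines on prefix completion and text-guided generation (user study).
  • Fine-tuned control and DiffusionBlending make it possible to combine control signals (such as root and hand) and improve alignment with target trajectories and joints.
  • On benchmarks (BABEL, HumanML3D, 3DPW), the proposed methods either surpass or approach dedicated task-specific models on several metrics (R-precision, FID, diversity, and others).
  • The work demonstrates the zero-shot or few-shot feasibility of using diffusion priors for long-horizon, multi-person, and controllable motion generation.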
Figure 2. Soft blending overview. We allow a $b$-frame-long linear mask between $\mathbf{M_{hard}}$ and $\mathbf{M_{soft}}$ so that, during the second take, part of the originally generated motion (suffix or prefix) goes through refinement at every denoising step to fit the transition.
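
The linear masking described in the caption can be sketched as a per-frame weight profile. This is an illustration only: the function name `soft_blend_mask`, the frame counts, and the values 1.0 and 0.8 are hypothetical, and this summary does not say which region of the motion each mask value applies to.

```python
import numpy as np

def soft_blend_mask(n_hard, b, n_soft, m_hard, m_soft):
    """Per-frame mask: n_hard frames at m_hard, a b-frame linear ramp,
    then n_soft frames at m_soft.

    The ramp holds b values strictly between m_hard and m_soft.
    """
    if min(n_hard, b, n_soft) < 0:
        raise ValueError("frame counts must be non-negative")
    # linspace over b + 2 points includes both endpoints; drop them so the
    # ramp keeps only the b in-between values.
    ramp = np.linspace(m_hard, m_soft, b + 2)[1:-1]
    return np.concatenate(
        [np.full(n_hard, m_hard), ramp, np.full(n_soft, m_soft)]
    )

# Hypothetical values: 2 hard frames, a 3-frame ramp, 2 soft frames.
mask = soft_blend_mask(n_hard=2, b=3, n_soft=2, m_hard=1.0, m_soft=0.8)
```

A gradual ramp of this kind avoids an abrupt boundary between frames that are held at one mask value and frames that are held at the other.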

This review was generated by AI and checked by human editors.