QUICK REVIEW

[Paper Review] Single Motion Diffusion

Sigal Raab, Inbal Leibovitch|arXiv (Cornell University)|Feb 12, 2023

Generative Adversarial Networks and Image Synthesis|Cited by 12

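One-Sentence Summary

SinMDM is a lightweight diffusion-based model that learns the internal motifs of a single motion sequence with arbitrary topology and, at inference time, synthesizes long, diverse, and faithful motions without retraining.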

ABSTRACT

Synthesizing realistic animations of humans, animals, and even imaginary creatures, has long been a goal for artists and computer graphics professionals. Compared to the imaging domain, which is rich with large available datasets, the number of data instances for the motion domain is limited, particularly for the animation of animals and exotic creatures (e.g., dragons), which have unique skeletons and motion patterns. In this work, we present a Single Motion Diffusion Model, dubbed SinMDM, a model designed to learn the internal motifs of a single motion sequence with arbitrary topology and synthesize motions of arbitrary length that are faithful to them. We harness the power of diffusion models and present a denoising network explicitly designed for the task of learning from a single input motion. SinMDM is designed to be a lightweight architecture, which avoids overfitting by using a shallow network with local attention layers that narrow the receptive field and encourage motion diversity. SinMDM can be applied in various contexts, including spatial and temporal in-betweening, motion expansion, style transfer, and crowd animation. Our results show that SinMDM outperforms existing methods both in quality and time-space efficiency. Moreover, while current approaches require additional training for different applications, our work facilitates these applications at inference time. Our code and trained models are available at https://sinmdm.github.io/SinMDM-page.

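Motivation and Objectives

Address the scarcity of motion data for non-human or highly customized skeletons.
Propose a diffusion-based framework that learns from a single motion sequence of arbitrary topology and synthesizes variable-length motions faithful to its motifs.
Develop a lightweight architecture with a narrow receptive field to prevent overfitting and enable efficient inference and diverse outputs.
Enable inference-time applications such as motion composition, harmonization, style transfer, long-sequence generation, and crowd animation, with no additional training.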

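Proposed Method

Adopts a denoising diffusion probabilistic model (DDPM) trained to predict the clean motion x0 from its noised version x_t (unconditional synthesis).
Represents motion as dynamic (D) and static (S) features and focuses learning on the dynamic features, with the skeleton topology and bone lengths held fixed.
Uses a shallow UNet augmented with QnA local attention to keep the temporal receptive field narrow and avoid overfitting.
Trains with the simple loss L_simple = E_t [ || x0 - p_theta(x_t, t) ||^2 ].
Generates by iterative denoising from pure noise x_T: at each step the network predicts x0, which is re-noised to obtain x_{t-1}, until x0 is produced.
Supports several applications at inference time (motion composition, harmonization, style transfer, long-sequence generation, crowd animation) without retraining.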
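The training objective and the sampling loop described in this list can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the linear beta schedule, the flat list-of-floats motion representation, and the names `make_alpha_bar`, `add_noise`, `simple_loss`, and `sample` are all assumptions made for the example, and `denoiser` stands in for the trained network p_theta.

```python
import math
import random


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level alpha_bar[t] for t = 0..num_steps.

    Assumption: a linear beta schedule; the review does not state the schedule.
    alpha_bar[0] is 1.0, i.e. no noise at t = 0.
    """
    alpha_bar, prod = [1.0], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def simple_loss(x0, x0_pred):
    """Mean squared error between the clean motion and the predicted x0."""
    return sum((a - b) ** 2 for a, b in zip(x0, x0_pred)) / len(x0)


def sample(denoiser, length, alpha_bar, rng):
    """Start from pure noise x_T, predict x0, re-noise to level t-1, repeat.

    `denoiser(x_t, t)` is a hypothetical callable returning a predicted x0.
    """
    num_steps = len(alpha_bar) - 1
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # x_T
    for t in range(num_steps, 0, -1):
        x0_pred = denoiser(x, t)
        # Re-noising to level 0 adds no noise, so the last step returns x0_pred.
        x = add_noise(x0_pred, t - 1, alpha_bar, rng)
    return x
```

In training, one would draw a random step t, compute `x_t = add_noise(x0, t, ...)`, and minimize `simple_loss(x0, denoiser(x_t, t))`. Because `length` is a free argument of `sample`, the same loop can in principle produce motions longer than the training sequence, provided the network accepts variable-length input.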

Experimental Results

Research Questions

RQ1: Can SinMDM learn and retain core motion motifs from a single motion sequence with arbitrary skeletal topology?
RQ2: Is a shallow UNet with local QnA attention sufficient to model single-motion diffusion without overfitting and with competitive quality and efficiency?
RQ3: Can inference-time applications (e.g., motion composition, harmonization, style transfer, long-sequence generation, crowd animation) be achieved without additional training?
RQ4: How does SinMDM perform on diverse datasets (Mixamo, HumanML3D) compared to single-motion baselines like Ganimator?

Key Findings

Method	Coverage ↑	Global Div. ↑	Local Div. ↑	Inter Div. ↑	Intra Div. Diff. ↓	#Param. (M) ↓	#Iter. (K) ↓	Iter. Time (s) ↓	Tot. Time (h) ↓	Harmon. Mean ↑
Ganimator	94.3	1.24	1.17	0.09	0.13	21.7	60 (15 × 4)	0.36	6.0	-0.22
SinMDM (Ours)	94.3	1.42	1.00	0.13	0.03	5.26	60	0.09	1.5	0.85

SinMDM outperforms the prior single-motion method (Ganimator) on most metrics on the Mixamo benchmark, particularly in harmonic mean (0.85 vs. -0.22), while using fewer parameters and less time per iteration for the same number of iterations (60K).
On Mixamo, SinMDM achieves identical Coverage (94.3) and higher Global Diversity (1.42 vs. 1.24) but lower Local Diversity (1.00 vs. 1.17), with roughly a quarter of the parameters (5.26M vs. 21.7M) and of the total training time (1.5 h vs. 6.0 h).
On the Gangnam-style motion, SinMDM achieves higher Inter Diversity and comparable or better Local Diversity, while maintaining strong Coverage.
SinMDM supports long-motion generation and crowd animation without retraining, thanks to its small receptive field and diffusion-based framework.
The model is efficient enough to train on a single mid-range GPU and supports inference-time specialization for diverse applications.
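As a quick sanity check on the efficiency columns of the table above, the reported total time should equal the iteration count times the per-iteration time. The short script below recomputes it from the table values; the dictionary layout and the helper name `training_hours` are illustrative choices for this sketch, not anything taken from the paper.

```python
# Values copied from the Mixamo comparison table above.
rows = {
    "Ganimator": {"params_m": 21.7, "iters_k": 60, "iter_time_s": 0.36, "total_h": 6.0},
    "SinMDM": {"params_m": 5.26, "iters_k": 60, "iter_time_s": 0.09, "total_h": 1.5},
}


def training_hours(iters_k, iter_time_s):
    """Total training time in hours from iterations (thousands) and seconds per iteration."""
    return iters_k * 1000 * iter_time_s / 3600.0


for name, row in rows.items():
    estimate = training_hours(row["iters_k"], row["iter_time_s"])
    print(f"{name}: estimated {estimate:.2f} h, reported {row['total_h']} h")

# Relative cost of Ganimator compared with SinMDM.
time_ratio = rows["Ganimator"]["total_h"] / rows["SinMDM"]["total_h"]
param_ratio = rows["Ganimator"]["params_m"] / rows["SinMDM"]["params_m"]
print(f"total time ratio: {time_ratio:.2f}x, parameter ratio: {param_ratio:.2f}x")
```

The estimates match the reported totals, which indicates that the shorter training time comes from cheaper iterations, not from fewer of them.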


This review was generated by AI and reviewed by human editors.