QUICK REVIEW

[Paper Review] Single Motion Diffusion

Sigal Raab, Inbal Leibovitch|arXiv (Cornell University)|Feb 12, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 12

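One-line summary

SinMDM is a lightweight diffusion-based model that learns the internal motifs of a single motion sequence with arbitrary topology and, without retraining, synthesizes long, diverse, and faithful motions at inference time.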

ABSTRACT

Synthesizing realistic animations of humans, animals, and even imaginary creatures, has long been a goal for artists and computer graphics professionals. Compared to the imaging domain, which is rich with large available datasets, the number of data instances for the motion domain is limited, particularly for the animation of animals and exotic creatures (e.g., dragons), which have unique skeletons and motion patterns. In this work, we present a Single Motion Diffusion Model, dubbed SinMDM, a model designed to learn the internal motifs of a single motion sequence with arbitrary topology and synthesize motions of arbitrary length that are faithful to them. We harness the power of diffusion models and present a denoising network explicitly designed for the task of learning from a single input motion. SinMDM is designed to be a lightweight architecture, which avoids overfitting by using a shallow network with local attention layers that narrow the receptive field and encourage motion diversity. SinMDM can be applied in various contexts, including spatial and temporal in-betweening, motion expansion, style transfer, and crowd animation. Our results show that SinMDM outperforms existing methods both in quality and time-space efficiency. Moreover, while current approaches require additional training for different applications, our work facilitates these applications at inference time. Our code and trained models are available at https://sinmdm.github.io/SinMDM-page.

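Motivation and objectives

Address the scarcity of motion data for non-human or highly customized skeletons.
Propose a diffusion-based framework that learns from a single motion sequence and synthesizes variable-length, motif-faithful motions for arbitrary skeletal topologies.
Develop a lightweight architecture with a narrow receptive field that prevents overfitting and enables efficient inference and diverse outputs.
Enable inference-time applications (motion composition, harmonization, style transfer, long-sequence generation, crowd animation) without additional training.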
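
The "narrow receptive field" objective can be made concrete with the standard receptive-field recurrence for stacked windowed layers (convolutions or local-attention windows). The sketch below is illustrative only: the `(window, stride)` configurations are hypothetical and are not taken from the paper's architecture.

```python
def receptive_field(layers):
    """Temporal receptive field (in frames) of a stack of windowed layers.

    layers: list of (window, stride) pairs, ordered from input to output.
    """
    rf, jump = 1, 1                 # jump = spacing of this layer's inputs, in frames
    for window, stride in layers:
        rf += (window - 1) * jump   # each layer widens the field by (window - 1) * jump
        jump *= stride              # striding spreads later layers' inputs further apart
    return rf

# Hypothetical configurations, chosen only to contrast shallow and deep stacks.
shallow = [(3, 1), (3, 2), (3, 1)]
deep = [(3, 1), (3, 2)] * 4

rf_shallow = receptive_field(shallow)
rf_deep = receptive_field(deep)
```

With these made-up stacks the shallow network sees only a short window of frames per output position, so it can reproduce local motifs but cannot memorize the whole input sequence, which is the intuition behind the overfitting argument.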

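Proposed method

Adopts a denoising diffusion probabilistic model (DDPM) trained to predict the original motion x_0 from its noised version x_t (unconditional synthesis).
Represents motion as dynamic (D) and static (S) features; the skeleton topology and bone lengths are assumed fixed, so learning focuses on the dynamics.
Uses a shallow UNet with QnA local attention layers to enforce a narrow temporal receptive field and avoid overfitting.
Trains with the simple loss L_simple = E_t [ || x_0 - p_theta(x_t, t) ||^2 ].
At inference, starts from pure noise x_T and denoises iteratively, producing x_{t-1} from x_t at each step until x_0 is reached.
Supports multiple applications at inference time (motion composition, harmonization, style transfer, long-sequence generation, crowd animation) without retraining.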
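
The loss and the sampling loop described above can be sketched in a few lines of plain Python. This is a minimal illustration of a generic x_0-predicting DDPM, not the authors' implementation: the linear noise schedule, the flattened-list motion representation, and the `oracle` denoiser are all assumptions made for the example, and the loss is averaged over entries rather than summed.

```python
import math
import random

def make_schedule(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule for T >= 2 (an assumption; the review names no schedule)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)   # alpha_bar[t] = product of (1 - beta_s) for s <= t
    return betas, alpha_bar

def q_sample(x0, t, alpha_bar, rng):
    """Forward process: draw x_t ~ q(x_t | x_0) for a flattened motion x0."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]

def l_simple(denoiser, x0, t, alpha_bar, rng):
    """Squared error between x_0 and the prediction p_theta(x_t, t), averaged over entries."""
    x_t = q_sample(x0, t, alpha_bar, rng)
    pred = denoiser(x_t, t)
    return sum((a - b) ** 2 for a, b in zip(x0, pred)) / len(x0)

def p_sample_loop(denoiser, size, betas, alpha_bar, rng):
    """Ancestral sampling: start at x_T ~ N(0, I) and step down to x_0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for t in range(len(betas) - 1, -1, -1):
        x0_hat = denoiser(x, t)
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        # Mean and standard deviation of the Gaussian posterior q(x_{t-1} | x_t, x_0),
        # with the network prediction x0_hat standing in for the unknown x_0.
        c0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        ct = math.sqrt(1.0 - betas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        std = math.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab_t))
        x = [c0 * h + ct * v + (std * rng.gauss(0.0, 1.0) if t > 0 else 0.0)
             for h, v in zip(x0_hat, x)]
    return x

# Toy usage: a sine wave stands in for a flattened motion, and a hypothetical
# "oracle" denoiser that always returns x0 stands in for the trained network.
rng = random.Random(0)
T = 50
betas, alpha_bar = make_schedule(T=T)
x0 = [math.sin(0.3 * i) for i in range(32)]
oracle = lambda x_t, t: x0
loss = l_simple(oracle, x0, rng.randrange(T), alpha_bar, rng)
sample = p_sample_loop(oracle, len(x0), betas, alpha_bar, rng)
```

In a real setup the `denoiser` callable would be the shallow UNet, and `l_simple` would be minimized over randomly drawn timesteps; the oracle is used here only so the example runs without a training loop.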

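Experimental results

Research questions

RQ1: Can SinMDM learn and preserve the core motion motifs of a single motion sequence with an arbitrary skeletal topology?
RQ2: Is a shallow UNet with local QnA attention sufficient to model single-motion diffusion with competitive quality and efficiency while preventing overfitting?
RQ3: Can inference-time applications (e.g., motion composition, harmonization, style transfer, long-sequence generation, crowd animation) be achieved without additional training?
RQ4: How does SinMDM compare with single-motion baselines such as Ganimator on diverse datasets such as Mixamo and HumanML3D?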

Key findings

Method	Coverage ↑	Global Div. ↑	Local Div. ↑	Inter Div. ↑	Intra Div. Diff. ↓	#Param. (M) ↓	#Iter. (K) ↓	Iter. Time (s) ↓	Tot. Time (h) ↓	Harmon. Mean ↑
Ganimator	94.3	1.24	1.17	0.09	0.13	21.7	60 (15 × 4)	0.36	6.0	-0.22
SinMDM (Ours)	94.3	1.42	1.00	0.13	0.03	5.26	60	0.09	1.5	0.85
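
The efficiency columns of the table can be cross-checked with simple arithmetic: total training time should be roughly the iteration count times the per-iteration time. The short sketch below uses only the numbers printed in the table; the dictionary keys are names chosen for this example.

```python
# Efficiency figures copied from the table above.
ganimator = {"params_m": 21.7, "iters_k": 60, "iter_time_s": 0.36, "total_time_h": 6.0}
sinmdm = {"params_m": 5.26, "iters_k": 60, "iter_time_s": 0.09, "total_time_h": 1.5}

def total_hours(row):
    """Training time implied by iteration count and per-iteration time."""
    return row["iters_k"] * 1000 * row["iter_time_s"] / 3600

def ratio(a, b):
    """How many times larger each quantity is for method a than for method b."""
    return {key: a[key] / b[key] for key in a}

implied = {"Ganimator": total_hours(ganimator), "SinMDM": total_hours(sinmdm)}
speedup = ratio(ganimator, sinmdm)
```

The implied totals agree with the reported 6.0 h and 1.5 h, and the ratios show that both methods run the same number of iterations: the saving comes from a roughly fourfold smaller model and a fourfold shorter iteration time.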

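On the Mixamo benchmark, SinMDM outperforms the previous single-motion method (Ganimator) on several metrics, most notably the harmonic mean, with far fewer parameters and no more training iterations.
On Mixamo, SinMDM matches Ganimator on Coverage and exceeds it on Global Diversity (its Local Diversity is lower in the table above), while greatly reducing parameter count and total training time.
On the Gangnam-style motion, SinMDM achieves higher Inter Diversity and equal or better Local Diversity while maintaining Coverage.
Thanks to its small receptive field and diffusion-based framework, SinMDM supports long-motion generation and crowd animation without retraining.
The model can be trained on a single mid-range GPU and is efficient enough to be specialized to various applications at inference time.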

This review was generated by AI and checked by a human editor.