QUICK REVIEW

[Paper Review] FastVMT: Eliminating Redundancy in Video Motion Transfer

Yue Ma, Zhikai Wang|arXiv (Cornell University)|Feb 5, 2026

Generative Adversarial Networks and Image Synthesis | Citations: 0

One-Line Summary

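FastVMT uses sliding-window attention and step-skipping gradient updates to eliminate motion redundancy and gradient redundancy in training-free video motion transfer, speeding up generation with minimal loss in quality.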

ABSTRACT

Video motion transfer aims to synthesize videos by generating visual content according to a text prompt while transferring the motion pattern observed in a reference video. Recent methods predominantly use the Diffusion Transformer (DiT) architecture. To achieve satisfactory runtime, several methods attempt to accelerate the computations in the DiT, but fail to address structural sources of inefficiency. In this work, we identify and remove two types of computational redundancy in earlier work: motion redundancy arises because the generic DiT architecture does not reflect the fact that frame-to-frame motion is small and smooth; gradient redundancy occurs if one ignores that gradients change slowly along the diffusion trajectory. To mitigate motion redundancy, we mask the corresponding attention layers to a local neighborhood such that interaction weights are not computed for unnecessarily distant image regions. To exploit gradient redundancy, we design an optimization scheme that reuses gradients from previous diffusion steps and skips unwarranted gradient computations. On average, FastVMT achieves a 3.43x speedup without degrading the visual fidelity or the temporal consistency of the generated videos.
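The local-neighborhood masking described in the abstract can be illustrated with a small NumPy sketch. It makes several assumptions. It considers one pair of adjacent frames on an H×W token grid. It uses a square window whose size is set by a hypothetical `radius` parameter. It reads out displacement with a soft-argmax over the masked attention, in the spirit of attention-based motion flow. The function names are illustrative. This is not the paper's implementation.

```python
import numpy as np


def local_window_mask(h, w, radius):
    """(h*w, h*w) boolean mask: query token i may attend to key token j only if
    their grid positions differ by at most `radius` rows and `radius` columns."""
    rows, cols = np.divmod(np.arange(h * w), w)  # grid position of each token
    near_rows = np.abs(rows[:, None] - rows[None, :]) <= radius
    near_cols = np.abs(cols[:, None] - cols[None, :]) <= radius
    return near_rows & near_cols


def masked_cross_frame_attention(q, k, mask):
    """Softmax attention of frame-t queries over frame-(t+1) keys; scores
    outside the local window are set to -inf, so they receive zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def soft_motion_flow(attn, h, w):
    """Expected (row, col) displacement of each query token (soft-argmax)."""
    rows, cols = np.divmod(np.arange(h * w), w)
    pos = np.stack([rows, cols], axis=-1).astype(float)  # (N, 2)
    return attn @ pos - pos


if __name__ == "__main__":
    h, w, d, radius = 8, 8, 64, 1
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((h, w, d))
    # Rescale every token to norm 18 so a token's score with its exact copy
    # (18**2 / sqrt(64) = 40.5) dominates its scores with unrelated tokens.
    feats *= 18.0 / np.linalg.norm(feats, axis=-1, keepdims=True)
    # Frame t+1 is frame t shifted one column to the right (wrapping at the edge).
    q = feats.reshape(-1, d)
    k = np.roll(feats, 1, axis=1).reshape(-1, d)

    mask = local_window_mask(h, w, radius)
    attn = masked_cross_frame_attention(q, k, mask)
    flow = soft_motion_flow(attn, h, w).reshape(h, w, 2)
    print("interaction weights computed:", int(mask.sum()), "of", mask.size)
    print("mean flow (excluding wrapped last column):", flow[:, :-1].mean(axis=(0, 1)))
```

Because motion between adjacent frames is small, the correct match always falls inside the window. The mask then removes most query-key pairs: with radius 1 on an 8×8 grid, only 484 of 4,096 weights are computed. In a real DiT the saving comes from never computing the masked scores at all, not from setting them to -inf afterward as this sketch does.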

Motivation and Objectives

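Make training-free video motion transfer efficient by eliminating wasted computation.
Identify two sources of redundancy in DiT-based pipelines: motion redundancy and gradient redundancy.
Propose techniques that reduce this redundancy while preserving fidelity and temporal consistency.
Demonstrate substantial speedups without significantly degrading video quality.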

Proposed Method

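During inversion, replace global token similarity with sliding-window motion extraction.
Compute a corresponding-window loss that enforces stable motion correspondences across windows.
Implement step-skipping gradient optimization that reuses gradients across diffusion steps.
Build on the Attention Motion Flow (AMF) framework, constrained to local windows, for efficient motion transfer.
Use a windowed loss that promotes temporal consistency between adjacent frames.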
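The step-skipping gradient idea can be sketched as a guidance loop that recomputes the gradient only every few diffusion steps and reuses the cached gradient in between. The sketch makes simplifying assumptions. A toy quadratic loss with slowly drifting targets stands in for the real windowed motion loss. The names `motion_loss_grad`, `guided_denoising`, and `recompute_every` are hypothetical. The fixed recompute interval is an assumption, not FastVMT's actual schedule.

```python
import numpy as np


def motion_loss_grad(z, target):
    """Gradient of a toy loss 0.5 * ||z - target||^2.

    Hypothetical stand-in: in a real pipeline this would be the gradient of the
    windowed motion loss, obtained by backpropagating through the DiT."""
    return z - target


def guided_denoising(z, targets, lr=0.3, recompute_every=3):
    """Apply one gradient-guidance update per diffusion step.

    The gradient is recomputed only when step % recompute_every == 0, and the
    cached gradient is reused on the steps in between (recompute_every=1
    recomputes at every step)."""
    cached_grad = None
    n_grad_evals = 0
    for step, target in enumerate(targets):
        if cached_grad is None or step % recompute_every == 0:
            cached_grad = motion_loss_grad(z, target)
            n_grad_evals += 1
        z = z - lr * cached_grad
    return z, n_grad_evals


if __name__ == "__main__":
    # Targets drift slowly across steps, mimicking slowly varying gradients.
    targets = [np.full(4, 1.0 + 0.01 * t) for t in range(12)]
    z0 = np.zeros(4)
    z_full, n_full = guided_denoising(z0, targets, recompute_every=1)
    z_skip, n_skip = guided_denoising(z0, targets, recompute_every=3)
    print(f"every step: {n_full} gradient evals, z = {z_full[0]:.3f}")
    print(f"step skip:  {n_skip} gradient evals, z = {z_skip[0]:.3f}")
```

With `recompute_every=3` the loop makes 4 gradient evaluations instead of 12, and the final latent stays close to the every-step result. Reuse is only safe because the gradient changes slowly between steps. A larger interval or a faster-changing objective would let the stale gradient push the latent off course.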

Experimental Results

Research Questions

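RQ1: Does sliding-window attention reduce motion-extraction computation without sacrificing motion fidelity?
RQ2: Does the corresponding-window loss improve the temporal stability of motion transfer in diffusion transformers?
RQ3: Can step-skipping gradient updates substantially reduce computation while maintaining video quality?
RQ4: How does FastVMT perform on single- and multi-object motion, ego-motion, and complex articulated motion?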

Key Findings

Method	Text Similarity	Motion Fidelity	Temporal Consistency	Time (s)	Subject Consistency	Background Consistency	Aesthetic Quality	Motion Smoothness
FastVMT (ours)	0.2422	0.7471	0.9865	184	0.9809	0.9684	0.5778	0.9891

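Achieves an average 3.43x speedup over training-free motion transfer pipelines with no perceptible loss in quality.
Demonstrates up to a 14.91x latency reduction in certain scenarios while maintaining high visual fidelity and temporal consistency.
Outperforms the baselines on every automatic metric: motion fidelity, temporal consistency, and text alignment.
Preserves subject identity and background consistency across frames in both qualitative and quantitative evaluations.
A user study confirms that FastVMT is preferred for motion preservation and overall quality.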


This review was generated by AI and checked by a human editor.