QUICK REVIEW

[Paper Review] FastVMT: Eliminating Redundancy in Video Motion Transfer

Yue Ma, Zhikai Wang|arXiv (Cornell University)|Feb 5, 2026

Topic: Generative Adversarial Networks and Image Synthesis | Cited by 0

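ONE-SENTENCE SUMMARY

FastVMT removes motion redundancy and gradient redundancy from training-free video motion transfer by combining a sliding-window attention strategy with step-skipping gradient updates, which speeds up generation with minimal quality loss.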

ABSTRACT

Video motion transfer aims to synthesize videos by generating visual content according to a text prompt while transferring the motion pattern observed in a reference video. Recent methods predominantly use the Diffusion Transformer (DiT) architecture. To achieve satisfactory runtime, several methods attempt to accelerate the computations in the DiT, but fail to address structural sources of inefficiency. In this work, we identify and remove two types of computational redundancy in earlier work: motion redundancy arises because the generic DiT architecture does not reflect the fact that frame-to-frame motion is small and smooth; gradient redundancy occurs if one ignores that gradients change slowly along the diffusion trajectory. To mitigate motion redundancy, we mask the corresponding attention layers to a local neighborhood so that interaction weights are not computed for unnecessarily distant image regions. To exploit gradient redundancy, we design an optimization scheme that reuses gradients from previous diffusion steps and skips unwarranted gradient computations. On average, FastVMT achieves a 3.43x speedup without degrading the visual fidelity or the temporal consistency of the generated videos.
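A toy illustration of the motion-redundancy argument above: if frame-to-frame motion is small, each token only needs to be compared with a local neighborhood in the next frame rather than with every token. This is a minimal pure-Python sketch under stated assumptions, not FastVMT's code; the function `windowed_displacements`, the dot-product similarity, the window radius `R`, and the synthetic one-hot features are all hypothetical. It returns the best-matching displacement for each position and counts the similarity evaluations, so the local cost can be compared with an all-pairs search.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def windowed_displacements(feat_a, feat_b, radius):
    """Hypothetical local matcher: for every (y, x) in frame A, score only the
    (2*radius+1)^2 neighbourhood in frame B (clipped at the borders).

    Returns the best displacement per position and the number of similarity
    evaluations performed.
    """
    h, w = len(feat_a), len(feat_a[0])
    flow = [[(0, 0)] * w for _ in range(h)]
    evaluated = 0
    for y in range(h):
        for x in range(w):
            best_score, best_disp = -math.inf, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        score = dot(feat_a[y][x], feat_b[yy][xx])
                        evaluated += 1
                        if score > best_score:  # ties keep the first candidate
                            best_score, best_disp = score, (dy, dx)
            flow[y][x] = best_disp
    return flow, evaluated


def one_hot(i, n):
    return [1.0 if k == i else 0.0 for k in range(n)]


# Synthetic example (assumption): each position gets a unique one-hot feature,
# and frame B is frame A shifted one pixel to the right.
H, W, R = 6, 6, 1
frame_a = [[one_hot(y * W + x, H * W) for x in range(W)] for y in range(H)]
frame_b = [[frame_a[y][x - 1] if x > 0 else [0.0] * (H * W) for x in range(W)]
           for y in range(H)]

flow, local_cost = windowed_displacements(frame_a, frame_b, R)
global_cost = (H * W) ** 2  # all-pairs token similarity for one frame pair
print(flow[2][2], local_cost, global_cost)  # (0, 1) 256 1296
```

With `R = 1` on a 6x6 grid, the local search evaluates 256 similarities instead of 1,296 and still recovers the one-pixel shift for every position whose match stays inside the frame. The saving grows with resolution, because the windowed cost is linear in the number of tokens while the all-pairs cost is quadratic.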

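MOTIVATION AND OBJECTIVES

Make training-free video motion transfer efficient by eliminating wasted computation.
Identify two sources of redundancy in DiT-based pipelines: motion redundancy and gradient redundancy.
Propose techniques that reduce this redundancy while preserving visual fidelity and temporal consistency.
Demonstrate substantial speedups without a material drop in video quality.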

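PROPOSED METHOD

Introduces sliding-window motion extraction in place of global token similarity during inversion.
Computes a corresponding-window loss to enforce stable motion correspondences across windows.
Implements step-skipping gradient optimization that reuses gradients across diffusion steps.
Builds on the Attention Motion Flow (AMF) framework to perform efficient motion transfer under local-window constraints.
Uses the windowed loss to promote temporal consistency between adjacent frames.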
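The gradient-reuse idea can be sketched as a plain gradient-descent loop that pays for a fresh gradient only every few steps and reuses the cached one in between. This is a hypothetical illustration, not FastVMT's actual schedule: `optimize_with_gradient_reuse`, `refresh_every`, the learning rate, and the quadratic stand-in for the motion loss are all assumptions. In a real DiT pipeline, each `grad_fn` call would be a costly backward pass through the transformer.

```python
def optimize_with_gradient_reuse(x0, grad_fn, num_steps, refresh_every, lr):
    """Gradient descent that recomputes the gradient only every `refresh_every`
    steps and reuses the cached gradient otherwise (hypothetical schedule)."""
    x = list(x0)
    cached_grad = None
    grad_evals = 0
    for step in range(num_steps):
        if cached_grad is None or step % refresh_every == 0:
            cached_grad = grad_fn(x)  # the expensive part in a real pipeline
            grad_evals += 1
        x = [xi - lr * gi for xi, gi in zip(x, cached_grad)]
    return x, grad_evals


# Toy stand-in for the motion loss (assumption): 0.5 * ||x - target||^2,
# whose gradient is simply x - target.
target = [1.0, -2.0]


def quadratic_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_full, evals_full = optimize_with_gradient_reuse([0.0, 0.0], quadratic_grad, 30, 1, 0.2)
x_reuse, evals_reuse = optimize_with_gradient_reuse([0.0, 0.0], quadratic_grad, 30, 3, 0.2)
print(evals_full, evals_reuse)  # 30 10
```

On this smooth toy loss both runs end close to `target`, while the reuse schedule calls `grad_fn` a third as often. The premise, as the abstract states, is that gradients change slowly along the diffusion trajectory; where they change quickly, stale gradients can overshoot, so the refresh interval is a speed/accuracy trade-off.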

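EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can sliding-window attention reduce the computational cost of motion extraction without sacrificing motion fidelity?
RQ2: Does the corresponding-window loss improve the temporal stability of motion transfer in Diffusion Transformers?
RQ3: Do step-skipping gradient updates substantially reduce computation while preserving video quality?
RQ4: How does FastVMT perform on single-object and multi-object motion, ego-motion, and complex articulated motion?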

KEY FINDINGS

Method	Text Similarity	Motion Fidelity	Temporal Consistency	Time (s)	Subject Consistency	Background Consistency	Aesthetic Quality	Motion Smoothness
FastVMT (ours)	0.2422	0.7471	0.9865	184	0.9809	0.9684	0.5778	0.9891

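On average, 3.43x faster than existing training-free motion transfer pipelines, with no noticeable loss in quality.
Achieves up to a 14.91x latency reduction in some scenarios while maintaining high visual fidelity and temporal consistency.
Outperforms the baselines in automatic evaluation on motion fidelity, temporal consistency, and text alignment.
Preserves subject identity and background consistency across frames in both qualitative and quantitative evaluations.
A user study confirms that FastVMT is preferred for motion preservation and overall quality.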

This summary was generated by AI and reviewed by human editors.