QUICK REVIEW

[Paper Review] Pathwise Test-Time Correction for Autoregressive Long Video Generation

Xunzhi Xiang, Zixuan Duan|arXiv (Cornell University)|Feb 5, 2026
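Generative Adversarial Networks and Image Synthesis | Cited by 0
One-Sentence Summary

The paper proposes Test-Time Correction (TTC), a training-free method that injects path-wise correction into the stochastic sampling process of distilled autoregressive diffusion models. This mitigates error accumulation over long horizons and extends stable long-video generation to about 30 seconds without retraining.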

ABSTRACT

Distilled autoregressive diffusion models facilitate real-time short video synthesis but suffer from severe error accumulation during long-sequence generation. While existing Test-Time Optimization (TTO) methods prove effective for images or short clips, we identify that they fail to mitigate drift in extended sequences due to unstable reward landscapes and the hypersensitivity of distilled parameters. To overcome these limitations, we introduce Test-Time Correction (TTC), a training-free alternative. Specifically, TTC utilizes the initial frame as a stable reference anchor to calibrate intermediate stochastic states along the sampling trajectory. Extensive experiments demonstrate that our method seamlessly integrates with various distilled models, extending generation lengths with negligible overhead while matching the quality of resource-intensive training-based methods on 30-second benchmarks.

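Motivation and Objectives

  • Motivate and address error accumulation in long autoregressive video generation with distilled diffusion models.
  • Propose a training-free test-time correction framework that intervenes along the stochastic sampling path to stabilize generation.
  • Extend sequence length without retraining while preserving the sampling distribution and temporal consistency.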

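Proposed Method

  • Generate long videos autoregressively with a few-step distilled diffusion model.
  • At selected steps, apply reference-conditioned denoising with the initial frame as the anchor.
  • Introduce path-wise correction: apply the correction at the selected steps, re-noise the result back to the current noise level, and continue denoising with the original context.
  • Combine the single-point correction idea with path-wise re-noising to avoid collapse and flicker.
  • Formalize the TTC steps inside the diffusion sampling loop and give an algorithmic description (Algorithm 1).
  • Demonstrate compatibility with multiple distilled models, and compare against training-based baselines and test-time scaling baselines.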
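The steps listed above can be sketched as a toy sampling loop. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: `toy_denoiser`, `renoise`, `sample_chunk_with_ttc`, the linear-interpolation noise schedule, and the noise levels in the usage example are all hypothetical stand-ins chosen so the control flow (reference-conditioned denoising, re-noising to the current level, then denoising with the original context) is visible and runnable.

```python
import numpy as np


def toy_denoiser(x_t, sigma, context):
    """Hypothetical stand-in for a distilled model: estimates a clean frame
    by pulling the noisy state toward the mean of the conditioning frames."""
    return (1.0 - sigma) * x_t + sigma * context.mean(axis=0)


def renoise(x0, sigma, rng):
    """Mix a clean estimate with fresh Gaussian noise at level sigma
    (assumed linear-interpolation schedule, not taken from the paper)."""
    return (1.0 - sigma) * x0 + sigma * rng.standard_normal(x0.shape)


def sample_chunk_with_ttc(context, anchor, sigmas, correct_at, rng,
                          denoiser=toy_denoiser):
    """Few-step stochastic sampling of one frame with path-wise correction.

    context:    (N, H, W) previously generated frames (the original context)
    anchor:     (H, W) initial frame used as the reference
    sigmas:     decreasing noise levels in (0, 1]
    correct_at: set of noise levels at which correction is applied
    """
    x_t = rng.standard_normal(anchor.shape)  # start from pure noise
    x0 = x_t
    for i, sigma in enumerate(sigmas):
        if sigma in correct_at:
            # 1) Reference-conditioned denoising: condition on the anchor only.
            x0_ref = denoiser(x_t, sigma, anchor[None])
            # 2) Re-noise the corrected estimate back to the current level.
            x_t = renoise(x0_ref, sigma, rng)
        # 3) Continue denoising with the original context.
        x0 = denoiser(x_t, sigma, context)
        if i + 1 < len(sigmas):
            # Intermediate noise injection of the stochastic few-step sampler.
            x_t = renoise(x0, sigmas[i + 1], rng)
    return x0


rng = np.random.default_rng(0)
anchor = np.zeros((4, 4))
context = np.ones((3, 4, 4))
frame = sample_chunk_with_ttc(context, anchor, [1.0, 0.75, 0.5, 0.25],
                              {0.5, 0.25}, rng)
print(frame.shape)  # (4, 4)
```

In this sketch the correction never replaces the final output directly: the anchor-conditioned estimate is re-noised and then passed through the ordinary context-conditioned step, which is the "path-wise" part of the idea as the summary describes it.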
Figure 2: Comparison of sampling strategies. The Original Path suffers from error accumulation, while the Sink-based Path collapses into a Sink Point (dynamic collapse). In contrast, our TTC strategy avoids these failures by employing reference-conditioned denoising and explicit Re-noising, effect

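Experimental Results

Research Questions

  • RQ1: Can test-time intervention stabilize long autoregressive video generation without retraining?
  • RQ2: Does path-wise, reference-based correction preserve temporal consistency better than single-point correction or sink-based conditioning?
  • RQ3: What are the trade-offs among correction placement, the number of correction steps, and inference overhead?
  • RQ4: How does TTC compare with training-based methods and test-time scaling in quality and efficiency?
  • RQ5: Is TTC robust across different backbone models and prompt-based scenarios?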

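Key Findings

  • TTC extends the stable generation length from a few seconds to more than 30 seconds with almost negligible overhead.
  • Path-wise correction is embedded in the stochastic sampling path and suppresses long-term error accumulation and temporal drift.
  • Single-point correction can introduce artifacts; path-wise re-noising yields smoother, more consistent trajectories.
  • Correction steps placed at noise levels 500 and 250 are robust across different configurations.
  • TTC matches training-based methods in visual quality while remaining training-free and fast.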
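To make the placement finding concrete, the helper below maps chosen noise levels onto step indices of a few-step sampler. It is only an illustrative sketch: the function name `correction_step_indices` and the four-step schedule `[1000, 750, 500, 250]` are assumptions for the example; only the levels 500 and 250 come from the summary above.

```python
def correction_step_indices(schedule, correct_levels):
    """Return the indices of schedule entries selected for correction.

    schedule:       noise levels visited by the sampler, in order
    correct_levels: set of noise levels at which correction is applied
    """
    missing = set(correct_levels) - set(schedule)
    if missing:
        # Fail loudly if a requested level is never visited by the sampler.
        raise ValueError(f"levels not in schedule: {sorted(missing)}")
    return [i for i, level in enumerate(schedule) if level in correct_levels]


# Hypothetical four-step schedule; the correction levels follow the summary.
schedule = [1000, 750, 500, 250]
print(correction_step_indices(schedule, {500, 250}))  # [2, 3]
```

Under this assumed schedule, the selected levels fall on the last two sampling steps.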
Figure 3: Variants of autoregressive video generation. Discrete AR uses single-step deterministic prediction, multi-step diffusion follows a deterministic ODE trajectory, while few-step distilled diffusion performs stochastic sampling with intermediate noise injection.


This review was generated by AI and reviewed by a human editor.