QUICK REVIEW

[Paper Review] Pathwise Test-Time Correction for Autoregressive Long Video Generation

Xunzhi Xiang, Zixuan Duan|arXiv (Cornell University)|Feb 5, 2026

Generative Adversarial Networks and Image Synthesis · Citations: 0

One-Line Summary

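The paper proposes Test-Time Correction (TTC), a training-free method that inserts pathwise corrections into the stochastic sampling of distilled autoregressive diffusion models. It mitigates long-horizon error accumulation and extends stable long video generation to roughly 30 seconds without retraining.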

ABSTRACT

Distilled autoregressive diffusion models facilitate real-time short video synthesis but suffer from severe error accumulation during long-sequence generation. While existing Test-Time Optimization (TTO) methods prove effective for images or short clips, we identify that they fail to mitigate drift in extended sequences due to unstable reward landscapes and the hypersensitivity of distilled parameters. To overcome these limitations, we introduce Test-Time Correction (TTC), a training-free alternative. Specifically, TTC utilizes the initial frame as a stable reference anchor to calibrate intermediate stochastic states along the sampling trajectory. Extensive experiments demonstrate that our method seamlessly integrates with various distilled models, extending generation lengths with negligible overhead while matching the quality of resource-intensive training-based methods on 30-second benchmarks.

Motivation and Objectives

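Motivation: error accumulates in long-horizon autoregressive video generation with distilled diffusion models, and the paper sets out to address it.
Proposes a training-free Test-Time Correction framework that intervenes in the stochastic sampling path to stabilize generation.
Avoids retraining the model while preserving the sampling distribution and temporal consistency across extended sequences.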

Proposed Method

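Models autoregressive long video generation with few-step distilled diffusion.
Introduces reference-conditioned denoising at selected steps, using the first frame as an anchor.
Proposes pathwise correction: apply the correction at selected steps, re-noise the result back to the current noise level, then resume denoising with the original context.
Strengthens the single-point correction idea with pathwise re-noising to avoid sink collapse and flickering.
Formalizes the TTC step within the diffusion sampling loop and gives an algorithmic description (Algorithm 1).
Demonstrates compatibility with multiple distilled models and compares against training-based and test-time scaling baselines.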
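The correct / re-noise / resume sequence described above can be illustrated with a toy sampling loop. This is a minimal sketch under heavy assumptions, not the authors' Algorithm 1. Latents are scalars. `toy_denoiser` is a hypothetical stand-in for the distilled generator, with a constant `BIAS` that mimics a small per-step model error. The forward process is simple additive Gaussian noise. The correction levels 500 and 250 are borrowed from the findings reported in this review. The names `sample_chunk` and `rollout` are illustrative only.

```python
import random

# Toy few-step schedule on a 0-1000 scale; correcting at 500 and 250 mirrors
# the noise levels reported in this review. Both are illustrative assumptions.
TIMESTEPS = [1000, 750, 500, 250]
CORRECT_AT = {500, 250}
BIAS = 0.1  # hypothetical systematic error of the distilled denoiser


def noise_level(t):
    """Map a timestep in [0, 1000] to a noise scale in [0, 1]."""
    return t / 1000.0


def renoise(x0, t, rng):
    """Toy forward process: add Gaussian noise scaled to level t."""
    return x0 + noise_level(t) * rng.gauss(0.0, 1.0)


def toy_denoiser(x_t, t, context):
    """Hypothetical stand-in for a distilled generator G(x_t, t, context).

    At high noise it trusts the context mean, at low noise it trusts x_t,
    and every call adds BIAS to mimic a small per-step model error.
    """
    s = noise_level(t)
    return s * (sum(context) / len(context)) + (1.0 - s) * x_t + BIAS


def sample_chunk(context, anchor, rng, use_ttc):
    """Sample one scalar 'chunk' latent, optionally with pathwise correction."""
    x_t = renoise(0.0, TIMESTEPS[0], rng)  # start from pure noise
    x0_hat = x_t
    for i, t in enumerate(TIMESTEPS):
        if use_ttc and t in CORRECT_AT:
            # (1) Reference-conditioned denoising with the first-frame anchor.
            x0_ref = toy_denoiser(x_t, t, context + [anchor])
            # (2) Re-noise the corrected estimate back to the current level t.
            x_t = renoise(x0_ref, t, rng)
        # (3) Resume denoising with the original context only.
        x0_hat = toy_denoiser(x_t, t, context)
        if i + 1 < len(TIMESTEPS):
            # Few-step distilled sampling injects fresh noise between steps.
            x_t = renoise(x0_hat, TIMESTEPS[i + 1], rng)
    return x0_hat


def rollout(num_chunks, use_ttc, seed=0):
    """Autoregressive rollout: each chunk is conditioned on the previous one."""
    rng = random.Random(seed)
    anchor = 0.0  # latent of the first frame / first chunk
    chunks = [anchor]
    for _ in range(num_chunks):
        chunks.append(sample_chunk([chunks[-1]], anchor, rng, use_ttc))
    return chunks


if __name__ == "__main__":
    plain = rollout(60, use_ttc=False)
    ttc = rollout(60, use_ttc=True)
    print("drift without TTC:", abs(plain[-1]))
    print("drift with TTC:   ", abs(ttc[-1]))
```

In this toy, the uncorrected rollout drifts roughly linearly with the number of chunks because `BIAS` accumulates across chunks. The corrected rollout instead settles at a bounded offset from the anchor. The sketch says nothing about visual quality or about the real method's overhead.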

Figure 2 : Comparison of sampling strategies. The Original Path suffers from error accumulation, while the Sink-based Path collapses into a Sink Point (dynamic collapse). In contrast, our TTC strategy avoids these failures by employing reference-conditioned denoising and explicit Re-noising, effect…

Experimental Results

Research Questions

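RQ1: Can test-time intervention stabilize long-horizon autoregressive video generation without retraining?
RQ2: Does pathwise, reference-based correction preserve temporal consistency better than single-point correction or sink conditioning?
RQ3: What are the trade-offs among correction placement, the number of correction steps, and inference overhead?
RQ4: How does TTC compare with training-based methods and test-time scaling in quality and efficiency?
RQ5: Is TTC robust across different backbone models and prompt-conditioned scenarios?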

Key Findings

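TTC extends the stable generation length from a few seconds to over 30 seconds with negligible overhead.
Embedding pathwise correction into the stochastic sampling path suppresses long-horizon error accumulation and temporal drift.
Single-point correction can introduce artifacts, whereas pathwise re-noising yields smoother, more consistent trajectories.
Placing the correction steps at noise levels 500 and 250 gives robust performance across configurations.
TTC matches the visual quality of training-based methods while requiring no training and running quickly.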

Figure 3 : Variants of autoregressive video generation. Discrete AR uses single-step deterministic prediction, multi-step diffusion follows a deterministic ODE trajectory, while few-step distilled diffusion performs stochastic sampling with intermediate noise injection.
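In Figure 3, the deterministic ODE trajectory and stochastic few-step sampling differ mainly in where the noise for the next level comes from. The sketch below shows only that difference, under stated assumptions:

- It uses a linear interpolation x_t = (1 - s) * x0 + s * eps.
- `predict_x0` is a hypothetical scalar stand-in for a trained network.
- The schedule `SIGMAS` is illustrative.

Discrete AR, which makes a single deterministic prediction per step, is not shown.

```python
import random

# Illustrative few-step noise schedule (assumption), from pure noise (1.0) down.
SIGMAS = [1.0, 0.75, 0.5, 0.25]


def predict_x0(x_t, s, target):
    """Hypothetical x0-predictor standing in for a trained network."""
    return s * target + (1.0 - s) * x_t


def sample(x_T, target, stochastic, rng=None):
    """Denoise from x_T along SIGMAS, using x_t = (1 - s) * x0 + s * eps."""
    x_t = x_T
    x0_hat = x_T
    for i, s in enumerate(SIGMAS):
        x0_hat = predict_x0(x_t, s, target)
        if i + 1 < len(SIGMAS):
            s_next = SIGMAS[i + 1]
            if stochastic:
                # Few-step distilled sampling: inject fresh Gaussian noise.
                eps = rng.gauss(0.0, 1.0)
            else:
                # Deterministic ODE/DDIM-style step: reuse the noise implied by x_t.
                eps = (x_t - (1.0 - s) * x0_hat) / s
            x_t = (1.0 - s_next) * x0_hat + s_next * eps
    return x0_hat


if __name__ == "__main__":
    print(sample(0.3, 2.0, stochastic=False))
    print(sample(0.3, 2.0, stochastic=True, rng=random.Random(1)))
    print(sample(0.3, 2.0, stochastic=True, rng=random.Random(2)))
```

The deterministic branch always maps the same starting noise `x_T` to the same output. The stochastic branch changes with the random seed. That intermediate noise injection is where, according to the abstract, TTC calibrates intermediate stochastic states along the sampling trajectory.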


This review was generated by AI and checked by a human editor.