QUICK REVIEW

[Paper Review] Flow caching for autoregressive video generation

Yuexiao Ma, Xuzhe Zheng|arXiv (Cornell University)|Feb 11, 2026
Video Coding and Compression Technologies | Cited by 0
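One-sentence summary

FlowCache introduces chunkwise caching and KV cache compression for autoregressive video generation, achieving speedups of 2.38× on MAGI-1 and 6.7× on SkyReels-V2 with minimal quality loss.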

ABSTRACT

Autoregressive models, often built on Transformer architectures, represent a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this sequential generation process is notoriously slow. While caching strategies have proven effective for accelerating traditional video diffusion models, existing methods assume uniform denoising across all frames, an assumption that breaks down in autoregressive models where different video chunks exhibit varying similarity patterns at identical timesteps. In this paper, we present FlowCache, the first caching framework specifically designed for autoregressive video generation. Our key insight is that each video chunk should maintain independent caching policies, allowing fine-grained control over which chunks require recomputation at each timestep. We introduce a chunkwise caching strategy that dynamically adapts to the unique denoising characteristics of each chunk, complemented by a joint importance-redundancy optimized KV cache compression mechanism that maintains fixed memory bounds while preserving generation quality. Our method achieves remarkable speedups of 2.38 times on MAGI-1 and 6.7 times on SkyReels-V2, with negligible quality degradation (VBench: 0.87 increase and 0.79 decrease respectively). These results demonstrate that FlowCache successfully unlocks the potential of autoregressive models for real-time, ultra-long video generation, establishing a new benchmark for efficient video synthesis at scale. The code is available at https://github.com/mikeallen39/FlowCache.

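Research motivation and goals

  • Speed up autoregressive video generation by addressing heterogeneous denoising across video chunks.
  • Propose a chunkwise caching strategy that manages recomputation independently for each video chunk.
  • Introduce joint importance-redundancy KV cache compression that fits a memory budget without sacrificing quality.
  • Analyze caching dynamics in autoregressive video generation both theoretically and empirically.
  • Demonstrate state-of-the-art speedups on representative models while preserving video quality.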

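Proposed method

  • Define, for each video chunk, the relative L1 distance between adjacent timesteps as a measure of reuse potential.
  • Prove that the relative L1 distance increases monotonically as denoising progresses (Theorem 1).
  • Propose FlowCache, which assigns each video chunk an independent caching policy based on that chunk's denoising state.
  • Implement KV cache compression that jointly optimizes importance and redundancy to select diverse, relevant prior KV entries (Eqs. 9–12).
  • Evaluate FlowCache on MAGI-1 and SkyReels-V2, with ablations on chunk reuse and KV compression to show the benefit of each.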
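
The per-chunk distance and the independent caching decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `relative_l1` and `ChunkwiseCachePolicy`, the accumulate-until-threshold rule (borrowed from TeaCache-style caching), and the threshold value are all hypothetical, since this summary does not reproduce the paper's exact criterion.

```python
import numpy as np

def relative_l1(curr: np.ndarray, prev: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-chunk relative L1 distance between two adjacent timesteps.

    curr, prev: arrays of shape (num_chunks, ...) holding each chunk's features.
    Returns an array of shape (num_chunks,).
    """
    axes = tuple(range(1, curr.ndim))          # reduce over everything but the chunk axis
    num = np.abs(curr - prev).sum(axis=axes)
    den = np.abs(prev).sum(axis=axes) + eps    # eps avoids division by zero
    return num / den

class ChunkwiseCachePolicy:
    """One accumulated-change counter per chunk (assumed decision rule)."""

    def __init__(self, num_chunks: int, threshold: float):
        self.threshold = threshold
        self.accumulated = np.zeros(num_chunks)

    def step(self, curr: np.ndarray, prev: np.ndarray) -> np.ndarray:
        """Return a boolean mask that is True where a chunk must be recomputed."""
        self.accumulated += relative_l1(curr, prev)
        recompute = self.accumulated >= self.threshold
        self.accumulated[recompute] = 0.0      # recomputed chunks start over
        return recompute

# Usage: three chunks, only the middle one changes between timesteps.
prev = np.ones((3, 4))
curr = prev.copy()
curr[1] += 0.5
policy = ChunkwiseCachePolicy(num_chunks=3, threshold=0.2)
print(policy.step(curr, prev))
```

Because each chunk has its own counter, a chunk whose features barely change keeps reusing its cached output while a neighbouring chunk at the same timestep is recomputed, which is the behaviour a single frame-wide policy cannot express.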

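Experimental results

Research questions

  • RQ1: Can independent per-chunk caching policies increase the speedup of autoregressive video generation without harming quality?
  • RQ2: How should the KV cache be compressed to balance memory usage and temporal coherence in long videos?
  • RQ3: How does chunk-level heterogeneity in denoising trajectories affect the caching policy?
  • RQ4: Do FlowCache's theoretical insights translate into practical speedups across different autoregressive video models?

Main findings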

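| Model | Method | PFLOPs ↓ | Speedup ↑ | Latency (s) ↓ | VBench ↑ | LPIPS ↓ | SSIM ↑ | PSNR ↑ |
|---|---|---|---|---|---|---|---|---|
| MAGI-1 | Vanilla | 306 | | 2873 | 77.06% | - | - | - |
| MAGI-1 | TeaCache-slow | 294 | 1.12× | 2579 | 77.50% | 0.8160 | 0.1138 | 13.26 |
| MAGI-1 | TeaCache-fast | 225 | 1.44× | 1998 | 70.11% | 0.8160 | 0.1138 | 8.94 |
| MAGI-1 | FlowCache-slow | 161 | 1.86× | 1546 | 78.96% | 0.3160 | 0.6497 | 22.34 |
| MAGI-1 | FlowCache-fast | 140 | 2.38× | 1209 | 77.93% | 0.4311 | 0.5140 | 19.27 |
| SkyReels-V2 | Vanilla | 113 | | 1540 | 83.84% | - | - | - |
| SkyReels-V2 | TeaCache-slow | 58 | 1.89× | 814 | 82.67% | 0.1472 | 0.7501 | 21.96 |
| SkyReels-V2 | TeaCache-fast | 49 | 2.2× | 686 | 80.06% | 0.3063 | 0.6121 | 18.39 |
| SkyReels-V2 | FlowCache-slow | 36 | 5.88× | 262 | 83.12% | 0.1225 | 0.789 | 23.74 |
| SkyReels-V2 | FlowCache-fast | 28 | 6.7× | 230 | 83.05% | 0.1467 | 0.7635 | 22.95 |
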
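
As a sanity check on the table above, the speedup column appears to be the ratio of the vanilla latency to each method's latency. That definition is an assumption, since the summary does not state it. The snippet below recomputes the ratio from the latency values in the table; the variable and function names are illustrative. Under this assumption every computed ratio lands within 0.05 of the reported figure.

```python
# Latency (s) and reported speedup, copied from the results table.
VANILLA_LATENCY = {"MAGI-1": 2873, "SkyReels-V2": 1540}
ROWS = [
    ("MAGI-1", "TeaCache-slow", 2579, 1.12),
    ("MAGI-1", "TeaCache-fast", 1998, 1.44),
    ("MAGI-1", "FlowCache-slow", 1546, 1.86),
    ("MAGI-1", "FlowCache-fast", 1209, 2.38),
    ("SkyReels-V2", "TeaCache-slow", 814, 1.89),
    ("SkyReels-V2", "TeaCache-fast", 686, 2.2),
    ("SkyReels-V2", "FlowCache-slow", 262, 5.88),
    ("SkyReels-V2", "FlowCache-fast", 230, 6.7),
]

def latency_speedup(model: str, latency_s: float) -> float:
    """Speedup assumed to be vanilla latency divided by method latency."""
    return VANILLA_LATENCY[model] / latency_s

for model, method, latency_s, reported in ROWS:
    computed = latency_speedup(model, latency_s)
    print(f"{model:12s} {method:15s} computed {computed:.2f}x, reported {reported}x")
```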
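  • FlowCache achieves a 2.38× speedup on MAGI-1, with VBench 0.87 points higher than the base model.
  • FlowCache achieves a 6.7× speedup on SkyReels-V2, with VBench 0.79 points lower than the base model.
  • Chunkwise reuse preserves quality better than TeaCache-style uniform caching: on SkyReels-V2, FlowCache-fast reaches 6.7× at 83.05% VBench, versus 2.2× at 80.06% for TeaCache-fast.
  • KV cache compression reduces memory and compute overhead with minimal quality loss.
  • Across both models, FlowCache delivers substantial efficiency gains with minimal perceptual degradation.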
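
This summary does not reproduce the paper's Eqs. 9–12, so the KV cache compression can only be illustrated generically. The sketch below swaps in a greedy selection in the style of maximal marginal relevance as a stand-in for the paper's joint importance-redundancy objective. It keeps a fixed budget of cached entries, preferring those with a high importance score (supplied by the caller, for example accumulated attention weight) and low cosine similarity to entries already kept. The function name `compress_kv`, the trade-off weight `lam`, and the scoring rule are all assumptions.

```python
import numpy as np

def compress_kv(keys, values, importance, budget, lam=0.5):
    """Greedily keep `budget` KV entries that are important and non-redundant.

    keys, values: (n, d) arrays; importance: (n,) nonnegative scores.
    Returns (kept_keys, kept_values, kept_indices), indices in original order.
    """
    n = keys.shape[0]
    if budget >= n:                      # nothing to drop
        return keys, values, np.arange(n)
    # Normalize keys so that dot products are cosine similarities.
    unit = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    # Scale importance to [0, 1] so it is comparable to a similarity.
    imp = importance / (importance.max() + 1e-8)
    max_sim = np.zeros(n)                # redundancy w.r.t. kept entries, floored at 0
    available = np.ones(n, dtype=bool)
    kept = []
    for _ in range(budget):
        score = lam * imp - (1.0 - lam) * max_sim
        score[~available] = -np.inf      # never pick the same entry twice
        best = int(np.argmax(score))
        kept.append(best)
        available[best] = False
        max_sim = np.maximum(max_sim, unit @ unit[best])
    idx = np.sort(np.array(kept))        # preserve temporal order of the cache
    return keys[idx], values[idx], idx

# Usage: entry 1 nearly duplicates entry 0, so the less important but
# distinct entry 2 is kept instead when the budget is two.
keys = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
values = np.arange(6, dtype=float).reshape(3, 2)
importance = np.array([1.0, 0.9, 0.5])
_, _, kept_idx = compress_kv(keys, values, importance, budget=2)
print(kept_idx)
```

Fixing `budget` bounds the cache size regardless of video length, which matches the fixed memory bound described in the abstract; how the paper actually weighs importance against redundancy is not specified here.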

This review was generated by AI and checked by human editors.