QUICK REVIEW

[Paper Review] Flow caching for autoregressive video generation

Yuexiao Ma, Xuzhe Zheng|arXiv (Cornell University)|Feb 11, 2026

Video Coding and Compression Technologies | Cited by 0

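One-Sentence Summary

FlowCache introduces chunkwise caching and KV cache compression for autoregressive video generation, achieving substantial speedups on MAGI-1 and SkyReels-V2 with minimal quality loss.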

ABSTRACT

Autoregressive models, often built on Transformer architectures, represent a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this sequential generation process is notoriously slow. While caching strategies have proven effective for accelerating traditional video diffusion models, existing methods assume uniform denoising across all frames, an assumption that breaks down in autoregressive models where different video chunks exhibit varying similarity patterns at identical timesteps. In this paper, we present FlowCache, the first caching framework specifically designed for autoregressive video generation. Our key insight is that each video chunk should maintain independent caching policies, allowing fine-grained control over which chunks require recomputation at each timestep. We introduce a chunkwise caching strategy that dynamically adapts to the unique denoising characteristics of each chunk, complemented by a joint importance-redundancy optimized KV cache compression mechanism that maintains fixed memory bounds while preserving generation quality. Our method achieves remarkable speedups of 2.38 times on MAGI-1 and 6.7 times on SkyReels-V2, with negligible quality degradation (VBench: 0.87 increase and 0.79 decrease, respectively). These results demonstrate that FlowCache successfully unlocks the potential of autoregressive models for real-time, ultra-long video generation, establishing a new benchmark for efficient video synthesis at scale. The code is available at https://github.com/mikeallen39/FlowCache.
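The joint importance-redundancy KV cache compression mentioned in the abstract can be illustrated with a greedy selection in the style of maximal marginal relevance (MMR). This is a swapped-in technique, not the paper's own Eqs. 9–12: importance is assumed to be the mean softmax attention a cached key receives from the current chunk's queries, redundancy is assumed to be the maximum (non-negative) cosine similarity to keys already kept, and the names `compress_kv`, `budget`, and `lam` are hypothetical. The sketch only shows how a fixed budget keeps memory bounded while favoring relevant, non-duplicated entries.

```python
import numpy as np


def compress_kv(keys: np.ndarray, values: np.ndarray, queries: np.ndarray,
                budget: int, lam: float = 0.5):
    """Keep at most `budget` cached KV entries (hypothetical MMR-style rule).

    keys, values: (n, d) cached entries from earlier chunks
    queries:      (m, d) queries of the chunk currently being generated
    budget:       maximum number of entries to keep (assumed >= 1)
    lam:          assumed trade-off between importance and redundancy
    """
    n, d = keys.shape
    if n <= budget:
        return keys, values, np.arange(n)

    # Importance (assumed): average softmax attention each cached key receives.
    logits = queries @ keys.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    importance = attn.mean(axis=0)
    importance /= importance.max()  # rescale to [0, 1]

    # Redundancy (assumed): cosine similarity between keys, negatives clipped to 0.
    unit = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(unit @ unit.T, 0.0, None)

    # Greedy selection: start from the most important key, then trade off
    # importance against similarity to everything already kept.
    selected = [int(np.argmax(importance))]
    redundancy = sim[selected[0]].copy()
    while len(selected) < budget:
        score = lam * importance - (1.0 - lam) * redundancy
        score[selected] = -np.inf  # never pick the same entry twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        redundancy = np.maximum(redundancy, sim[nxt])

    idx = np.sort(np.array(selected))  # keep surviving entries in temporal order
    return keys[idx], values[idx], idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = rng.normal(size=(64, 8))
    v = rng.normal(size=(64, 8))
    q = rng.normal(size=(4, 8))
    k_c, v_c, kept = compress_kv(k, v, q, budget=16)
    print(k_c.shape, v_c.shape, kept)
```

Because the budget is fixed, the cache never grows past `budget` entries no matter how many chunks have been generated; the redundancy term is what stops the budget from being spent on near-duplicate keys.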

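Research Motivation and Goals

Accelerate autoregressive video generation by addressing heterogeneous denoising across video chunks.
Propose a chunk-level caching strategy that manages recomputation independently for each video chunk.
Introduce joint importance-redundancy KV cache compression that fits a memory budget without sacrificing quality.
Provide theoretical and empirical analysis of caching dynamics in autoregressive video generation.
Demonstrate state-of-the-art speedups on representative models while preserving video quality.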

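Proposed Method

Define, for each video chunk, the relative L1 distance between adjacent timesteps as a measure of reuse potential.
Prove theoretically that the relative L1 distance increases monotonically as denoising progresses (Theorem 1).
Propose FlowCache, which assigns each video chunk an independent caching policy based on that chunk's denoising state.
Implement KV cache compression that jointly optimizes importance and redundancy to select diverse and relevant past KV entries (Eqs. 9–12).
Evaluate FlowCache on MAGI-1 and SkyReels-V2, with ablations on chunk reuse and KV compression that demonstrate their benefits.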
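A minimal sketch of per-chunk caching follows, assuming the relative L1 distance is measured between a chunk's inputs at consecutive timesteps and that reuse is decided by a TeaCache-style rule: accumulate the distance since the last real forward pass and recompute once it reaches a threshold. The class `ChunkCachePolicy`, the threshold value, and the accumulation rule are hypothetical illustrations; the paper's exact criterion may differ. The point it shows is that every chunk owns its own cache state, so chunks at different denoising stages can make different reuse decisions at the same timestep.

```python
from typing import Callable, Optional, Tuple

import numpy as np


def relative_l1(curr: np.ndarray, prev: np.ndarray, eps: float = 1e-8) -> float:
    """Relative L1 distance ||curr - prev||_1 / ||prev||_1 (assumed form)."""
    return float(np.abs(curr - prev).sum() / (np.abs(prev).sum() + eps))


class ChunkCachePolicy:
    """Hypothetical cache state owned by ONE video chunk."""

    def __init__(self, threshold: float):
        self.threshold = threshold  # assumed reuse threshold
        self.accumulated = 0.0      # drift accumulated since the last real compute
        self.prev_input: Optional[np.ndarray] = None
        self.cached_output: Optional[np.ndarray] = None

    def step(self, x: np.ndarray,
             compute: Callable[[np.ndarray], np.ndarray]) -> Tuple[np.ndarray, bool]:
        """Return (output, reused) for this chunk at the current timestep."""
        if self.prev_input is not None:
            self.accumulated += relative_l1(x, self.prev_input)
        self.prev_input = x.copy()
        if self.cached_output is not None and self.accumulated < self.threshold:
            return self.cached_output, True  # skip the expensive forward pass
        self.cached_output = compute(x)      # recompute and refresh the cache
        self.accumulated = 0.0
        return self.cached_output, False


if __name__ == "__main__":
    # Two chunks with independent policies: one input drifts slowly, one quickly.
    rng = np.random.default_rng(0)
    policies = [ChunkCachePolicy(threshold=0.05) for _ in range(2)]
    states = [rng.normal(size=16) for _ in range(2)]
    drift = [0.001, 0.5]  # hypothetical per-step change for each chunk
    reuse_counts = [0, 0]
    for t in range(10):
        for c, policy in enumerate(policies):
            states[c] = states[c] + drift[c] * rng.normal(size=16)
            _, reused = policy.step(states[c], lambda v: 2.0 * v)  # stand-in model
            reuse_counts[c] += int(reused)
    print("reused steps per chunk:", reuse_counts)
```

A single global policy would have to pick one decision for both chunks at each timestep; keeping the state per chunk lets the slowly changing chunk skip computation while the fast-changing one keeps recomputing.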

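Experimental Results

Research Questions

RQ1: Can independent per-chunk caching policies increase the speedup of autoregressive video generation without hurting quality?
RQ2: How should the KV cache be compressed to balance memory usage and temporal coherence in long videos?
RQ3: How does chunk-level heterogeneity in denoising trajectories affect caching strategy?
RQ4: Do FlowCache's theoretical insights translate into practical speedups across different autoregressive video models?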

Key Findings

Model	Method	PFLOPs ↓	Speedup ↑	Latency (s) ↓	VBench ↑	LPIPS ↓	SSIM ↑	PSNR ↑
MAGI-1	Vanilla	306	1×	2873	77.06%	-	-	-
MAGI-1	TeaCache-slow	294	1.12×	2579	77.50%	0.8160	0.1138	13.26
MAGI-1	TeaCache-fast	225	1.44×	1998	70.11%	0.8160	0.1138	8.94
MAGI-1	FlowCache-slow	161	1.86×	1546	78.96%	0.3160	0.6497	22.34
MAGI-1	FlowCache-fast	140	2.38×	1209	77.93%	0.4311	0.5140	19.27
SkyReels-V2	Vanilla	113	1×	1540	83.84%	-	-	-
SkyReels-V2	TeaCache-slow	58	1.89×	814	82.67%	0.1472	0.7501	21.96
SkyReels-V2	TeaCache-fast	49	2.2×	686	80.06%	0.3063	0.6121	18.39
SkyReels-V2	FlowCache-slow	36	5.88×	262	83.12%	0.1225	0.789	23.74
SkyReels-V2	FlowCache-fast	28	6.7×	230	83.05%	0.1467	0.7635	22.95

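FlowCache achieves a 2.38× speedup on MAGI-1 while raising VBench by 0.87 points over the base model.
FlowCache achieves a 6.7× speedup on SkyReels-V2, with VBench dropping by 0.79 points.
Compared with TeaCache-style uniform caching, chunk-level reuse delivers both higher speedups and better quality (higher VBench, lower LPIPS, higher SSIM and PSNR) on both models.
KV cache compression reduces memory and compute overhead with minimal quality loss.
Across models, FlowCache delivers substantial efficiency gains with minimal perceptual degradation.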
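The headline numbers follow directly from the table: the Speedup column matches the ratio of vanilla latency to method latency (for example, 2873 / 1209 ≈ 2.38) rather than the PFLOPs ratio (306 / 140 ≈ 2.19), and the VBench change is a difference in percentage points. A small check using values copied from the table (the `RESULTS` dictionary and `speedup_and_delta` helper are illustrative names):

```python
# (model, method) -> (latency in seconds, VBench in %), copied from the table above
RESULTS = {
    ("MAGI-1", "Vanilla"): (2873, 77.06),
    ("MAGI-1", "FlowCache-fast"): (1209, 77.93),
    ("SkyReels-V2", "Vanilla"): (1540, 83.84),
    ("SkyReels-V2", "FlowCache-fast"): (230, 83.05),
}


def speedup_and_delta(model: str, method: str) -> tuple:
    """Latency-based speedup and VBench change relative to the Vanilla row."""
    base_latency, base_vbench = RESULTS[(model, "Vanilla")]
    latency, vbench = RESULTS[(model, method)]
    return round(base_latency / latency, 2), round(vbench - base_vbench, 2)


for model in ("MAGI-1", "SkyReels-V2"):
    print(model, speedup_and_delta(model, "FlowCache-fast"))
# MAGI-1 (2.38, 0.87)
# SkyReels-V2 (6.7, -0.79)
```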


This review was generated by AI and checked by human editors.