QUICK REVIEW

[Paper Review] Stable Velocity: A Variance Perspective on Flow Matching

Donglin Yang, Yongxing Zhang|arXiv (Cornell University)|Feb 5, 2026

Generative Adversarial Networks and Image Synthesis|Cited by 0

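One-Sentence Summary

This paper analyzes the variance of the conditional velocity targets in flow matching and proposes Stable Velocity Matching (StableVM) for variance-reduced training, Variance-Aware Representation Alignment (VA-REPA) for adaptive supervision, and Stable Velocity Sampling (StableVS) for faster, stable inference.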

ABSTRACT

While flow matching is elegant, its reliance on single-sample conditional velocities leads to high-variance training targets that destabilize optimization and slow convergence. By explicitly characterizing this variance, we identify 1) a high-variance regime near the prior, where optimization is challenging, and 2) a low-variance regime near the data distribution, where conditional and marginal velocities nearly coincide. Leveraging this insight, we propose Stable Velocity, a unified framework that improves both training and sampling. For training, we introduce Stable Velocity Matching (StableVM), an unbiased variance-reduction objective, along with Variance-Aware Representation Alignment (VA-REPA), which adaptively strengthens auxiliary supervision in the low-variance regime. For inference, we show that dynamics in the low-variance regime admit closed-form simplifications, enabling Stable Velocity Sampling (StableVS), a finetuning-free acceleration. Extensive experiments on ImageNet $256\times 256$ and large pretrained text-to-image and text-to-video models, including SD3.5, Flux, Qwen-Image, and Wan2.2, demonstrate consistent improvements in training efficiency and more than $2\times$ faster sampling within the low-variance regime without degrading sample quality. Our code is available at https://github.com/linYDTHU/StableVelocity.

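Motivation and Objectives

Characterize the variance structure of the conditional flow matching (CFM) objective and identify its low-variance and high-variance regimes.
Develop an unbiased, variance-reduced training objective (StableVM) that lowers target variance while leaving the global minimizer of the existing flow matching loss unchanged.
Introduce Variance-Aware Representation Alignment (VA-REPA), which adaptively strengthens auxiliary supervision in the low-variance regime.
Provide a sampling acceleration method (StableVS) that exploits the low-variance regime for faster, finetuning-free inference.
Demonstrate improvements in the ImageNet latent space and on pretrained text-to-image and text-to-video models.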
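
The variance in the first objective can be written out under a common convention. The block below is a sketch, not the paper's exact definition: it assumes the linear interpolation path with $t=0$ at the data and $t=1$ at the prior (consistent with the $p_{t}({\bm{x}}_{0}\mid{\bm{x}}_{t})$ notation in the Figure 2 caption), and the paper may use a different normalization. The last line shows why a closed-form shortcut becomes plausible once the conditional and marginal velocities nearly coincide; the actual StableVS update may differ.

```latex
% Assumed path: x_0 ~ data, \epsilon ~ N(0, I), t in [0, 1]
x_t = (1 - t)\, x_0 + t\, \epsilon

% Conditional velocity (single-sample target) and marginal velocity
v_t(x_t \mid x_0) = \epsilon - x_0 = \frac{x_t - x_0}{t},
\qquad
v_t(x_t) = \mathbb{E}\left[ v_t(x_t \mid x_0) \,\middle|\, x_t \right]

% Variance of the CFM target around the marginal velocity
\mathcal{V}_{\text{CFM}}(t)
  = \mathbb{E}_{x_0,\, x_t} \left\lVert v_t(x_t \mid x_0) - v_t(x_t) \right\rVert^2

% Low-variance regime (t \le \xi): if v_t(x_t) \approx (x_t - x_0)/t for a fixed x_0,
% the ODE dx_s/ds = (x_s - x_0)/s has the straight-line solution
x_s = x_0 + \frac{s}{t}\,(x_t - x_0), \qquad 0 \le s \le t
```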

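Proposed Methods

Define and analyze the variance of the conditional velocity in flow matching, revealing a two-regime structure (low variance near the data, high variance near the prior).
Propose StableVM: a multi-sample, self-normalized aggregation over reference samples that reduces training variance while keeping the same minimizer as CFM.
Introduce VA-REPA: variance-aware, adaptive representation alignment that strengthens auxiliary supervision, with normalized weights, only in the low-variance regime.
When labels are sparse, extend StableVM with a class-conditional memory bank to preserve unbiasedness.
Develop StableVS: closed-form or DDIM-like sampling simplifications in the low-variance regime, enabling faster, finetuning-free sampling.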
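
The multi-sample, self-normalized aggregation described for StableVM can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes the linear path $x_t=(1-t)x_0+t\epsilon$ with a standard Gaussian prior, and the function names `stable_target` and `sample_training_pair` are hypothetical. Each reference sample is weighted by how likely it is to have produced $x_t$, and the target is the weighted average of the per-sample conditional velocities. With a single reference sample the target reduces to the ordinary CFM target $\epsilon - x_0$.

```python
import numpy as np

def stable_target(x_t, refs, t):
    """Self-normalized average of conditional velocities over reference samples.

    Assumes x_t = (1 - t) * x0 + t * eps with eps ~ N(0, I) (an assumption,
    not confirmed by the summary). refs has shape (n, d), x_t has shape (d,).
    """
    # log N(x_t; (1 - t) * x0_i, t^2 I), dropping the constant shared by all i
    log_w = -np.sum((x_t - (1.0 - t) * refs) ** 2, axis=-1) / (2.0 * t ** 2)
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    w /= w.sum()                      # self-normalization: weights sum to 1
    v_cond = (x_t - refs) / t         # conditional velocity for each reference
    return w @ v_cond                 # shape (d,)

def sample_training_pair(refs, t, rng):
    """Draw x_t from one randomly chosen reference and return (x_t, target)."""
    i = rng.integers(len(refs))
    eps = rng.standard_normal(refs.shape[1])
    x_t = (1.0 - t) * refs[i] + t * eps
    return x_t, stable_target(x_t, refs, t)

rng = np.random.default_rng(0)
data = rng.standard_normal((256, 8))  # toy dataset standing in for real data
t = 0.9                               # close to the prior, where variance is high

def mean_sq_error(n, trials=300):
    """Mean squared gap between the n-sample target and the full-data velocity."""
    errs = []
    for _ in range(trials):
        refs = data[rng.choice(len(data), size=n, replace=False)]
        x_t, target = sample_training_pair(refs, t, rng)
        exact = stable_target(x_t, data, t)  # marginal velocity of the toy dataset
        errs.append(np.sum((target - exact) ** 2))
    return float(np.mean(errs))

mse_single = mean_sq_error(1)   # single-sample (standard CFM) target
mse_multi = mean_sq_error(32)   # 32 reference samples
print(f"n=1: {mse_single:.3f}   n=32: {mse_multi:.3f}")
```

On this toy dataset the 32-sample target sits much closer to the exact marginal velocity than the single-sample target does, which is the effect the variance-reduction claim describes. How the paper draws its reference samples and maintains the class-conditional memory bank is not specified in this summary.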

Figure 1: Variance curves of ${\mathcal{V}}_{\text{CFM}}(t)$ with 15%–85% quantile bands. Evaluated on GMMs of varying dimensionality, CIFAR-10 images, and $256\times 256$ ImageNet latents obtained by the Stable Diffusion VAE. The $y$-axis reports ${\mathcal{V}}_{\text{CFM}}(t)$ normalized by the

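Experimental Results

Research Questions

RQ1: How does the variance of flow matching's conditional velocity target behave across diffusion time steps?
RQ2: Can training variance be reduced without changing the global minimizer of the flow matching objective?
RQ3: How can auxiliary supervision be scheduled adaptively to align with the variance regimes and thereby accelerate training?
RQ4: Can sampling be accelerated by exploiting the low-variance regime without sacrificing sample quality?
RQ5: Do the proposed methods transfer across model scales and across different pretrained diffusion backbones?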

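Key Findings

The CFM objective exhibits a two-regime variance structure: low variance near the data distribution and high variance near the prior.
StableVM provides an unbiased, variance-reduced training objective that leaves the CFM minimizer unchanged and reduces target variance at a rate of roughly O(1/n).
VA-REPA adaptively strengthens representation alignment in the low-variance regime, improving training efficiency and FID/IS scores.
Across several models (SD3.5, Flux, Qwen-Image, Wan2.2), StableVS delivers more than 2× faster inference within the low-variance regime with no significant loss in perceptual quality.
StableVM and VA-REPA consistently outperform the REPA baseline across model scales and training variants; StableVS matches or exceeds the 30-step baseline across tasks with substantially fewer steps.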
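
The two-regime variance finding can be checked numerically on a toy problem. The sketch below is not taken from the paper. It assumes the linear path $x_t=(1-t)x_0+t\epsilon$ with $t=0$ at the data, and it uses a small finite dataset so that the posterior over $x_0$ given $x_t$, and hence the marginal velocity, can be computed exactly. The function name `cfm_variance` is hypothetical.

```python
import numpy as np

def cfm_variance(data, t, n_draws=500, seed=0):
    """Monte Carlo estimate of E || v_t(x_t | x0) - v_t(x_t) ||^2 on a finite dataset.

    Assumes x_t = (1 - t) * x0 + t * eps with eps ~ N(0, I) and a uniform
    distribution over the rows of `data` (both are assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    idx = rng.integers(0, n, size=n_draws)
    eps = rng.standard_normal((n_draws, d))
    x_t = (1.0 - t) * data[idx] + t * eps                     # (m, d)

    # Exact posterior weights p(x0_i | x_t) for every data point
    diff = x_t[:, None, :] - (1.0 - t) * data[None, :, :]     # (m, n, d)
    log_w = -np.sum(diff ** 2, axis=-1) / (2.0 * t ** 2)      # (m, n)
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)

    # Conditional velocity for each candidate x0, and the marginal velocity
    v_cond = (x_t[:, None, :] - data[None, :, :]) / t         # (m, n, d)
    v_marg = np.einsum("mn,mnd->md", w, v_cond)               # (m, d)

    # Posterior-weighted squared deviation, averaged over the draws of x_t
    sq_dev = np.sum((v_cond - v_marg[:, None, :]) ** 2, axis=-1)
    return float(np.mean(np.sum(w * sq_dev, axis=1)))

data = np.random.default_rng(0).standard_normal((256, 8))    # toy dataset
v_low = cfm_variance(data, t=0.05)    # near the data
v_high = cfm_variance(data, t=0.95)   # near the prior
print(f"variance near data:  {v_low:.4f}")
print(f"variance near prior: {v_high:.4f}")
```

Near the data the posterior concentrates on a single point, so the conditional and marginal velocities almost agree and the estimate is close to zero. Near the prior the posterior is nearly uniform over the dataset and the estimate is much larger.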

Figure 2: Illustration of CFM variance ${\mathcal{V}}_{\text{CFM}}(t)$. (a) The low-variance regime ($t\leq\xi$), where the posterior $p_{t}({\bm{x}}_{0}\mid{\bm{x}}_{t})$ is sharply concentrated and the conditional velocity ${\bm{v}}_{t}({\bm{x}}_{t}\mid{\bm{x}}_{0})$ nearly coincides with the


This review was generated by AI and reviewed by human editors.