QUICK REVIEW

[Paper Review] Stable Velocity: A Variance Perspective on Flow Matching

Donglin Yang, Yongxing Zhang | arXiv (Cornell University) | 2026-02-05

Generative Adversarial Networks and Image Synthesis | Citations: 0

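One-Line Summary

This paper analyzes the variance of conditional velocity targets in flow matching, proposes the variance-reduced training objective Stable Velocity Matching (StableVM) together with adaptive auxiliary supervision (VA-REPA), and introduces Stable Velocity Sampling (StableVS) for fast, stable inference.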

ABSTRACT

While flow matching is elegant, its reliance on single-sample conditional velocities leads to high-variance training targets that destabilize optimization and slow convergence. By explicitly characterizing this variance, we identify 1) a high-variance regime near the prior, where optimization is challenging, and 2) a low-variance regime near the data distribution, where conditional and marginal velocities nearly coincide. Leveraging this insight, we propose Stable Velocity, a unified framework that improves both training and sampling. For training, we introduce Stable Velocity Matching (StableVM), an unbiased variance-reduction objective, along with Variance-Aware Representation Alignment (VA-REPA), which adaptively strengthens auxiliary supervision in the low-variance regime. For inference, we show that dynamics in the low-variance regime admit closed-form simplifications, enabling Stable Velocity Sampling (StableVS), a finetuning-free acceleration. Extensive experiments on ImageNet $256\times 256$ and large pretrained text-to-image and text-to-video models, including SD3.5, Flux, Qwen-Image, and Wan2.2, demonstrate consistent improvements in training efficiency and more than $2\times$ faster sampling within the low-variance regime without degrading sample quality. Our code is available at https://github.com/linYDTHU/StableVelocity.

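Motivation and Objectives

Characterize the variance structure of conditional flow matching targets and identify the low-variance and high-variance regimes.
Develop a training objective (StableVM) that reduces variance without introducing bias while preserving the global minimizer of the standard flow matching loss.
Introduce Variance-Aware Representation Alignment (VA-REPA), which adaptively adjusts supervision strength across variance regimes.
Provide a sampling acceleration method (StableVS) that exploits the low-variance regime for faster, finetuning-free inference.
Demonstrate improvements on ImageNet latents and on pretrained text-to-image and text-to-video models.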

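Proposed Method

Defines and analyzes the conditional velocity variance in flow matching, revealing a two-regime structure (low variance near the data, high variance near the prior).
Proposes StableVM: multi-sample, self-normalized aggregation over reference samples reduces training variance while keeping the same minimizer as CFM.
Introduces VA-REPA: a variance-aware adaptive representation alignment that uses normalized weights to strengthen auxiliary supervision only in the low-variance regime.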
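Extends StableVM with a class-conditional memory bank so that the target remains unbiased when labels are sparse.
Develops StableVS, which uses closed-form or DDIM-like sampling simplifications in the low-variance regime to enable faster, finetuning-free sampling.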
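
The multi-sample, self-normalized aggregation mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear path $x_t = (1-t)x_0 + t\epsilon$ with data at $t=0$ and Gaussian noise at $t=1$ (consistent with the low-variance regime $t\leq\xi$ in Figure 2), scalar data for readability, and weights proportional to the Gaussian path density. The function names are hypothetical, and the sketch does not reproduce how the paper draws $x_t$ or selects reference samples, nor its unbiasedness argument.

```python
import math


def conditional_velocity(x_t, x0, t):
    """Velocity of the assumed linear path x_t = (1 - t) * x0 + t * eps, i.e. eps - x0."""
    return (x_t - x0) / t


def self_normalized_target(x_t, refs, t):
    """Weighted average of conditional velocities over the reference samples.

    Each weight is proportional to the Gaussian path density
    N(x_t; (1 - t) * x0, t^2); the weights are normalized to sum to one.
    """
    log_w = [-0.5 * ((x_t - (1.0 - t) * x0) / t) ** 2 for x0 in refs]
    shift = max(log_w)  # subtract the max before exponentiating for stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return sum(wi * conditional_velocity(x_t, x0, t) for wi, x0 in zip(w, refs)) / total


# Hypothetical 1-D example: one noisy point and three reference samples.
x_t, t = 0.3, 0.6
refs = [-1.0, 0.2, 1.0]
single = conditional_velocity(x_t, refs[0], t)     # single-sample CFM target
aggregated = self_normalized_target(x_t, refs, t)  # multi-sample target
print(single, aggregated)
```

With a single reference sample the aggregate reduces to the ordinary conditional velocity. With more references it is a convex combination of their conditional velocities, which is the mechanism by which averaging can lower target variance.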

Figure 1: Variance curves of ${\mathcal{V}}_{\text{CFM}}(t)$ with 15%–85% quantile bands. Evaluated on GMMs of varying dimensionality, CIFAR-10 images, and $256\times 256$ ImageNet latents obtained by the Stable Diffusion VAE. The $y$-axis reports ${\mathcal{V}}_{\text{CFM}}(t)$ normalized by the

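Experimental Results

Research Questions

RQ1: How does the variance of conditional velocity targets in flow matching behave across diffusion timesteps?
RQ2: Can training variance be reduced without changing the global minimizer of the flow matching objective?
RQ3: How can auxiliary supervision be adapted to the variance regime to accelerate training?
RQ4: Can the low-variance regime be exploited to accelerate sampling without harming sample quality?
RQ5: Do the proposed methods transfer across model scales and different pretrained diffusion backbones?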

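Key Findings

CFM targets exhibit two variance regimes: low variance near the data distribution and high variance near the prior.
StableVM provides an unbiased, variance-reduced training target that preserves the CFM minimizer and reduces target variance at a rate of $O(1/n)$.
VA-REPA adaptively strengthens representation alignment in the low-variance regime, improving training efficiency and FID/IS metrics.
StableVS achieves more than $2\times$ faster inference within the low-variance regime across several models (SD3.5, Flux, Qwen-Image, Wan2.2) without perceptible quality degradation.
StableVM and VA-REPA consistently outperform REPA baselines across model scales and training variants; StableVS matches or exceeds the 30-step baseline on a range of tasks while using far fewer steps.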
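
The two-regime finding can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's experiment. It assumes that ${\mathcal{V}}_{\text{CFM}}(t)$ is the expected squared gap between the conditional velocity and the marginal velocity (this review does not restate the definition), a linear path $x_t = (1-t)x_0 + t\epsilon$ with data at $t=0$, and a hypothetical two-point scalar dataset for which the marginal velocity can be computed exactly. Function names are illustrative.

```python
import math
import random


def marginal_velocity(x_t, data, t):
    """Exact posterior-weighted mean of conditional velocities for a finite dataset."""
    log_w = [-0.5 * ((x_t - (1.0 - t) * x0) / t) ** 2 for x0 in data]
    shift = max(log_w)  # stabilize the exponentials
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return sum(wi * (x_t - x0) / t for wi, x0 in zip(w, data)) / total


def cfm_variance(data, t, num_samples=5000, seed=0):
    """Monte Carlo estimate of E[(v_t(x_t | x0) - u_t(x_t))^2] at time t."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_samples):
        x0 = rng.choice(data)             # draw a data point
        eps = rng.gauss(0.0, 1.0)         # draw prior noise
        x_t = (1.0 - t) * x0 + t * eps    # point on the assumed linear path
        v_cond = eps - x0                 # single-sample conditional velocity
        acc += (v_cond - marginal_velocity(x_t, data, t)) ** 2
    return acc / num_samples


data = [-1.0, 1.0]  # hypothetical toy dataset
for t in (0.05, 0.95):
    print(f"t={t}: estimated variance {cfm_variance(data, t):.4f}")
```

Near the data the posterior over $x_0$ collapses onto a single point, so the conditional and marginal velocities agree and the estimate is close to zero. Near the prior both data points remain plausible and the gap is substantial.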

Figure 2: Illustration of CFM variance ${\mathcal{V}}_{\text{CFM}}(t)$. (a) The low-variance regime ($t\leq\xi$), where the posterior $p_{t}({\bm{x}}_{0}\mid{\bm{x}}_{t})$ is sharply concentrated and the conditional velocity ${\bm{v}}_{t}({\bm{x}}_{t}\mid{\bm{x}}_{0})$ nearly coincides with the


This review was generated by AI and reviewed by a human editor.