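[Paper Review] Flow Matching for Generative Modeling
Proposes Flow Matching (FM), a simulation-free framework for training Continuous Normalizing Flows (CNFs) using per-sample conditional probability paths (including OT paths). It delivers scalable, efficient generation and improves log-likelihood and sample quality over diffusion-based methods.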
We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
Research Motivation and Objectives
- Develop a scalable, simulation-free training objective for Continuous Normalizing Flows (CNFs).
- Leverage per-sample conditional probability paths to construct tractable targets for CNF training.
- Explore a general family of probability paths (including diffusion and OT) within Flow Matching.
- Demonstrate that Flow Matching can outperform diffusion-based methods on image datasets in likelihood and sample quality.
- Show that OT-based paths yield faster training, sampling, and better generalization.
Proposed Method
- Define the Flow Matching (FM) objective, which regresses a neural vector field v_t onto a target vector field u_t that generates the desired probability path p_t.
- Build the marginal path p_t and marginal field u_t by aggregating per-sample conditional probability paths p_t(x|x1) and conditional vector fields u_t(x|x1) over the data samples x1.
- Use Conditional Flow Matching (CFM), whose gradients with respect to the model parameters are identical to those of FM, so training can proceed per sample without ever computing the marginal targets explicitly.
- Adopt a general Gaussian conditional path p_t(x|x1) with mean mu_t(x1) and standard deviation sigma_t(x1), and derive the conditional vector field u_t(x|x1) via the flow map psi_t.
- Specialize to diffusion-based paths (VE and VP) and Optimal Transport (OT) displacement interpolants, highlighting OT’s linear, straight-line trajectories and simpler regression targets.
- Train CNFs on CIFAR-10 and ImageNet with Flow Matching (FM), using both diffusion and OT paths, and compare against diffusion-based baselines on likelihood (NLL in BPD), FID, and sampling efficiency (NFE).
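The steps above can be sketched in a few lines. The following is a minimal, dependency-free illustration of the CFM loss with the OT conditional path, assuming mu_t(x1) = t * x1 and sigma_t = 1 - (1 - sigma_min) * t, so that psi_t(x0) = (1 - (1 - sigma_min) t) x0 + t x1 and the regression target is x1 - (1 - sigma_min) x0. The function names, the value of `SIGMA_MIN`, and the toy `zero_field` model are hypothetical choices for this sketch, not the authors' code; a real implementation would use a neural network and an autodiff framework.

```python
import random

SIGMA_MIN = 1e-4  # assumed small terminal std; hypothetical value for this sketch


def ot_conditional_sample(x0, x1, t, sigma_min=SIGMA_MIN):
    """Point on the OT conditional path: psi_t(x0) = (1 - (1 - sigma_min) t) x0 + t x1."""
    scale = 1.0 - (1.0 - sigma_min) * t
    return [scale * a + t * b for a, b in zip(x0, x1)]


def ot_conditional_target(x0, x1, sigma_min=SIGMA_MIN):
    """Regression target d/dt psi_t(x0) = x1 - (1 - sigma_min) x0, constant in t."""
    return [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]


def cfm_loss(v_theta, data_batch, rng, sigma_min=SIGMA_MIN):
    """Monte Carlo estimate of the CFM loss over one batch of data samples x1."""
    total = 0.0
    for x1 in data_batch:
        t = rng.random()                         # t ~ U[0, 1)
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # x0 ~ N(0, I)
        xt = ot_conditional_sample(x0, x1, t, sigma_min)
        target = ot_conditional_target(x0, x1, sigma_min)
        pred = v_theta(t, xt)                    # model's vector field at (t, x_t)
        total += sum((p - u) ** 2 for p, u in zip(pred, target))
    return total / len(data_batch)


# Toy usage: a "model" that always predicts zero velocity.
rng = random.Random(0)
batch = [[1.0, -1.0], [0.5, 2.0]]
zero_field = lambda t, x: [0.0 for _ in x]
print(cfm_loss(zero_field, batch, rng))
```

Note that no ODE is solved anywhere in the loss, which is what "simulation-free" refers to: each training sample only needs one draw of t, one draw of noise, and one model evaluation.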
Experimental Results
Research Questions
- RQ1: Can a simulation-free Flow Matching objective train CNFs at scale without solving ODEs for each step?
- RQ2: How do conditional probability paths (diffusion vs OT) compare in terms of training stability, sampling efficiency, and model quality?
- RQ3: Does the OT-based conditional path provide faster training and better generalization than diffusion paths when used in Flow Matching?
- RQ4: How does Flow Matching perform on large-scale datasets (ImageNet) in terms of likelihood and sample quality relative to diffusion-based methods?
- RQ5: Can Flow Matching enable reliable conditional generation and fast sampling with off-the-shelf ODE solvers?
Key Findings
| Model | CIFAR-10 NLL (BPD) | CIFAR-10 FID | CIFAR-10 NFE | ImageNet 32x32 NLL (BPD) | ImageNet 32x32 FID | ImageNet 32x32 NFE | ImageNet 64x64 NLL (BPD) | ImageNet 64x64 FID | ImageNet 64x64 NFE |
|---|---|---|---|---|---|---|---|---|---|
| DDPM | 3.12 | 7.48 | 274 | 3.54 | 6.99 | 262 | 3.32 | 17.36 | 264 |
| Score Matching | 3.16 | 19.94 | 242 | 3.56 | 5.68 | 178 | 3.40 | 19.74 | 441 |
| ScoreFlow | 3.09 | 20.78 | 428 | 3.55 | 14.14 | 195 | 3.36 | 24.95 | 601 |
| FM w/Diffusion | 3.10 | 8.06 | 183 | 3.54 | 6.37 | 193 | 3.33 | 16.88 | 187 |
| FM w/ OT | 2.99 | 6.35 | 142 | 3.53 | 5.02 | 122 | 3.31 | 14.45 | 138 |

| Model | ImageNet 128x128 NLL (BPD) | ImageNet 128x128 FID |
|---|---|---|
| FM w/ OT | 2.90 | 20.9 |
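To put the NFE columns in relative terms, the short sketch below computes how many function evaluations FM w/ OT needs as a fraction of FM w/ Diffusion. The numbers are copied from the table; the dictionary layout and the helper name `nfe_ratio` are just illustrative choices.

```python
# NFE values copied from the table (FM w/ Diffusion vs. FM w/ OT).
nfe = {
    "CIFAR-10": {"diffusion": 183, "ot": 142},
    "ImageNet 32x32": {"diffusion": 193, "ot": 122},
    "ImageNet 64x64": {"diffusion": 187, "ot": 138},
}


def nfe_ratio(dataset):
    """Fraction of FM w/ Diffusion's NFE that FM w/ OT uses on one dataset."""
    row = nfe[dataset]
    return row["ot"] / row["diffusion"]


for name in nfe:
    print(f"{name}: FM w/ OT uses {nfe_ratio(name):.0%} of the NFE of FM w/ Diffusion")
```

On these three benchmarks the ratio falls between roughly 63% and 78%.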
- Flow Matching with OT paths yields better NLL (BPD) and FID, and lower NFE, than the diffusion-based baselines on CIFAR-10 and on ImageNet 32x32 and 64x64.
- In Table 1, FM w/ OT has the best value in every NLL, FID, and NFE column; on CIFAR-10, for example, it reaches 2.99 BPD, FID 6.35, and 142 NFE, versus 3.12 BPD, FID 7.48, and 274 NFE for DDPM.
- On ImageNet 128x128, FM w/ OT reaches an NLL of 2.90 BPD and an FID of 20.9, which is competitive with the GAN-based methods the paper lists for comparison.
- Flow Matching with OT enables faster sampling: for the same numerical accuracy, OT paths require fewer function evaluations (NFE) than diffusion paths, and provide better cost-quality trade-offs.
- CFM provides equivalent gradients to FM, enabling tractable per-sample training without explicit marginal vector fields.
- OT conditional paths move samples along straight-line trajectories between noise and data, which gives simpler regression targets and more efficient training and sampling than diffusion paths.
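The straight-line property of the OT conditional path can be checked numerically. The sketch below assumes the conditional field u_t(x|x1) = (x1 - (1 - sigma_min) x) / (1 - (1 - sigma_min) t) and integrates it with a plain fixed-step Euler solver; the function names and the sample points are hypothetical. Because the velocity along each conditional trajectory is constant, a single Euler step already lands on the same endpoint as many steps. This illustrates the conditional trajectories only; the learned marginal field is not guaranteed to be exactly straight.

```python
def ot_conditional_field(t, x, x1, sigma_min=1e-4):
    """u_t(x | x1) = (x1 - (1 - sigma_min) x) / (1 - (1 - sigma_min) t)."""
    denom = 1.0 - (1.0 - sigma_min) * t
    return [(b - (1.0 - sigma_min) * a) / denom for a, b in zip(x, x1)]


def euler_sample(field, x0, n_steps):
    """Integrate dx/dt = field(t, x) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        v = field(t, x)
        x = [a + h * b for a, b in zip(x, v)]
    return x


# Toy usage: one noise sample x0 transported toward one data sample x1.
x0 = [0.3, -1.2]
x1 = [2.0, 0.5]
field = lambda t, x: ot_conditional_field(t, x, x1)
print(euler_sample(field, x0, 1))   # a single step
print(euler_sample(field, x0, 8))   # eight steps reach the same endpoint
```

Both calls end at sigma_min * x0 + x1 up to floating-point rounding, which gives some intuition for why OT paths tend to need fewer function evaluations.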
This review was generated by AI and checked by a human editor.