QUICK REVIEW

[Paper Review] Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl‐Dickstein|arXiv (Cornell University)|Nov 26, 2020

Generative Adversarial Networks and Image Synthesis | References: 48 | Cited by: 1,263

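ONE-SENTENCE SUMMARY

Introduces a unified framework for score-based generative modeling that uses forward and reverse stochastic differential equations (SDEs) to turn data into noise and back again, enabling flexible sampling, exact likelihood computation via a neural ODE, and state-of-the-art CIFAR-10 generation with continuous-time training and new samplers.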

ABSTRACT

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
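
For reference, the three processes the abstract describes are commonly written as below. This is a notational sketch rather than a quotation from the page: f is the drift, g the diffusion coefficient, w a standard Wiener process, and w̄ a Wiener process running backward in time from T to 0. The score ∇x log pt(x) is the only unknown term; in practice it is replaced by the learned model sθ(x,t).

```latex
\begin{align}
\mathrm{d}\mathbf{x} &= \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}
  && \text{(forward SDE: data to noise)} \\
\mathrm{d}\mathbf{x} &= \bigl[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}
  && \text{(reverse-time SDE: noise to data)} \\
\mathrm{d}\mathbf{x} &= \bigl[\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\bigr]\,\mathrm{d}t
  && \text{(probability flow ODE: same marginals } p_t\text{)}
\end{align}
```

The ODE drops the noise term and halves the score coefficient, which is what makes the trajectory deterministic while leaving each marginal pt unchanged.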

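RESEARCH MOTIVATION AND GOALS

Motivate a unified, diffusion-inspired framework that models data by continuously perturbing it with noise and reversing that process using estimated scores.
Develop methods for estimating time-dependent scores with neural networks, and use SDEs to generate high-fidelity samples and compute exact likelihoods.
Introduce sampling improvements (Predictor-Corrector and reverse diffusion samplers) and continuous training objectives to improve sample quality and likelihood.
Enable controllable generation tasks such as class-conditional generation, inpainting, and colorization with a single unconditional score model.
Demonstrate scalability and results on CIFAR-10 and higher-resolution images, and compare against diffusion models and other baselines.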

PROPOSED METHOD

Model data with a continuous-time diffusion process (Itô SDE) that gradually adds noise, carrying the data distribution to a tractable prior.
Derive the reverse-time SDE that uses the time-dependent score ∇x log pt(x) to transform noise back into data.
Train a time-dependent score model sθ(x,t) via continuous score matching to approximate ∇x log pt(x).
Solve the reverse SDE with general-purpose solvers; introduce Predictor-Corrector (PC) samplers combining numerical steps with score-based MCMC corrections.
Derive and use a probability flow ODE that yields the same marginals as the SDE and enables exact likelihood computation via neural ODE techniques.
Present and analyze Variance Exploding (VE), Variance Preserving (VP), and sub-VP SDE variants, including closed-form perturbation kernels and sampling updates.
Demonstrate controllable generation by conditioning on auxiliary information (e.g., class labels) through forward-model gradients, enabling imputation and colorization.
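
The forward perturbation and reverse-time sampling steps listed above can be sketched on a toy problem. This is a minimal sketch, not the paper's implementation. It assumes a one-dimensional Gaussian data distribution, so the score of every pt is available in closed form and can stand in for a trained sθ(x,t). It uses the VP SDE with a linear β(t) schedule and a plain Euler-Maruyama discretization of the reverse-time SDE. The function names, the β endpoints, and the step counts are illustrative assumptions.

```python
import math
import random

# Assumed endpoints of a linear beta(t) schedule on t in [0, 1] (illustrative values).
BETA_MIN, BETA_MAX = 0.1, 20.0


def beta(t):
    """Noise rate of the VP SDE: dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dw."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def alpha(t):
    """Mean scaling of the VP perturbation kernel: exp(-0.5 * integral_0^t beta(s) ds)."""
    return math.exp(-0.25 * t * t * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN)


def perturb(x0, t, rng):
    """Draw x_t given x_0 from the closed-form kernel N(alpha*x0, 1 - alpha^2)."""
    a = alpha(t)
    return a * x0 + math.sqrt(1.0 - a * a) * rng.gauss(0.0, 1.0)


def marginal(t, data_mean, data_std):
    """Mean and std of p_t when the data distribution is N(data_mean, data_std^2)."""
    a = alpha(t)
    return a * data_mean, math.sqrt((a * data_std) ** 2 + 1.0 - a * a)


def score(x, t, data_mean, data_std):
    """Exact score of the Gaussian p_t; a stand-in for a trained score network."""
    m, s = marginal(t, data_mean, data_std)
    return -(x - m) / (s * s)


def reverse_sde_sample(n_samples, n_steps, data_mean, data_std, eps=1e-3, seed=0):
    """Euler-Maruyama on the reverse-time SDE, integrating from t = 1 down to t = eps."""
    rng = random.Random(seed)
    dt = (1.0 - eps) / n_steps
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # prior is roughly N(0, 1)
    for i in range(n_steps):
        t = 1.0 - i * dt
        b = beta(t)
        noise_scale = math.sqrt(b * dt)
        for j, x in enumerate(xs):
            # Reverse drift f - g^2 * score, with f = -0.5*b*x and g^2 = b.
            drift = -0.5 * b * x - b * score(x, t, data_mean, data_std)
            # Stepping backward in time flips the sign of the drift term.
            xs[j] = x - drift * dt + noise_scale * rng.gauss(0.0, 1.0)
    return xs
```

Calling `reverse_sde_sample(1000, 300, data_mean=2.0, data_std=0.5)` should return samples whose mean and standard deviation are close to 2.0 and 0.5. With a real model, `score` would be replaced by a neural network trained with score matching, and a corrector step (for example Langevin MCMC) could follow each predictor step.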

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How can score-based generative modeling be unified under the framework of stochastic differential equations?
RQ2: Can the reverse-time SDE be effectively estimated from time-dependent scores to generate high-fidelity samples?
RQ3: Which samplers (general SDE solvers, predictor-corrector, probability flow ODE) yield the best trade-offs between sample quality, speed, and likelihood computation?
RQ4: Can continuous training objectives and architectural improvements achieve state-of-the-art image generation metrics and exact likelihoods?
RQ5: To what extent can unconditional score-based models support controllable generation tasks like class-conditional generation, inpainting, and colorization?

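KEY FINDINGS

A unified SDE-based framework maps data to a noise prior and uses estimated scores to reverse the diffusion and generate data samples.
A time-dependent score model trained with continuous score matching can approximate ∇x log pt(x) for all t, enabling reverse-SDE sampling and exact likelihood computation via the probability flow ODE.
Predictor-Corrector samplers and reverse diffusion samplers outperform ancestral sampling on the VE and VP/sub-VP SDEs, improving sample quality.
The probability flow ODE enables fast adaptive sampling and exact likelihood computation, showing improved log-likelihood (bits/dim) on CIFAR-10 alongside competitive FID/IS.
Architectural and training improvements (NCSN++, DDPM++, continuous objectives) achieve record image generation metrics on CIFAR-10 (Inception score 9.89, FID 2.20) and enable 1024×1024 CelebA-HQ generation with a score-based model.
A new likelihood record: DDPM++ trained with the continuous sub-VP objective reaches 2.99 bits/dim on uniformly dequantized CIFAR-10, which the paper reports as the best result to date.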
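
The exact-likelihood finding rests on the probability flow ODE combined with the instantaneous change-of-variables formula. As a hedged illustration only, the sketch below uses a toy setup: one-dimensional Gaussian data, a VP SDE with an assumed linear β(t) schedule, the analytic score in place of a trained model, and fixed-step Euler integration instead of an adaptive solver. It integrates the ODE from a data point to the prior, accumulates the divergence of the ODE drift along the way, and recovers the log-density of the data point. All function names and constants are hypothetical.

```python
import math

# Assumed endpoints of a linear beta(t) schedule on t in [0, 1] (illustrative values).
BETA_MIN, BETA_MAX = 0.1, 20.0


def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def alpha(t):
    """Mean scaling of the VP perturbation kernel."""
    return math.exp(-0.25 * t * t * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN)


def marginal(t, data_mean, data_std):
    """Mean and std of p_t for Gaussian data N(data_mean, data_std^2)."""
    a = alpha(t)
    return a * data_mean, math.sqrt((a * data_std) ** 2 + 1.0 - a * a)


def ode_drift_and_divergence(x, t, data_mean, data_std):
    """Probability flow ODE drift f - 0.5*g^2*score and its derivative in x."""
    m, s = marginal(t, data_mean, data_std)
    b = beta(t)
    score = -(x - m) / (s * s)  # analytic stand-in for a trained score model
    drift = -0.5 * b * x - 0.5 * b * score
    divergence = -0.5 * b + 0.5 * b / (s * s)  # d(drift)/dx in one dimension
    return drift, divergence


def log_likelihood(x0, data_mean, data_std, n_steps=2000, eps=1e-3):
    """log p(x0) = log prior(x_1) + integral of the drift divergence over [eps, 1]."""
    dt = (1.0 - eps) / n_steps
    x, t, div_integral = x0, eps, 0.0
    for _ in range(n_steps):
        drift, div = ode_drift_and_divergence(x, t, data_mean, data_std)
        x += drift * dt
        div_integral += div * dt
        t += dt
    # Standard normal prior at t = 1 (a close approximation to the true p_1 here).
    log_prior = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    return log_prior + div_integral


def bits_per_dim(log_p, dim=1):
    """Convert a log-likelihood in nats to bits per dimension."""
    return -log_p / (dim * math.log(2.0))
```

For data distributed as N(2.0, 0.5²), `log_likelihood(2.5, 2.0, 0.5)` should land close to the analytic Gaussian log-density at 2.5. On images, the divergence is not available in closed form and is typically estimated, so this toy version only illustrates the bookkeeping.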

This review was generated by AI and reviewed by human editors.