QUICK REVIEW

[Paper Review] Generative Modeling by Estimating Gradients of the Data Distribution

Yang Song, Stefano Ermon|arXiv (Cornell University)|Jul 12, 2019

Generative Adversarial Networks and Image Synthesis | References: 68 | Cited by: 985

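One-Sentence Summary

This paper learns the score function of noise-perturbed data via score matching and generates samples with annealed Langevin dynamics, achieving competitive image generation without adversarial training.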

ABSTRACT

We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching. Because gradients can be ill-defined and hard to estimate when the data resides on low-dimensional manifolds, we perturb the data with different levels of Gaussian noise, and jointly estimate the corresponding scores, i.e., the vector fields of gradients of the perturbed data distribution for all noise levels. For sampling, we propose an annealed Langevin dynamics where we use gradients corresponding to gradually decreasing noise levels as the sampling process gets closer to the data manifold. Our framework allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons. Our models produce samples comparable to GANs on MNIST, CelebA and CIFAR-10 datasets, achieving a new state-of-the-art inception score of 8.87 on CIFAR-10. Additionally, we demonstrate that our models learn effective representations via image inpainting experiments.

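Motivation and Objectives

Propose a new generative modeling approach that avoids adversarial training and the constraints of likelihood-based models.
Address the challenges of data lying on low-dimensional manifolds and of sampling in low-density regions by perturbing the data with Gaussian noise at multiple levels.
Learn a Noise Conditional Score Network (NCSN) that jointly estimates the scores for all noise levels.
Sample with annealed Langevin dynamics from progressively less noisy distributions, approaching the data manifold.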

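Proposed Method

Estimate the score of the perturbed data distribution via score matching, without requiring a normalized likelihood.
Train a single conditional score network s_theta(x, sigma) that approximates ∇x log q_sigma(x) for a set of Gaussian noise levels {sigma_i}.
Combine denoising score matching across the noise levels in one weighted objective, using lambda(sigma_i) = sigma_i^2 to balance each level's contribution.
Sample with annealed Langevin dynamics, stepping the noise level down from large sigma to small sigma, which improves mixing and sample quality.
Build the score network for image data as a U-Net with dilated convolutions and conditional instance normalization.
Provide a training objective that avoids adversarial training and can be used to compare models quantitatively.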
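For reference, the weighted objective and the annealed sampling update described above can be written out as follows. The notation s_theta, sigma_i and lambda follows the bullets above. The symbols epsilon (base step size), L (number of noise levels), alpha_i (per-level step size) and z_t (injected Gaussian noise), as well as the step-size schedule, do not appear in this summary and are taken from the original paper, so treat them as assumptions here.

```latex
% Denoising score matching loss at a single noise level sigma
\ell(\theta; \sigma) = \frac{1}{2}\,
  \mathbb{E}_{p_{\text{data}}(x)}\,
  \mathbb{E}_{\tilde{x} \sim \mathcal{N}(x, \sigma^2 I)}
  \left[ \left\| s_\theta(\tilde{x}, \sigma) + \frac{\tilde{x} - x}{\sigma^2} \right\|_2^2 \right]

% Weighted combination over all noise levels
\mathcal{L}\big(\theta; \{\sigma_i\}_{i=1}^{L}\big)
  = \frac{1}{L} \sum_{i=1}^{L} \lambda(\sigma_i)\, \ell(\theta; \sigma_i),
  \qquad \lambda(\sigma_i) = \sigma_i^2

% One annealed Langevin step at noise level sigma_i
x_t = x_{t-1} + \frac{\alpha_i}{2}\, s_\theta(x_{t-1}, \sigma_i) + \sqrt{\alpha_i}\, z_t,
  \qquad z_t \sim \mathcal{N}(0, I),
  \quad \alpha_i = \epsilon \cdot \frac{\sigma_i^2}{\sigma_L^2}
```

Since the target of s_theta at level sigma_i has magnitude on the order of 1/sigma_i, the weight sigma_i^2 keeps each term of the sum on a comparable scale, which is the balancing role mentioned above.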

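Experimental Results

Research Questions

RQ1: Can a score-based generative model learn the data distribution without adversarial training or a likelihood objective?
RQ2: Does perturbing the data with Gaussian noise at multiple levels enable consistent score estimation and efficient sampling?
RQ3: Can annealed Langevin sampling effectively generate high-quality samples from the multi-noise score estimates?
RQ4: Does the Noise Conditional Score Network (NCSN) produce competitive image samples and useful representations (e.g., for inpainting) on standard datasets?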

Key Findings

Model	Inception Score	FID
Unconditional PixelCNN	4.60	65.93
PixelIQN	5.29	49.46
EBM (Uncond)	6.02	40.58
WGAN-GP	7.86±.07	36.40
MoLM	7.90±.10	18.90
SNGAN	8.22±.05	21.70
ProgressiveGAN	8.80±.05	-
NCSN (Ours)	8.87±.12	25.32
EBM (class-conditional)	8.30	37.90
SNGAN (class-conditional)	8.60±.08	25.50
BigGAN (class-conditional)	9.22	14.73

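Achieves an unconditional Inception Score of 8.87 on CIFAR-10 (state of the art among unconditional models at the time).
Reaches an FID of 25.32 on CIFAR-10, competitive with top models such as SNGAN.
Demonstrates high-quality samples on MNIST, CelebA and CIFAR-10, comparable to likelihood-based models and GANs.
Demonstrates successful image inpainting, indicating that meaningful representations are learned.
Annealed Langevin dynamics with the multi-noise score network mixes between modes better than standard Langevin sampling.
Provides a principled objective for model comparison that involves no adversarial training.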
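The mode-mixing finding above can be illustrated with a minimal sketch on a toy problem. Everything here is an assumption made for illustration: the 1-D two-component mixture, the noise schedule, the step size `eps`, and the helper names `perturbed_score`, `annealed_langevin` and `right_mode_fractions` are hypothetical and not taken from the paper's code. The exact score of the noise-perturbed mixture stands in for a trained s_theta(x, sigma).

```python
import numpy as np

# Toy data distribution (assumed for illustration): 0.2*N(-5, 1) + 0.8*N(+5, 1).
WEIGHTS = np.array([0.2, 0.8])
MEANS = np.array([-5.0, 5.0])
DATA_VAR = 1.0


def perturbed_score(x, sigma):
    """Exact score of the mixture after adding N(0, sigma^2) noise.

    Stands in for a trained s_theta(x, sigma). Gaussian noise only widens
    each mixture component to variance DATA_VAR + sigma^2.
    """
    var = DATA_VAR + sigma ** 2
    diff = x[:, None] - MEANS[None, :]             # shape (n, 2)
    log_w = np.log(WEIGHTS)[None, :] - 0.5 * diff ** 2 / var
    log_w -= log_w.max(axis=1, keepdims=True)      # numerical stability
    resp = np.exp(log_w)
    resp /= resp.sum(axis=1, keepdims=True)        # component responsibilities
    return -(resp * diff).sum(axis=1) / var


def annealed_langevin(score_fn, x0, sigmas, eps, steps_per_level, rng):
    """Run Langevin dynamics level by level, from the largest sigma down."""
    x = x0.copy()
    for sigma in sigmas:
        # Step size scaled by sigma^2 relative to the smallest noise level.
        alpha = eps * sigma ** 2 / sigmas[-1] ** 2
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x


def right_mode_fractions(seed=0, n=2000):
    """Share of samples ending in the +5 mode: (annealed, single-level)."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(-8.0, 8.0, size=n)            # init far from the truth
    sigmas = np.geomspace(20.0, 0.1, num=10)       # decreasing noise levels
    annealed = annealed_langevin(perturbed_score, x0, sigmas, 0.002, 100, rng)
    # Baseline: same total number of steps, but only the smallest sigma.
    plain = annealed_langevin(perturbed_score, x0, sigmas[-1:], 0.002, 1000, rng)
    return float(np.mean(annealed > 0)), float(np.mean(plain > 0))


if __name__ == "__main__":
    frac_annealed, frac_plain = right_mode_fractions()
    print(f"annealed: {frac_annealed:.2f}  single-level: {frac_plain:.2f}")
```

In this toy setup one would expect the annealed run to recover a right-mode share close to the true 0.8, while the single-level run largely keeps the roughly even split of its uniform initialization, because samples rarely cross the low-density region between the two modes.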


This review was generated by AI and reviewed by a human editor.