QUICK REVIEW

[Paper Review] Variational Diffusion Models

Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho | arXiv (Cornell University) | Jul 1, 2021

Topic: Generative Adversarial Networks and Image Synthesis | References: 41 | Cited by: 282

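One-Sentence Summary

This paper introduces Variational Diffusion Models (VDMs), which learn the diffusion noise schedule jointly with the model and add Fourier features to the denoiser. VDMs achieve state-of-the-art log-likelihoods on the CIFAR-10 and ImageNet density estimation benchmarks, and the paper offers theoretical insight into the variational lower bound (VLB) and the equivalence of several diffusion models.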

ABSTRACT

Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class. Using this insight, we prove an equivalence between several models proposed in the literature. In addition, we show that the continuous-time VLB is invariant to the noise schedule, except for the signal-to-noise ratio at its endpoints. This enables us to learn a noise schedule that minimizes the variance of the resulting VLB estimator, leading to faster optimization. Combining these advances with architectural improvements, we obtain state-of-the-art likelihoods on image density estimation benchmarks, outperforming autoregressive models that have dominated these benchmarks for many years, with often significantly faster optimization. In addition, we show how to use the model as part of a bits-back compression scheme, and demonstrate lossless compression rates close to the theoretical optimum. Code is available at https://github.com/google-research/vdm .

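Motivation and Objectives

Advance likelihood-based image generation with diffusion models and close the gap with autoregressive models on density estimation benchmarks.
Introduce a flexible family of diffusion-based models (VDMs) with a learnable noise schedule and Fourier features to improve likelihood.
Provide a theoretical analysis of the variational lower bound (VLB) of diffusion models and establish equivalences between models in continuous time.
Demonstrate state-of-the-art log-likelihoods on CIFAR-10 and ImageNet, and show the potential for lossless compression via bits-back coding.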

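Proposed Method

Define a forward Gaussian diffusion process in which the latent z_t given x is distributed as q(z_t|x) = N(alpha_t x, sigma_t^2 I).
Learn a monotonic noise schedule through a neural network gamma_eta(t) that parameterizes sigma_t^2, such that SNR(t) = alpha_t^2 / sigma_t^2 = exp(-gamma_eta(t)).
Use a reverse-time generative model whose transitions p(z_s|z_t), for s < t, equal q(z_s|z_t, x) with x replaced by the denoising prediction x_hat_theta(z_t; t).
Parameterize the denoising model with a noise-prediction network epsilon_hat_theta(z_t; t), so that x_hat_theta(z_t; t) = (z_t - sigma_t epsilon_hat_theta(z_t; t)) / alpha_t.
Feed Fourier features (sin/cos of scaled z_t) into the denoiser to capture fine-scale detail and improve likelihood.
Optimize the variational lower bound (VLB) on log p(x), in which the diffusion loss L_T(x) simplifies to a tractable, numerically stable form; extend it to continuous time, L_infty(x), and show that it is invariant to the noise schedule except for the SNR at its endpoints.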
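
The construction above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the linear ramp `gamma` stands in for the learned monotonic network gamma_eta(t), the endpoint values -13.3 and 5.0 and the Fourier exponents (5, 6, 7) are arbitrary choices for the example, and the variance-preserving relation alpha_t^2 = 1 - sigma_t^2 is assumed. All function names are hypothetical.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def gamma(t, gamma_min=-13.3, gamma_max=5.0):
    """Assumed stand-in for the learned network gamma_eta(t): a monotonic linear ramp."""
    return gamma_min + (gamma_max - gamma_min) * t


def schedule(t):
    """Assumed variance-preserving case: sigma_t^2 = sigmoid(gamma), alpha_t^2 = sigmoid(-gamma)."""
    g = gamma(t)
    alpha = math.sqrt(sigmoid(-g))
    sigma = math.sqrt(sigmoid(g))
    snr = math.exp(-g)  # equals alpha^2 / sigma^2
    return alpha, sigma, snr


def diffuse(x, t, rng):
    """Sample z_t ~ q(z_t | x) = N(alpha_t x, sigma_t^2 I); also return the noise used."""
    alpha, sigma, _ = schedule(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    z = [alpha * xi + sigma * ei for xi, ei in zip(x, eps)]
    return z, eps


def x_from_eps(z, eps_hat, t):
    """Convert a noise prediction into a data prediction: x_hat = (z_t - sigma_t eps_hat) / alpha_t."""
    alpha, sigma, _ = schedule(t)
    return [(zi - sigma * ei) / alpha for zi, ei in zip(z, eps_hat)]


def fourier_features(z, exponents=(5, 6, 7)):
    """sin/cos of z scaled by 2^n * pi; the exponents are illustrative assumptions."""
    feats = []
    for n in exponents:
        w = (2.0 ** n) * math.pi
        feats.extend(math.sin(w * zi) for zi in z)
        feats.extend(math.cos(w * zi) for zi in z)
    return feats


rng = random.Random(0)
x = [0.5, -1.0, 0.25]
z_t, eps = diffuse(x, 0.3, rng)
# With a perfect noise prediction (eps_hat = eps), x is recovered up to rounding.
x_hat = x_from_eps(z_t, eps, 0.3)
```

In a real model the denoiser would receive `z_t` concatenated with `fourier_features(z_t)` and return `eps_hat`; here the true noise is passed back only to show that the two parameterizations are interchangeable.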

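Experimental Results

Research Questions

RQ1: Can diffusion-based generative models achieve state-of-the-art likelihoods on standard image density estimation benchmarks?
RQ2: Does optimizing the diffusion process (noise schedule) jointly with the model parameters outperform a fixed schedule?
RQ3: How does the continuous-time formulation affect the VLB's invariance to the forward process?
RQ4: Which architectural innovations (such as Fourier features) and training objectives improve likelihood while keeping optimization tractable?
RQ5: Can diffusion models be used effectively for lossless compression via bits-back coding?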

Key Findings

Model (type)	CIFAR-10, no augmentation (bits/dim)	CIFAR-10, with augmentation (bits/dim)	ImageNet 32x32, no augmentation (bits/dim)	ImageNet 64x64, no augmentation (bits/dim)
VDM (variational bound); Diff	2.65	2.49	3.72	3.40
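
To make the bits/dim unit concrete, the short sketch below converts the reported CIFAR-10 figure of 2.65 bits/dim (no augmentation) into a total code length per image and into nats per dimension. It assumes the standard CIFAR-10 image shape of 32x32 pixels with 3 color channels; the helper names are hypothetical.

```python
import math


def total_bits(bits_per_dim, height, width, channels=3):
    """Total negative log-likelihood of one image, in bits."""
    return bits_per_dim * height * width * channels


def nats_per_dim(bits_per_dim):
    """Same quantity per dimension, in nats (1 bit = ln 2 nats)."""
    return bits_per_dim * math.log(2.0)


cifar_bits = total_bits(2.65, 32, 32)   # about 8140.8 bits per image
raw_bits = total_bits(8.0, 32, 32)      # 24576 bits for raw 8-bit pixels
ratio = raw_bits / cifar_bits           # about 3.02x smaller than raw storage
```

Lower bits/dim means a higher likelihood, and it also bounds the average code length an ideal lossless coder built on the model could reach.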

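VDMs achieve state-of-the-art log-likelihoods on the CIFAR-10 and ImageNet density estimation benchmarks, outperforming autoregressive models.
The paper derives a simple expression for the discrete-time diffusion loss and for the continuous-time loss L_infty(x), clarifying the behavior of the VLB.
In continuous time, the VLB is invariant to the shape of the noise schedule and depends only on the SNR at its endpoints, which makes it possible to optimize the schedule to minimize variance.
Adding Fourier features to the denoiser significantly improves likelihood, especially when the SNR is learned.
Learning the SNR endpoints and using a continuous-time, variance-minimizing schedule speeds up training and reduces the variance of the VLB estimator.
Although the paper focuses on likelihood, the experiments indicate that the model can also be competitive on perceptual quality metrics (FID) when trained with a weighted diffusion loss.
The model supports lossless compression via bits-back coding and achieves a competitive net codelength on CIFAR-10.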
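
The schedule-invariance finding can be checked numerically on a toy problem. The sketch below evaluates the continuous-time diffusion loss, -1/2 times the integral over [0, 1] of SNR'(t) E||x - x_hat||^2 dt, for two schedules that share the same endpoints but differ in shape. Everything here is an assumption made for illustration: the data is a scalar x ~ N(0, 1), the denoiser is the ideal posterior mean (whose expected squared error is 1 / (1 + SNR)), and both schedules and their endpoint values are hypothetical.

```python
import math


def snr_linear(t, g0=-10.0, g1=5.0):
    """SNR(t) = exp(-gamma(t)) with gamma linear in t."""
    return math.exp(-(g0 + (g1 - g0) * t))


def snr_quadratic(t, g0=-10.0, g1=5.0):
    """Same endpoints as snr_linear, but gamma is quadratic in t."""
    return math.exp(-(g0 + (g1 - g0) * t * t))


def expected_sq_error(snr):
    """E||x - x_hat||^2 for scalar x ~ N(0, 1) with the posterior-mean denoiser."""
    return 1.0 / (1.0 + snr)


def continuous_time_loss(snr_fn, steps=20000):
    """Midpoint-rule approximation of -1/2 * integral of SNR'(t) * error(SNR(t)) dt."""
    total = 0.0
    for i in range(steps):
        s, t = i / steps, (i + 1) / steps
        snr_s, snr_t = snr_fn(s), snr_fn(t)
        # (snr_s - snr_t) plays the role of -SNR'(t) dt on this sub-interval.
        total += 0.5 * (snr_s - snr_t) * expected_sq_error(0.5 * (snr_s + snr_t))
    return total


loss_linear = continuous_time_loss(snr_linear)
loss_quadratic = continuous_time_loss(snr_quadratic)
# Closed form for this toy case: it depends only on SNR(0) and SNR(1).
closed_form = 0.5 * (math.log1p(snr_linear(0.0)) - math.log1p(snr_linear(1.0)))
```

Both numerical values agree with the closed form, which involves only the two endpoint SNRs. With a finite number of steps or a Monte Carlo estimate over t, the schedule shape still affects the estimator's variance, which is what the learned schedule is tuned to reduce.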

This review was generated by AI and reviewed by human editors.