QUICK REVIEW

[Paper Review] Classifier-Free Diffusion Guidance

Jonathan Ho, Tim Salimans|arXiv (Cornell University)|Jul 26, 2022

Model Reduction and Neural Networks | Cited by 731

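One-Sentence Summary

This paper proposes classifier-free guidance: by jointly training a conditional and an unconditional diffusion model and combining their score estimates, it removes the need for an auxiliary classifier while still providing a trade-off between sample fidelity and diversity.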

ABSTRACT

Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. Classifier guidance combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. It also raises the question of whether guidance can be performed without a classifier. We show that guidance can be indeed performed by a pure generative model without such a classifier: in what we call classifier-free guidance, we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.

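Motivation and Objectives

Reduce the reliance of diffusion models on an external classifier for guidance.
Show that guidance can be performed by a pure generative model through joint training of a conditional and an unconditional diffusion model.
Show that adjusting the guidance strength trades off sample quality (IS) against diversity (FID).
Provide a simple, practical training and sampling procedure for classifier-free guidance that is comparable to classifier-based methods.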

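Proposed Method

Train a single network that acts as both the conditional and the unconditional diffusion model by randomly dropping the conditioning information with probability p_uncond.
Predict the denoising scores ε_θ(z_λ, c) and ε_θ(z_λ), and at sampling time form the guided score ε̃_t = (1+w)ε_θ(z_λ, c) − wε_θ(z_λ) (classifier-free guidance).
Use a joint objective that trains the conditional and unconditional components by denoising score matching over multiple noise levels λ.
During training, sample λ from a cosine-based schedule and optimize ε_θ to match the true noise ε used to corrupt z_λ.
At inference, sample with a fixed mixing weight w to trade off fidelity against diversity.
Note that the method relies on unconstrained networks, so the guided score may not be the gradient of any explicit classifier.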
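The two mechanical pieces of the method above, conditioning dropout at training time and the score combination at sampling time, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `NULL_CLASS`, `drop_condition`, `guided_eps`, and `toy_eps_model` are hypothetical names, and the toy model is a stand-in for a trained ε_θ. Note that the guided score can equivalently be written as ε_θ(z_λ, c) + w(ε_θ(z_λ, c) − ε_θ(z_λ)), so w = 0 recovers the plain conditional model.

```python
import numpy as np

NULL_CLASS = -1  # hypothetical "null" token meaning "no conditioning"


def drop_condition(labels, p_uncond, rng):
    """Replace each label by NULL_CLASS independently with probability p_uncond."""
    labels = np.asarray(labels)
    drop = rng.random(labels.shape) < p_uncond
    return np.where(drop, NULL_CLASS, labels)


def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: (1 + w) * eps_cond - w * eps_uncond."""
    return (1.0 + w) * eps_cond - w * eps_uncond


def toy_eps_model(z, c):
    """Stand-in for eps_theta(z_lambda, c); NOT a trained network."""
    shift = 0.0 if c == NULL_CLASS else float(c)
    return z - shift


# Training side: roughly a fraction p_uncond of labels become the null token.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=10_000)
train_labels = drop_condition(labels, p_uncond=0.1, rng=rng)
frac_dropped = np.mean(train_labels == NULL_CLASS)

# Sampling side: evaluate the same model with and without the condition,
# then mix the two predictions with guidance weight w.
z = np.array([0.5, -1.0, 2.0])
eps_c = toy_eps_model(z, c=3)
eps_u = toy_eps_model(z, NULL_CLASS)
eps_w0 = guided_eps(eps_c, eps_u, w=0.0)  # identical to eps_c
eps_w1 = guided_eps(eps_c, eps_u, w=1.0)  # pushed further away from eps_u
```

In a real sampler, the guided prediction would replace the model output inside each denoising step, which means two network evaluations per step (one conditional, one unconditional).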

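Experimental Results

Research Questions

RQ1: Can guidance be achieved in diffusion models without training a separate classifier?
RQ2: Does mixing unconditional and conditional score estimates provide a controllable IS/FID trade-off similar to classifier guidance?
RQ3: How does classifier-free guidance compare with classifier-guided diffusion in simplicity, training requirements, and sampling efficiency?
RQ4: How do the unconditional training probability p_uncond and the guidance strength w affect image quality and diversity?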

Key Findings

Model	FID (↓)	IS (↑)
ADM (Dhariwal & Nichol 2021)	2.07	-
CDM (Ho et al. 2021)	1.48	67.95
Ours, w=0.0, p_uncond=0.1/0.2/0.5	1.8 / 1.8 / 2.21	53.71 / 52.9 / 47.61
Ours, w=0.1, p_uncond=0.1/0.2/0.5	1.55 / 1.62 / 1.91	66.11 / 64.58 / 56.1
Ours, w=0.3, p_uncond=0.1/0.2/0.5	3.03 / 2.93 / 2.65	92.8 / 88.64 / 74.92
Ours, w=0.4, p_uncond=0.1/0.2/0.5	4.3 / 4 / 3.44	106.2 / 101.11 / 84.27
Ours, w=0.5, p_uncond=0.1/0.2/0.5	5.74 / 5.19 / 4.34	119.3 / 112.15 / 92.95
Ours, w=0.6, p_uncond=0.1/0.2/0.5	7.19 / 6.48 / 5.27	131.1 / 122.13 / 102
Ours, w=0.7, p_uncond=0.1/0.2/0.5	8.62 / 7.73 / 6.23	141.8 / 131.6 / 109.8
Ours, w=0.8, p_uncond=0.1/0.2/0.5	10.08 / 8.9 / 7.25	151.6 / 140.82 / 116.9
Ours, w=0.9, p_uncond=0.1/0.2/0.5	11.41 / 10.09 / 8.21	161 / 150.26 / 124.6
Ours, w=1.0, p_uncond=0.1/0.2/0.5	12.6 / 11.21 / 9.13	170.1 / 158.29 / 131.1
Ours, w=2.0, p_uncond=0.1/0.2/0.5	21.03 / 18.79 / 16.16	225.5 / 212.98 / 183
Ours, w=3.0, p_uncond=0.1/0.2/0.5	24.83 / 22.36 / 19.75	250.4 / 237.65 / 208.9
Ours, w=4.0, p_uncond=0.1/0.2/0.5	26.22 / 23.84 / 21.48	260.2 / 248.97 / 225.1

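By adjusting the guidance strength w, classifier-free guidance achieves a controllable IS/FID trade-off similar to that of classifier guidance.
On 64×64 ImageNet, small w gives the best FID (1.55 at w=0.1, p_uncond=0.1), while larger w raises the Inception Score (up to 260.2 at w=4.0, p_uncond=0.1), a clear fidelity versus diversity trade-off.
On 128×128 ImageNet, the method is competitive in FID and IS across different numbers of sampling timesteps T; values of w around 0.3–0.4 give a strong IS together with a good FID.
Training with a small p_uncond (e.g. 0.1–0.2) is sufficient, suggesting that guidance needs only limited unconditional capacity.
The most strongly guided samples show higher fidelity but reduced diversity, consistent with the quality versus diversity trade-off.
On the reported benchmarks, the method matches or outperforms existing methods, with competitive or better IS/FID at comparable sampling budgets.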
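The trade-off described in these findings can be checked directly against the table above. The short script below is only a reading aid: it copies the p_uncond = 0.1 column of the table into a list (the variable names are illustrative) and confirms that FID is lowest at a small w while IS increases with every step up in w.

```python
# (w, FID, IS) for p_uncond = 0.1, copied from the results table above.
rows = [
    (0.0, 1.80, 53.71),
    (0.1, 1.55, 66.11),
    (0.3, 3.03, 92.8),
    (0.4, 4.30, 106.2),
    (0.5, 5.74, 119.3),
    (0.6, 7.19, 131.1),
    (0.7, 8.62, 141.8),
    (0.8, 10.08, 151.6),
    (0.9, 11.41, 161.0),
    (1.0, 12.60, 170.1),
    (2.0, 21.03, 225.5),
    (3.0, 24.83, 250.4),
    (4.0, 26.22, 260.2),
]

best_fid = min(rows, key=lambda r: r[1])  # row with the lowest FID
best_is = max(rows, key=lambda r: r[2])   # row with the highest IS

# IS should increase with every listed step in w.
is_monotone = all(a[2] < b[2] for a, b in zip(rows, rows[1:]))
# Past the FID optimum (second row), FID should only get worse.
fid_rises_after_best = all(a[1] < b[1] for a, b in zip(rows[1:], rows[2:]))

print("lowest FID:", best_fid)
print("highest IS:", best_is)
print("IS monotone in w:", is_monotone)
print("FID rises after its optimum:", fid_rises_after_best)
```

The same check can be repeated for the p_uncond = 0.2 and 0.5 columns by swapping in their values from the table.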


This review was generated by AI and reviewed by human editors.