QUICK REVIEW

[Paper Review] Common Diffusion Noise Schedules and Sample Steps are Flawed

Shanchuan Lin, Bingchen Liu | arXiv (Cornell University) | May 15, 2023

Cited by 10

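ONE-SENTENCE SUMMARY

The paper identifies flaws in common diffusion noise schedules and in the timestep where samplers start. It shows that these flaws cause a training/inference mismatch and a brightness bias. It then proposes four fixes: a zero-terminal-SNR schedule, v-prediction, sampling from the last timestep, and classifier-free guidance rescaling.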

ABSTRACT

We discover that common diffusion noise schedules do not enforce the last timestep to have zero signal-to-noise ratio (SNR), and some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed and do not reflect the fact that the model is given pure Gaussian noise at inference, creating a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to only generate images with medium brightness and prevents it from generating very bright and dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.
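The abstract's first claim, and fixes (1) and (2), can be checked numerically with the sketch below.

It assumes Stable Diffusion's "scaled_linear" schedule with beta_start = 0.00085, beta_end = 0.012 and 1000 training steps. These values come from SD's public config as we recall it, not from this review. The sketch does three things:

- It computes the terminal SNR of that schedule.
- It applies our own pure-Python transcription of the sqrt(alpha_bar) rescaling the paper describes as Algorithm 1.
- It shows why v-prediction follows. Once alpha_bar_T = 0, the input at t = T is the noise itself, so an epsilon target is trivial, while the v target reduces to -x0.

All function names are hypothetical.

```python
import math

# Assumed SD-style "scaled_linear" schedule: betas are linear in sqrt space.
def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def alphas_bar_from_betas(betas):
    """Cumulative product of (1 - beta_t), i.e. alpha_bar_t."""
    alphas_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alphas_bar.append(prod)
    return alphas_bar

def snr(alpha_bar):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar)."""
    return alpha_bar / (1.0 - alpha_bar)

def enforce_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last step is 0 and the first is kept."""
    sqrt_ab = [math.sqrt(a) for a in alphas_bar_from_betas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]
    alphas_bar = [s * s for s in sqrt_ab]
    # Undo the cumulative product: alpha_t = alpha_bar_t / alpha_bar_{t-1}.
    alphas = [alphas_bar[0]] + [
        alphas_bar[t] / alphas_bar[t - 1] for t in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]

def diffuse(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

def v_target(x0, eps, alpha_bar):
    """v-prediction target: sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0."""
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0

base_ab = alphas_bar_from_betas(scaled_linear_betas())
print("original terminal sqrt(alpha_bar):", math.sqrt(base_ab[-1]))
print("original terminal SNR:", snr(base_ab[-1]))  # small, but not zero

fixed_ab = alphas_bar_from_betas(enforce_zero_terminal_snr(scaled_linear_betas()))
print("rescaled terminal SNR:", snr(fixed_ab[-1]))  # 0.0

# At t = T the model input is pure noise, so predicting eps is trivial,
# while the v target becomes -x0 and still carries information about the data.
x0, eps = 0.8, -1.3
x_T = diffuse(x0, eps, fixed_ab[-1])
print(x_T == eps, v_target(x0, eps, fixed_ab[-1]))  # True -0.8
```

Only scaled_linear_betas is specific to SD. The rescaling step does not depend on how the original betas were produced.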

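MOTIVATION AND OBJECTIVES

Identify how common diffusion noise schedules fail to enforce zero terminal SNR, and why this creates a training/inference mismatch.
Show how sampler implementations that do not start from the last timestep aggravate the brightness bias.
Propose practical fixes that align training with inference and improve sample quality.
Validate the fixes by training and evaluating Stable Diffusion variants on standard datasets.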

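PROPOSED METHOD

Enforce zero terminal SNR by rescaling an existing noise schedule in sqrt(alpha_bar) space (Algorithm 1).
Switch to v-prediction and the v-loss, which still give a meaningful training target when the terminal SNR is zero (Equations 11–12).
Require the sampler to start from the last timestep so that inference matches training (see Table 2 and Section 3.3).
Rescale classifier-free guidance to prevent over-exposure (Equations 13–16, Algorithm 2).
Train models with the proposed schedule and sampling strategy, and compare them qualitatively and quantitatively with baseline Stable Diffusion (Section 4).
Discuss how to handle v-prediction and epsilon-prediction correctly at zero SNR, where epsilon-based formulations should be avoided (Section 6).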
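The guidance rescaling step can be sketched as shown below. This is a minimal illustration, not the paper's code, and it makes the following assumptions:

- Plain guidance is computed as x_cfg = x_neg + w * (x_pos - x_neg).
- x_cfg is rescaled so its standard deviation matches that of the conditional prediction x_pos.
- The rescaled result is blended with plain CFG using a weight phi.
- The function name rescaled_cfg is hypothetical.
- One sample is represented as a flat list of floats.
- The defaults w = 7.5 and phi = 0.7 are our own choices.

```python
import statistics

def rescaled_cfg(pos, neg, guidance_scale=7.5, phi=0.7):
    """Classifier-free guidance followed by std rescaling (hypothetical helper).

    pos / neg: conditional and unconditional predictions for ONE sample,
    flattened to lists of floats (a real model would use per-sample tensors).
    """
    # 1) Plain classifier-free guidance.
    cfg = [n + guidance_scale * (p - n) for p, n in zip(pos, neg)]
    # 2) Rescale so the guided prediction has the conditional prediction's std.
    factor = statistics.pstdev(pos) / statistics.pstdev(cfg)
    rescaled = [c * factor for c in cfg]
    # 3) Blend: phi = 1 keeps only the rescaled result, phi = 0 is plain CFG.
    return [phi * r + (1.0 - phi) * c for r, c in zip(rescaled, cfg)]

pos = [0.2, -0.5, 0.9, -0.1]
neg = [0.1, -0.3, 0.4, 0.0]
plain = rescaled_cfg(pos, neg, phi=0.0)
blended = rescaled_cfg(pos, neg, phi=0.7)
# The blended std lies between the conditional std and the plain-CFG std.
print(statistics.pstdev(pos), statistics.pstdev(blended), statistics.pstdev(plain))
```

Population and sample standard deviation give the same ratio here, because both are computed over the same number of elements.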

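EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Do common diffusion noise schedules leave a nonzero SNR at the last timestep, and does this create a training/inference mismatch?
RQ2: Does sampling from the last timestep improve training/inference alignment and widen the achievable brightness range?
RQ3: Can simple schedule rescaling, v-prediction, and CFG rescaling fix the brightness and exposure problems observed in Stable Diffusion?
RQ4: How do the proposed changes affect quantitative distribution alignment (FID/IS) and qualitative sample diversity?
RQ5: What practical considerations apply to sampler implementations when the terminal SNR is zero?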

KEY FINDINGS

Model	FID ↓	IS ↑
Stable Diffusion (official)	23.76	32.84
SD with our data, no fixes	22.96	34.11
SD with fixes (ours)	21.66	36.16

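A zero-terminal-SNR schedule eliminates the low-frequency signal that leaks through at the last timestep, keeping training and inference consistent.
V-prediction keeps the loss meaningful when the terminal SNR is zero, and its visual quality is comparable to epsilon-prediction.
Starting the sampler from the last timestep is essential for staying consistent with training under a zero-terminal-SNR schedule.
Classifier-free guidance rescaling reduces over-exposure when the terminal SNR is near zero (results are shown for phi in [0.5, 0.75]).
The model fine-tuned with the fixes scores better on the COCO 2014 validation set: 21.66 FID (lower is better) and 36.16 IS (higher is better), versus 23.76 and 32.84 for the official SD v2.1-base.
When the number of sampling steps S is small, trailing timestep selection is more efficient than linspace; the gap narrows as S grows.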
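The sketch below makes the linspace/trailing comparison concrete. It lists which of T = 1000 training timesteps (0-indexed) a sampler visits under three spacing rules: linspace, leading, and trailing.

The formulas are our assumption of how these rules are commonly implemented, for example in diffusers-style schedulers. They omit details such as step offsets, and the function names are hypothetical.

```python
def linspace_steps(T, S):
    """Evenly spaced over [0, T-1], rounded, highest first (requires S >= 2)."""
    return [round((S - 1 - i) * (T - 1) / (S - 1)) for i in range(S)]

def leading_steps(T, S):
    """Multiples of T // S counted up from 0, highest first."""
    step = T // S
    return [(S - 1 - i) * step for i in range(S)]

def trailing_steps(T, S):
    """Step down from T in gaps of T / S, shifted to 0-indexing."""
    step = T / S
    return [round(T - i * step) - 1 for i in range(S)]

if __name__ == "__main__":
    T, S = 1000, 10
    for name, fn in [("linspace", linspace_steps),
                     ("leading", leading_steps),
                     ("trailing", trailing_steps)]:
        print(name, fn(T, S))
    # leading starts at 900, so it never visits t = 999;
    # linspace and trailing both start at t = 999.
```

With T = 1000 and S = 10, the three rules differ as follows:

- Leading starts at t = 900, so it never sees the fully noised step that training under a zero-terminal-SNR schedule includes.
- Linspace starts at t = 999 but spends one of its S steps on t = 0.
- Trailing starts at t = 999 and steps down in gaps of T/S.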

This review was generated by AI and reviewed by human editors.