QUICK REVIEW

[Paper Review] DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics

Kaiwen Zheng, Cheng Lü|arXiv (Cornell University)|Oct 20, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 9

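One-Sentence Summary

A training-free diffusion ODE solver that uses empirical model statistics (EMS) to build a high-order solver for DPMs, achieving better sample quality with fewer function evaluations on both pixel-space and latent-space models.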

ABSTRACT

Diffusion probabilistic models (DPMs) have exhibited excellent performance for high-fidelity image generation while suffering from inefficient sampling. Recent works accelerate the sampling procedure by proposing fast ODE solvers that leverage the specific ODE form of DPMs. However, they highly rely on specific parameterization during inference (such as noise/data prediction), which might not be the optimal choice. In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution. Based on such formulation, we propose DPM-Solver-v3, a new fast ODE solver for DPMs by introducing several coefficients efficiently computed on the pretrained model, which we call empirical model statistics. We further incorporate multistep methods and a predictor-corrector framework, and propose some techniques for improving sample quality at small numbers of function evaluations (NFE) or large guidance scales. Experiments show that DPM-Solver-v3 achieves consistently better or comparable performance in both unconditional and conditional sampling with both pixel-space and latent-space DPMs, especially in 5$\sim$10 NFEs. We achieve FIDs of 12.21 (5 NFE), 2.51 (10 NFE) on unconditional CIFAR10, and MSE of 0.55 (5 NFE, 7.5 guidance scale) on Stable Diffusion, bringing a speed-up of 15%$\sim$30% compared to previous state-of-the-art training-free methods. Code is available at https://github.com/thu-ml/DPM-Solver-v3.

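Motivation and Objectives

Make diffusion model sampling more efficient and identify how the model parameterization affects sampling accuracy.
Propose an ODE formulation with EMS that minimizes the first-order discretization error during sampling.
Develop a high-order multistep predictor-corrector diffusion ODE solver that uses EMS.
Show consistent improvements for both unconditional and conditional sampling at small NFEs, with pixel-space and latent-space DPMs.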

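Proposed Method

Reformulate the diffusion ODE with three coefficient functions, l_lambda, s_lambda, and b_lambda, that determine how the ODE is split into linear and nonlinear parts (the EMS framework).
Define the EMS as analytic solutions computed from the pretrained model: they minimize the expected squared norm of the Jacobian of the nonlinear part, giving the best linear approximation of the nonlinear term.
Introduce g_theta, a rescaled version of the nonlinear term f_theta, to reduce discretization error, and show that the first-order error depends on f_theta^{(1)} - s_lambda f_theta - b_lambda.
Use a Taylor expansion of g_theta around lambda_s, with estimates of the derivatives g_theta^{(k)}, to derive an (n+1)-th order local approximation with an explicit discretization formula.
Apply a global multistep predictor-corrector framework that reuses quantities from earlier steps, enabling high-order sampling at low NFE; this includes a pseudo-order strategy and a half-corrector variant for large guidance scales.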
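
The role of s_lambda and b_lambda can be illustrated with a small numerical sketch. It assumes the two coefficients act elementwise and are chosen to minimize the mean squared value of the residual f_theta^{(1)} - s_lambda f_theta - b_lambda over samples, in which case the minimizer is an ordinary per-dimension least-squares fit. The arrays `f` and `f1` are synthetic stand-ins for f_theta and f_theta^{(1)} at one fixed lambda, and `fit_s_b` is a hypothetical helper name; nothing here is taken from the authors' code.

```python
import numpy as np


def fit_s_b(f, f1):
    """Per-dimension least squares for f1 ~ s * f + b.

    f, f1: arrays of shape (num_samples, dim). Returns (s, b), each of
    shape (dim,), minimizing mean((f1 - s * f - b) ** 2) over samples.
    Assumes every dimension of f has nonzero variance.
    """
    f_mean = f.mean(axis=0)
    f1_mean = f1.mean(axis=0)
    cov = ((f - f_mean) * (f1 - f1_mean)).mean(axis=0)
    var = ((f - f_mean) ** 2).mean(axis=0)
    s = cov / var
    b = f1_mean - s * f_mean
    return s, b


# Synthetic stand-ins (assumption): f1 is roughly linear in f per dimension.
rng = np.random.default_rng(0)
num_samples, dim = 4096, 3
true_s = np.array([0.5, -1.0, 2.0])
true_b = np.array([0.1, 0.0, -0.3])
f = rng.normal(size=(num_samples, dim))
f1 = true_s * f + true_b + 0.01 * rng.normal(size=(num_samples, dim))

s_hat, b_hat = fit_s_b(f, f1)

# Mean squared residual without and with the fitted coefficients.
residual_before = np.mean(f1 ** 2)
residual_after = np.mean((f1 - s_hat * f - b_hat) ** 2)
print(s_hat, b_hat, residual_before, residual_after)
```

In this toy setting the fitted residual is far smaller than the raw f1, which is the sense in which well-chosen coefficients shrink the term that drives the first-order error. In an actual sampler the statistics would come from the pretrained model along the sampling trajectory rather than from synthetic data.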

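Experimental Results

Research Questions

RQ1: How does the choice of model parameterization in diffusion sampling affect discretization error and sample quality?
RQ2: Can an EMS-based ODE formulation yield a high-order, training-free sampler that outperforms existing exponential-integrator solvers at low NFEs?
RQ3: Does a multistep predictor-corrector strategy with EMS give better unconditional and conditional sampling on pixel-space and latent-space DPMs?
RQ4: Which practical techniques (such as pseudo-order and half-corrector) improve performance at small NFEs or large guidance scales?
RQ5: Does the proposed EMS-based high-order solver come with convergence guarantees under typical DPM settings?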

Key Findings

FID on unconditional CIFAR-10 (lower is better)

Method	Model	NFE=5	NFE=6	NFE=8	NFE=10	NFE=12	NFE=15	NFE=20	NFE=25
DPM-Solver-v3	CIFAR-10, pixel-space, EDM	12.21	8.56	3.50	2.51	2.24	2.10	2.02	2.00
DPM-Solver-v3	CIFAR-10, pixel-space, ScoreSDE	12.76	7.40	3.94	3.40	3.24	2.91	2.71	2.64

DPM-Solver-v3 matches or improves on the sample quality of earlier fast samplers across 5 to 20 NFEs, in both unconditional and conditional settings.
On pixel-space CIFAR-10, DPM-Solver-v3 reaches an FID of 12.21 at 5 NFE and 2.51 at 10 NFE; the abstract reports a speed-up of 15% to 30% over previous state-of-the-art training-free methods.
With latent-space DPMs (Stable Diffusion), the method reaches an MSE of 0.55 at 5 NFE with a guidance scale of 7.5.
UniPC and DPM-Solver-v3 both perform strongly on CIFAR-10; with the ScoreSDE model, DPM-Solver-v3 goes from an FID of 12.76 at 5 NFE to 2.64 at 25 NFE (second table row), so quality remains high even at very low NFEs.
The EMS-based formulation tailors the sampling ODE to the pretrained model in a principled way, reducing discretization error and improving stability at low NFEs.
The authors also introduce practical techniques (a pseudo-order solver and a half-corrector) that further improve performance in challenging sampling scenarios.


This review was generated by AI and reviewed by a human editor.