QUICK REVIEW

[Paper Review] Elucidating the Design Space of Diffusion-Based Generative Models

Tero Karras, Miika Aittala | arXiv (Cornell University) | Jun 1, 2022

Generative Adversarial Networks and Image Synthesis | 307 citations

TL;DR

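The paper presents a modular design space for diffusion models and proposes improvements to sampling, training, and score-network preconditioning. It reports state-of-the-art FIDs on CIFAR-10 (1.79 class-conditional, 1.97 unconditional, at 35 network evaluations per image) and ImageNet-64 (1.36 after re-training). The same changes also improve pre-trained models from prior work, for example taking an existing ImageNet-64 model from an FID of 2.07 to 1.55.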

ABSTRACT

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.

Motivation & Objective

Clarify the practical design space of diffusion-based generative models by separating concrete components and choices.
Improve sampling efficiency and output quality through higher-order solvers, optimized schedules, and stochasticity analysis.
Develop principled preconditioning for score networks and end-to-end training practices to enhance robustness and performance.
Demonstrate modular improvements on existing models and show gains on CIFAR-10 and ImageNet-64.

Proposed method

Express diffusion models in a common ODE/SDE framework and identify independent design components.
Apply a 2nd-order Heun solver for deterministic sampling with an optimized time schedule and curvature-aware σ(t) and s(t).
Introduce a stochastic sampler with controlled noise injection (churn) to analyze the role of stochasticity in sampling.
Propose a preconditioned score network Dθ with σ-dependent skip and scaling (c_in, c_out, c_skip, c_noise) to stabilize training.
Optimize loss weighting and noise distribution during training (λ(σ), p_train(σ)) and employ non-leaky data augmentation to improve generalization.
Show that the training improvements lead to new state-of-the-art FIDs and that sampling needs fewer network function evaluations (NFE).
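The deterministic sampler in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes σ(t) = t and s(t) = 1, and it uses schedule values recalled from the paper's defaults (ρ = 7, σ_min = 0.002, σ_max = 80, 18 steps) that this review does not state. The function names edm_sigma_steps, heun_sample, and gaussian_denoiser are hypothetical, and the toy denoiser stands in for a trained network so that the result can be checked against a closed-form solution.

```python
import math

SIGMA_DATA = 0.5  # assumed data standard deviation for the toy example


def edm_sigma_steps(num_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels sigma_0 > ... > sigma_{N-1}, followed by a final 0.

    Interpolates linearly in sigma**(1/rho), which concentrates steps
    at low noise levels. Default values are assumptions (see lead-in).
    """
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    steps = [(hi + i / (num_steps - 1) * (lo - hi)) ** rho
             for i in range(num_steps)]
    return steps + [0.0]


def heun_sample(denoise, x, sigmas):
    """Integrate dx/dsigma = (x - D(x; sigma)) / sigma with Heun's method.

    Returns the final sample and the number of denoiser evaluations.
    """
    nfe = 0
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoise(x, sigma_cur)) / sigma_cur
        nfe += 1
        x_next = x + (sigma_next - sigma_cur) * d_cur  # Euler predictor
        if sigma_next > 0:
            # 2nd-order correction: average the slopes at both ends.
            d_next = (x_next - denoise(x_next, sigma_next)) / sigma_next
            nfe += 1
            x_next = x + (sigma_next - sigma_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x, nfe


def gaussian_denoiser(x, sigma):
    """Ideal denoiser when the data is N(0, SIGMA_DATA**2) (toy stand-in)."""
    return x * SIGMA_DATA ** 2 / (SIGMA_DATA ** 2 + sigma ** 2)


sigmas = edm_sigma_steps(18)
sample, nfe = heun_sample(gaussian_denoiser, 80.0, sigmas)
# The last step (to sigma = 0) is a plain Euler step, so the sampler
# uses 2 * 18 - 1 = 35 denoiser evaluations.
print(sample, nfe)
```

The final step skips the correction because the slope is undefined at σ = 0, which is why 18 steps cost 35 evaluations rather than 36. The code only uses ordinary arithmetic on x, so the same functions should also accept array types that overload those operators.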

Experimental results

Research questions

RQ1: What are the independent design choices in diffusion models that affect performance and sampling speed?
RQ2: How do sampling strategies (deterministic vs stochastic) impact image quality across model families when decoupled from training?
RQ3: Can principled preconditioning and training losses improve robustness and final FID across resolutions and datasets?
RQ4: What is the impact of scheduling (σ(t), s(t)) on ODE trajectories and denoiser guidance during sampling?
RQ5: To what extent can modular improvements transfer to pre-trained diffusion models from prior work?

Key findings

Achieved state-of-the-art FID of 1.79 on CIFAR-10 (conditional) and 1.97 (unconditional) with faster sampling (35 Dθ evaluations per image).
Improved the FID of a previously trained ImageNet-64 model from 2.07 to a near-state-of-the-art 1.55, and reached a new state of the art of 1.36 after re-training with the proposed improvements.
Showed substantial sampling speedups by adopting a 2nd-order Heun solver, optimized σ(t) and s(t), and a refined time-step schedule.
Demonstrated that network preconditioning, loss weighting λ(σ), and the training noise distribution p_train(σ), together with non-leaky augmentation, yield strong improvements across resolutions.
Found that stochastic sampling benefits depend on model setup and can be reduced or eliminated with improved training (deterministic sampling can outperform stochastic in certain cases).
Validated the modularity of the improvements by applying them to multiple pre-trained models (VP/VE, DDPM/DDIM, including an ImageNet-64 model) and achieving consistent gains.
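To make the preconditioning and loss-weighting finding more concrete, here is a hedged sketch of how the σ-dependent coefficients could wrap a raw network. The formulas and constants (σ_data = 0.5, c_noise = ln(σ)/4, log-normal training noise with mean -1.2 and standard deviation 1.2) are recalled from the paper and are not stated in this review, so treat them as assumptions. The names precond_coeffs, denoiser, loss_weight, sample_train_sigma, and raw_net are hypothetical.

```python
import math
import random

SIGMA_DATA = 0.5  # assumed standard deviation of the training data


def precond_coeffs(sigma, sigma_data=SIGMA_DATA):
    """Return (c_in, c_out, c_skip, c_noise) for noise level sigma."""
    total = sigma ** 2 + sigma_data ** 2
    c_in = 1.0 / math.sqrt(total)                  # unit-variance network input
    c_out = sigma * sigma_data / math.sqrt(total)  # scale of the network output
    c_skip = sigma_data ** 2 / total               # weight of the skip connection
    c_noise = math.log(sigma) / 4.0                # noise-level conditioning input
    return c_in, c_out, c_skip, c_noise


def denoiser(raw_net, x, sigma):
    """D(x; sigma) = c_skip * x + c_out * F(c_in * x; c_noise)."""
    c_in, c_out, c_skip, c_noise = precond_coeffs(sigma)
    return c_skip * x + c_out * raw_net(c_in * x, c_noise)


def loss_weight(sigma, sigma_data=SIGMA_DATA):
    """lambda(sigma), chosen so that lambda * c_out**2 == 1 at every sigma."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def sample_train_sigma(rng, p_mean=-1.2, p_std=1.2):
    """Draw a training noise level with ln(sigma) ~ Normal(p_mean, p_std**2)."""
    return math.exp(rng.gauss(p_mean, p_std))


# Usage with a placeholder network that always outputs zero:
rng = random.Random(0)
sigma = sample_train_sigma(rng)
print(denoiser(lambda x_in, c_noise: 0.0, 1.0, sigma), loss_weight(sigma))
```

With these choices the skip connection dominates at low noise (c_skip approaches 1 as σ approaches 0), so the network only has to predict a small correction there, while at high noise the output is mostly the network's own prediction.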

This review was created by AI and reviewed by human editors.