QUICK REVIEW

[Paper Review] Improved Denoising Diffusion Probabilistic Models

Alex Nichol, Prafulla Dhariwal | arXiv (Cornell University) | Feb 18, 2021

Generative Adversarial Networks and Image Synthesis | References: 27 | Citations: 412

One-Line Summary

This paper strengthens DDPMs so that they reach competitive log-likelihoods, speeds up sampling through learned variances, and shows that diffusion models cover more modes than GANs and scale with compute.

ABSTRACT

Denoising diffusion probabilistic models (DDPM) are a class of generative models which have recently been shown to produce excellent samples. We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We additionally use precision and recall to compare how well DDPMs and GANs cover the target distribution. Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable. We release our code at https://github.com/openai/improved-diffusion

Motivation and Objectives

Evaluate DDPMs on log-likelihood and distribution coverage, not only on sample quality.
Improve DDPM log-likelihood while maintaining sample quality.
Enable faster sampling by learning reverse-process variances.
Investigate training objectives and noise schedules to reduce gradient noise.
Show scalability of DDPMs with model size and compute.

Proposed Method

Introduce learned reverse-process variance via Sigma_theta as an interpolation between beta_t and tilde_beta_t (Equation 15).
Propose a hybrid training objective L_hybrid = L_simple + lambda L_vlb to balance sample quality and likelihood.
Replace the linear noise schedule with a cosine schedule to improve information retention during diffusion (Equation 17).
Apply importance sampling to estimate L_vlb with reduced gradient noise (Equation 18).
Compare training objectives (L_simple, L_hybrid, L_vlb) and schedules across ImageNet 64x64 and CIFAR-10 with ablations.
Demonstrate faster sampling: learned variances allow high-quality samples with fewer diffusion steps.
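
The cosine schedule and the learned-variance interpolation listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function names are hypothetical, the offset s = 0.008 and the clipping of beta_t at 0.999 follow the values reported in the paper, and the small floor inside the logarithm is an assumption added here because tilde_beta_t is zero at the first step. In the actual model, v is a per-dimension network output; here it is a plain number.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal level alpha_bar_t for t = 0..T under the cosine schedule."""
    t = np.arange(T + 1)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]  # normalised so that alpha_bar_0 = 1

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped to avoid singularities near t = T."""
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

def posterior_variance(betas):
    """tilde_beta_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t."""
    alpha_bar = np.cumprod(1.0 - betas)
    alpha_bar_prev = np.append(1.0, alpha_bar[:-1])
    return betas * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar)

def learned_variance(v, beta_t, tilde_beta_t, floor=1e-20):
    """Log-domain interpolation: exp(v * log(beta_t) + (1 - v) * log(tilde_beta_t)).

    `floor` is an assumption of this sketch; it keeps log() finite when
    tilde_beta_t is exactly zero at the first timestep.
    """
    log_upper = np.log(beta_t)
    log_lower = np.log(np.maximum(tilde_beta_t, floor))
    return np.exp(v * log_upper + (1.0 - v) * log_lower)

# Example: build a 1000-step schedule and interpolate the variance at one step.
alpha_bar = cosine_alpha_bar(1000)
betas = betas_from_alpha_bar(alpha_bar)
tilde_betas = posterior_variance(betas)
sigma_mid = learned_variance(0.5, betas[500], tilde_betas[500])
```

With v = 1 the variance equals beta_t and with v = 0 it equals tilde_beta_t, so the network only has to choose a point between the two bounds. In the hybrid objective, the paper weights the L_vlb term with a small lambda (0.001) so that it guides the variance without overwhelming L_simple.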

Experimental Results

Research Questions

RQ1: Can DDPMs achieve competitive log-likelihoods on high-diversity datasets like ImageNet 64x64?
RQ2: Does learning the reverse variances improve both likelihood and sample quality?
RQ3: Does a cosine noise schedule improve information retention and sample quality over a linear schedule?
RQ4: Can importance sampling reduce gradient noise in log-likelihood optimization?
RQ5: How do DDPMs scale with model size and training compute in terms of FID and NLL?

Key Findings

Iters	T	Schedule	Objective	NLL	FID
200K	1K	linear	L_simple	3.99	32.5
200K	4K	linear	L_simple	3.77	31.3
200K	4K	linear	L_hybrid	3.66	32.2
200K	4K	cosine	L_simple	3.68	27.0
200K	4K	cosine	L_hybrid	3.62	28.0
200K	4K	cosine	L_vlb	3.57	56.7
1.5M	4K	cosine	L_hybrid	3.57	19.2
1.5M	4K	cosine	L_vlb	3.53	40.1

Learned variances via Sigma_theta substantially improve log-likelihood while preserving sample quality.
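The hybrid objective with learned variances and the cosine schedule gives better NLL at comparable FID: in the table above, at 200K iterations and T = 4K, L_hybrid reaches NLL 3.62 versus 3.68 for L_simple, with FID 28.0 versus 27.0.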
Importance sampling reduces gradient noise in L_vlb, enabling better log-likelihood optimization.
Diffusion models achieve higher recall than GANs at similar FID, indicating broader mode coverage.
Sampling becomes faster: with learned variances, about 100 steps are enough to approach the best FID of a fully trained model.
Model size and compute show predictable performance scaling for FID and NLL.
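
The faster-sampling finding relies on running the reverse process over a shortened subsequence of the training timesteps. The sketch below shows one way to build such a subsequence and its matching betas, assuming K evenly spaced steps between 1 and T and betas recomputed from the ratio of consecutive alpha_bar values. The function names, the linear demo schedule, and T = 1000 are illustrative assumptions, not taken from the review.

```python
import numpy as np

def spaced_timesteps(T, K):
    """K evenly spaced timesteps between 1 and T (inclusive), rounded to integers."""
    return np.unique(np.round(np.linspace(1, T, K)).astype(int))

def respaced_betas(alpha_bar, steps):
    """Betas for the shortened chain: 1 - alpha_bar[S_i] / alpha_bar[S_{i-1}].

    `alpha_bar` is indexed 0..T with alpha_bar[0] = 1.
    """
    ab = alpha_bar[steps]
    ab_prev = np.append(1.0, ab[:-1])
    return 1.0 - ab / ab_prev

# Demo schedule (assumed for illustration): linear betas over T = 1000 steps.
T = 1000
full_betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - full_betas)])

steps = spaced_timesteps(T, 100)           # 100 sampling steps instead of 1000
short_betas = respaced_betas(alpha_bar, steps)
```

Because the ratios telescope, the cumulative product of (1 - short_betas) reproduces alpha_bar at every retained timestep, so the shortened chain sees the same noise levels the model was trained on.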

This review was generated by AI and checked by a human editor.