QUICK REVIEW

[Paper Review] Variational Diffusion Models

Diederik P. Kingma, Tim Salimans|arXiv (Cornell University)|Jul 1, 2021

Generative Adversarial Networks and Image Synthesis | References: 41 | Citations: 282

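One-Sentence Summary

This paper introduces Variational Diffusion Models (VDMs), a family of diffusion models whose noise schedule is learned jointly with the rest of the model. Combined with Fourier features, VDMs achieve state-of-the-art log-likelihoods on the CIFAR-10 and ImageNet density estimation benchmarks, and the paper provides theoretical insight into the variational lower bound (VLB) together with an equivalence between diffusion models.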

ABSTRACT

Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class. Using this insight, we prove an equivalence between several models proposed in the literature. In addition, we show that the continuous-time VLB is invariant to the noise schedule, except for the signal-to-noise ratio at its endpoints. This enables us to learn a noise schedule that minimizes the variance of the resulting VLB estimator, leading to faster optimization. Combining these advances with architectural improvements, we obtain state-of-the-art likelihoods on image density estimation benchmarks, outperforming autoregressive models that have dominated these benchmarks for many years, with often significantly faster optimization. In addition, we show how to use the model as part of a bits-back compression scheme, and demonstrate lossless compression rates close to the theoretical optimum. Code is available at https://github.com/google-research/vdm .

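Motivation and Objectives

Motivate diffusion models as likelihood-based image models and close the gap with autoregressive models on density estimation benchmarks.
Introduce a flexible family of diffusion-based models (VDMs) with a learnable noise schedule and Fourier features to improve likelihood.
Provide a theoretical analysis of the variational lower bound (VLB) for diffusion models and establish an equivalence between models in continuous time.
Report state-of-the-art log-likelihoods on CIFAR-10 and ImageNet and demonstrate the feasibility of lossless compression via bits-back coding.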

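Proposed Method

Define the forward Gaussian diffusion process as q(z_t|x) = N(alpha_t x, sigma_t^2 I).
Learn a monotonic noise schedule sigma_t^2 through a neural network gamma_eta(t), so that SNR(t) = alpha_t^2 / sigma_t^2 = exp(-gamma_eta(t)).
Use a reverse-time generative model in which p(z_s|z_t) equals q(z_s|z_t, x), with x replaced by the denoising prediction x_hat_theta(z_t; t).
Parameterize the denoising model with a noise-prediction network epsilon_hat_theta(z_t; t), so that x_hat_theta(z_t; t) = (z_t - sigma_t epsilon_hat_theta(z_t; t)) / alpha_t.
Feed sin/cos Fourier features of scaled z_t into the denoiser to capture fine detail and improve likelihood.
Optimize the variational lower bound (VLB) on log p(x); simplify the diffusion loss L_T(x) into a tractable, numerically stable form; extend it to the continuous-time loss L_infty(x); and show that this loss is invariant to the noise schedule except at its endpoints.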
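The forward process and the noise-prediction parameterization listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the variance-preserving case (alpha_t^2 = 1 - sigma_t^2, with sigma_t^2 = sigmoid(gamma(t))), replaces the learned network gamma_eta(t) with a fixed linear function, and uses hypothetical endpoint values and Fourier exponents chosen only for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumption: a fixed linear gamma(t) stands in for the learned monotonic
# network gamma_eta(t). The endpoint values are illustrative only.
GAMMA_MIN, GAMMA_MAX = -13.3, 5.0

def gamma(t):
    return GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * t

def alpha_sigma(t):
    """Variance-preserving choice: alpha_t^2 = sigmoid(-gamma), sigma_t^2 = sigmoid(gamma)."""
    g = gamma(t)
    return np.sqrt(sigmoid(-g)), np.sqrt(sigmoid(g))

def snr(t):
    """Signal-to-noise ratio SNR(t) = alpha_t^2 / sigma_t^2 = exp(-gamma(t))."""
    return np.exp(-gamma(t))

def diffuse(x, t, rng):
    """Sample z_t ~ q(z_t | x) = N(alpha_t x, sigma_t^2 I) and return the noise used."""
    alpha, sigma = alpha_sigma(t)
    eps = rng.standard_normal(x.shape)
    return alpha * x + sigma * eps, eps

def x_hat_from_eps(z_t, eps_hat, t):
    """Denoised estimate x_hat = (z_t - sigma_t * eps_hat) / alpha_t."""
    alpha, sigma = alpha_sigma(t)
    return (z_t - sigma * eps_hat) / alpha

def fourier_features(z, exponents=(6, 7)):
    """Append sin/cos of scaled z along the last axis (exponents are illustrative)."""
    feats = [z]
    for n in exponents:
        feats.append(np.sin(2.0**n * np.pi * z))
        feats.append(np.cos(2.0**n * np.pi * z))
    return np.concatenate(feats, axis=-1)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(4, 8))   # toy "data" batch
t = 0.5
z_t, eps = diffuse(x, t, rng)
x_rec = x_hat_from_eps(z_t, eps, t)       # with the true noise, x is recovered
features = fourier_features(z_t)          # shape (4, 8 * 5)
```

In a real model, `eps` would be replaced by the output of the noise-prediction network, and `features` would be part of that network's input.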

Experimental Results

Research Questions

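RQ1: Can diffusion-based generative models achieve state-of-the-art likelihoods on standard image density estimation benchmarks?
RQ2: Does optimizing the diffusion process (noise schedule) jointly with the model parameters improve performance over a fixed schedule?
RQ3: How does the continuous-time formulation of diffusion affect the VLB and its invariance to the forward process?
RQ4: Which architectural innovations (e.g., Fourier features) and training objectives improve likelihood while keeping optimization tractable?
RQ5: Can diffusion models be used effectively for lossless compression via bits-back coding?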

Key Findings

Model (type)	CIFAR-10, no data augmentation (bits/dim)	CIFAR-10, data augmentation (bits/dim)	ImageNet 32x32 (bits/dim)	ImageNet 64x64 (bits/dim)
VDM (variational bound); diffusion	2.65	2.49	3.72	3.40
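For readers less familiar with the bits/dim unit, the following sketch shows the standard conversion between a per-image negative log-likelihood in nats and bits per dimension, and what the 2.65 bits/dim CIFAR-10 figure implies per image. The function names are hypothetical helpers introduced for this example.

```python
import math

def nats_to_bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood (or VLB) in nats to bits/dim."""
    return nll_nats / (num_dims * math.log(2.0))

def bits_per_dim_to_total_bits(bpd, num_dims):
    """Total expected code length in bits for one image."""
    return bpd * num_dims

CIFAR10_DIMS = 32 * 32 * 3  # 3072 subpixels per image

# 2.65 * 3072 = 8140.8 bits, about 1018 bytes, versus 3072 bytes for raw 8-bit data.
total_bits = bits_per_dim_to_total_bits(2.65, CIFAR10_DIMS)

# Round trip: the same code length expressed in nats maps back to 2.65 bits/dim.
nll_nats = total_bits * math.log(2.0)
bpd = nats_to_bits_per_dim(nll_nats, CIFAR10_DIMS)
```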

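VDMs achieve state-of-the-art log-likelihoods on the CIFAR-10 and ImageNet density estimation benchmarks, outperforming autoregressive models.
A simplified expression for the discrete-time diffusion loss and the continuous-time loss L_infty(x) are derived, clarifying the behavior of the VLB.
In continuous time, the VLB is invariant to the shape of the noise schedule and depends only on the SNR at its endpoints, so the schedule can be optimized to minimize the variance of the estimator.
Adding Fourier features to the denoiser substantially improves likelihood, especially when the SNR is learned.
Learning the SNR endpoints and using a continuous-time, variance-minimizing schedule speeds up training and reduces estimator variance.
Although the paper's focus is likelihood, the experiments indicate that the model can also reach competitive perceptual quality (FID) when trained with a weighted diffusion loss.
The model supports lossless compression via bits-back coding and achieves a competitive net codelength on CIFAR-10.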
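The schedule-invariance finding can be checked numerically on a toy problem. The sketch below assumes the continuous-time loss has the form L_infty(x) = -1/2 E[ integral over t in [0, 1] of SNR'(t) ||x - x_hat(z_t; t)||^2 dt ], and it replaces the trained denoiser with the optimal denoiser for scalar data x ~ N(0, 1), whose expected squared error is 1 / (1 + SNR). Both schedules and the endpoint values are illustrative assumptions, not settings from the paper.

```python
import numpy as np

GAMMA_0, GAMMA_1 = -10.0, 5.0   # illustrative endpoints gamma(0), gamma(1)

def gamma_linear(t):
    return GAMMA_0 + (GAMMA_1 - GAMMA_0) * t

def gamma_quadratic(t):
    # Different shape, same endpoints as gamma_linear.
    return GAMMA_0 + (GAMMA_1 - GAMMA_0) * t**2

def continuous_loss(gamma_fn, n=200_001):
    """Approximate -1/2 * integral of SNR'(t) * mse(SNR(t)) dt on a uniform t grid."""
    t = np.linspace(0.0, 1.0, n)
    snr = np.exp(-gamma_fn(t))
    mse = 1.0 / (1.0 + snr)          # toy assumption: MMSE for x ~ N(0, 1)
    d_snr = np.diff(snr)             # SNR'(t) dt on each grid interval
    mse_mid = 0.5 * (mse[1:] + mse[:-1])
    return -0.5 * np.sum(mse_mid * d_snr)

loss_lin = continuous_loss(gamma_linear)
loss_quad = continuous_loss(gamma_quadratic)

# Closed form for this toy case: 1/2 * log((1 + SNR_max) / (1 + SNR_min)).
closed_form = 0.5 * (np.log1p(np.exp(-GAMMA_0)) - np.log1p(np.exp(-GAMMA_1)))
```

Under these assumptions, `loss_lin` and `loss_quad` agree with each other and with `closed_form` up to discretization error, because the integral depends on the schedule only through SNR(0) and SNR(1). What the schedule shape does change is the variance of a Monte Carlo estimate over t, which is the quantity the learned schedule is meant to reduce.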

This review was generated by AI and checked by a human editor.