QUICK REVIEW

[Paper Review] Understanding Diffusion Models: A Unified Perspective

Calvin Luo|arXiv (Cornell University)|Aug 25, 2022

Generative Adversarial Networks and Image Synthesis|Cited by 112

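One-Line Summary

This paper presents diffusion models from both likelihood-based and score-based perspectives, derives the ELBO for variational diffusion models, and offers several equivalent interpretations that deepen understanding and guide training and sampling.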

ABSTRACT

Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.

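Motivation and Objectives

Clarify how diffusion models fit into the likelihood-based and score-based generative modeling frameworks.
Derive and explain the Evidence Lower Bound (ELBO) for Variational Diffusion Models (VDMs).
Present several interpretable views of diffusion model training: reconstruction, prior matching, and consistency.
Relate diffusion models to Variational Autoencoders (VAEs) and Hierarchical Variational Autoencoders (HVAEs) in order to unify these perspectives.
Discuss practical implications for training, sampling, and the variance of ELBO estimates.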

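Proposed Method

Presents the ELBO for a standard latent variable model, then extends it to Hierarchical Variational Autoencoders (HVAEs) and Markovian HVAEs.
Introduces Variational Diffusion Models (VDMs) as a Markovian HVAE with a fixed Gaussian encoder and a time-varying noise schedule.
Derives a lower-variance form of the VDM ELBO by rewriting the encoder transitions so that each term is an expectation over a single random variable.
Decomposes the ELBO into interpretable terms: reconstruction, prior matching, and denoising consistency.
Describes sampling from a VDM: start from standard Gaussian noise and apply the learned denoising transitions step by step.
Connects the variational and score-based views through three equivalent prediction targets (the original input, the source noise, or the score function), and covers conditional generation via classifier guidance and classifier-free guidance.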
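
For reference, the decomposition listed above can be written out as follows. This is a sketch in the usual diffusion notation, which this review does not spell out: x_0 is the data, x_1, ..., x_T are the latents, q is the fixed Gaussian encoder, and p_θ is the learned denoiser. The last term is the per-step denoising term, compared against the tractable posterior q(x_{t-1} | x_t, x_0).

```latex
\log p(x_0) \;\geq\;
\underbrace{\mathbb{E}_{q(x_1 \mid x_0)}\!\left[\log p_\theta(x_0 \mid x_1)\right]}_{\text{reconstruction}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\left(q(x_T \mid x_0)\,\|\,p(x_T)\right)}_{\text{prior matching}}
\;-\;
\sum_{t=2}^{T}
\underbrace{\mathbb{E}_{q(x_t \mid x_0)}\!\left[
D_{\mathrm{KL}}\!\left(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\right)
\right]}_{\text{denoising}}
```

Each expectation is taken over a single latent at a time, which is why this form lends itself to low-variance Monte Carlo estimation.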

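Experimental Results

Research Questions

RQ1: How can diffusion models be understood within both the likelihood-based and score-based generative modeling frameworks?
RQ2: What is the correct ELBO formulation for diffusion-based generative models, and how can it be computed efficiently?
RQ3: What are the interpretable components of the diffusion ELBO, and how do they relate to training objectives such as reconstruction and prior matching?
RQ4: How do Variational Autoencoders and Hierarchical Variational Autoencoders relate to diffusion models under a unified view?
RQ5: What practical consequences do the proposed ELBO decomposition and guidance mechanisms have for training and sampling diffusion models?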

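Key Findings

VDMs offer a unified view by treating a diffusion model as a Markovian HVAE with Gaussian encoders and a final latent that follows a standard Gaussian distribution.
The VDM ELBO decomposes into a reconstruction term, a prior-matching term, and a denoising-consistency term, which enables low-variance Monte Carlo estimation.
A reformulation yields an ELBO in which each term is an expectation over at most one random variable at a time, which reduces variance in practice; the reparameterization trick then gives the closed-form Gaussian q(x_t | x_0) that these terms rely on.
The derivation shows that training reduces to one of three equivalent prediction targets (the original input, the source noise, or the score function); conditional generation is handled separately through classifier guidance and classifier-free guidance.
Training is driven by matching the backward denoising transitions to the forward Gaussian corruptions; as T grows, the distribution of the final latent approaches the standard Gaussian prior.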
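
Two of the findings above can be checked numerically. The following is a minimal NumPy sketch, not the paper's code. It assumes a linear beta schedule with T = 1000 (a common choice that this review does not specify) and toy one-dimensional data; all function names are hypothetical. It noises data with the closed-form forward process, converts between the noise, score, and original-input targets, and inspects the final latent.

```python
import numpy as np

def make_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal coefficients alpha_bar_t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_sample(x0, alpha_bar_t, eps):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def x0_from_eps(x_t, alpha_bar_t, eps):
    """Recover the original input from the source noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def score_from_eps(alpha_bar_t, eps):
    """Score of q(x_t | x_0) evaluated at x_t, expressed through the source noise."""
    return -eps / np.sqrt(1.0 - alpha_bar_t)

def x0_from_score(x_t, alpha_bar_t, score):
    """Recover the original input from the score (Tweedie-style relation)."""
    return (x_t + (1.0 - alpha_bar_t) * score) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(T=1000)

# Toy data: 4096 scalars in [-1, 1]; real inputs would be images.
x0 = rng.uniform(-1.0, 1.0, size=(4096,))
eps = rng.standard_normal(x0.shape)

# Intermediate step: the three targets carry the same information.
t = 500
x_t = forward_sample(x0, alpha_bar[t], eps)
x0_a = x0_from_eps(x_t, alpha_bar[t], eps)
x0_b = x0_from_score(x_t, alpha_bar[t], score_from_eps(alpha_bar[t], eps))

# Final step: the latent should be close to a standard Gaussian.
x_T = forward_sample(x0, alpha_bar[-1], eps)

print("max |x0_a - x0|:", np.abs(x0_a - x0).max())
print("max |x0_b - x0|:", np.abs(x0_b - x0).max())
print("alpha_bar_T:", alpha_bar[-1])
print("x_T mean, variance:", x_T.mean(), x_T.var())
```

Here the true noise stands in for a trained network's prediction, so the conversions are exact up to floating-point error. With a learned predictor the same identities convert one kind of prediction into another.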


This review was generated by AI and checked by a human editor.