QUICK REVIEW

[Paper Review] Understanding Diffusion Models: A Unified Perspective

Calvin Luo|arXiv (Cornell University)|2022. 08. 25.

Generative Adversarial Networks and Image Synthesis | Citations: 112

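One-Line Summary

This paper presents diffusion models from both likelihood-based and score-based perspectives, derives the ELBO for Variational Diffusion Models (VDMs), and offers several equivalent interpretations that deepen understanding and guide training and sampling.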

ABSTRACT

Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
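The three equivalent prediction targets named in the abstract (source input, source noise, score) can be checked numerically with a few lines of algebra. The sketch below is illustrative only: it assumes the usual closed-form forward process x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps, the helper names (`x0_from_eps`, `score_from_eps`, `x0_from_score`) and the value of `abar` are hypothetical, and the score used here is that of the Gaussian q(x_t | x0) for one known x0, whereas a trained network would approximate the score of the noised data distribution.

```python
import numpy as np

def x0_from_eps(x_t, eps, abar):
    # Invert x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps for x0.
    return (x_t - np.sqrt(1.0 - abar) * eps) / np.sqrt(abar)

def score_from_eps(eps, abar):
    # Score of N(sqrt(abar) * x0, (1 - abar) I) evaluated at x_t,
    # written in terms of the noise that produced x_t.
    return -eps / np.sqrt(1.0 - abar)

def x0_from_score(x_t, score, abar):
    # Tweedie-style rearrangement: sqrt(abar) * x0 = x_t + (1 - abar) * score.
    return (x_t + (1.0 - abar) * score) / np.sqrt(abar)

rng = np.random.default_rng(0)
abar = 0.3                      # assumed cumulative signal level at some step t
x0 = rng.standard_normal(4)     # toy "data"
eps = rng.standard_normal(4)    # source noise
x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

score = score_from_eps(eps, abar)
x0_via_eps = x0_from_eps(x_t, eps, abar)
x0_via_score = x0_from_score(x_t, score, abar)
```

Both reconstructions recover the same `x0`, which is why a network trained on any one of the three targets can be converted into the other two.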

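Motivation and Objectives

Clarify how diffusion models fit into the likelihood-based and score-based generative frameworks.
Derive and explain the Evidence Lower Bound (ELBO) for Variational Diffusion Models (VDMs).
Present several interpretable views of diffusion model training, such as reconstruction, prior matching, and consistency.
Unify these views by connecting diffusion models to Variational Autoencoders and Hierarchical Variational Autoencoders.
Discuss practical implications for ELBO estimation, along with training, sampling, and variance considerations.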

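Proposed Method

Present the ELBO derivation for a standard latent variable model and extend it to Hierarchical Variational Autoencoders (HVAEs) and Markovian HVAEs.
Introduce Variational Diffusion Models (VDMs) as a Markovian HVAE with a fixed Gaussian encoder and a time-varying noise schedule.
Derive a low-variance form of the VDM ELBO by rewriting the encoder transitions so that each term depends on a single random variable.
Decompose the ELBO into interpretable reconstruction, prior-matching, and denoising-consistency terms.
Describe how to sample from a VDM by starting from standard Gaussian noise and applying the denoising transitions.
Probabilistic and guidance frameworks: connect the likelihood-based and score-based views of the diffusion process, and cover guidance (classifier guidance and classifier-free guidance) for conditional generation.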
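The forward noising and sampling steps listed above can be sketched on a 1-D toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear beta schedule, the function names, and the toy data distribution N(2, 0.5^2) are all hypothetical, and `gaussian_x0_predictor` is an analytic stand-in for a trained denoising network (for Gaussian data the optimal prediction E[x0 | x_t] is known in closed form).

```python
import numpy as np

def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    # Assumed linear noise schedule; alpha_bars[t] is the cumulative product.
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    return alphas, np.cumprod(alphas)

def q_sample(x0, t, alpha_bars, rng):
    # Closed-form forward noising: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps.
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

def gaussian_x0_predictor(mu, s):
    # Stand-in for a trained network: exact E[x0 | x_t] when x0 ~ N(mu, s^2).
    def predict(x_t, ab):
        gain = np.sqrt(ab) * s**2 / (ab * s**2 + 1.0 - ab)
        return mu + gain * (x_t - np.sqrt(ab) * mu)
    return predict

def ancestral_sample(predict_x0, alphas, alpha_bars, n, rng):
    # Start from standard Gaussian noise and apply denoising transitions.
    x = rng.standard_normal(n)
    for i in range(len(alphas) - 1, -1, -1):
        a, ab = alphas[i], alpha_bars[i]
        ab_prev = alpha_bars[i - 1] if i > 0 else 1.0
        x0_hat = predict_x0(x, ab)
        # Mean and variance of the Gaussian posterior q(x_{t-1} | x_t, x0),
        # with the unknown x0 replaced by the prediction x0_hat.
        mean = (np.sqrt(a) * (1.0 - ab_prev) * x
                + np.sqrt(ab_prev) * (1.0 - a) * x0_hat) / (1.0 - ab)
        var = (1.0 - a) * (1.0 - ab_prev) / (1.0 - ab)
        x = mean + np.sqrt(var) * rng.standard_normal(n)
    return x

rng = np.random.default_rng(0)
alphas, alpha_bars = make_schedule()
x0 = 2.0 + 0.5 * rng.standard_normal(5000)                # toy data
x_T, _ = q_sample(x0, len(alphas) - 1, alpha_bars, rng)   # close to N(0, 1)
samples = ancestral_sample(gaussian_x0_predictor(2.0, 0.5),
                           alphas, alpha_bars, 5000, rng)
```

In a real model the `predict_x0` callable would be a neural network conditioned on the timestep. The analytic predictor is used here only so the loop can be run and inspected without training.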

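Experimental Results

Research Questions

RQ1: How can diffusion models be understood within both the likelihood-based and the score-based generative frameworks?
RQ2: What is the correct ELBO formulation for diffusion-based generative models, and how can it be computed efficiently?
RQ3: What are the interpretable components of the diffusion ELBO, and how do they connect to training objectives such as reconstruction and prior matching?
RQ4: How do Variational Autoencoders and Hierarchical Variational Autoencoders relate to diffusion models under a unified perspective?
RQ5: What practical implications for training and sampling arise when using the proposed ELBO decomposition and guidance mechanisms?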

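Key Findings

VDMs offer a unified view of diffusion models as a Markovian HVAE with a Gaussian encoder whose final latent is a standard Gaussian.
The ELBO of a VDM decomposes into a reconstruction term, a prior-matching term, and a denoising-consistency term, which enables low-variance Monte Carlo estimation.
Reformulating the ELBO with the reparameterization trick lets each term be written as an expectation over a single random variable, which reduces variance in practice.
The derivations show that optimizing a VDM reduces to three equivalent prediction objectives (the original input, the source noise, or the score function), which links the likelihood-based and score-based views; guidance (classifier guidance and classifier-free guidance) extends the model to conditional generation.
Training aligns the reverse denoising transitions with the forward Gaussian noising process, and the final latent distribution matches the standard Gaussian prior when the number of timesteps is sufficiently large.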
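For reference, the three-term decomposition described above can be written out as follows. This is a sketch in common DDPM-style notation (x_0 for the data, x_{1:T} for the latents, q for the fixed Gaussian encoder, p_θ for the learned denoiser); the symbols are assumptions, since this summary does not define them. In this form each expectation is over a single latent variable, which is the source of the lower-variance estimate. The paper's derivation refers to the third term as a denoising matching term.

```latex
\log p(x_0) \;\ge\;
\underbrace{\mathbb{E}_{q(x_1 \mid x_0)}\!\left[\log p_\theta(x_0 \mid x_1)\right]}_{\text{reconstruction}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\left(q(x_T \mid x_0)\,\|\,p(x_T)\right)}_{\text{prior matching}}
\;-\;
\sum_{t=2}^{T}
\underbrace{\mathbb{E}_{q(x_t \mid x_0)}\!\left[
D_{\mathrm{KL}}\!\left(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\right)
\right]}_{\text{denoising}}
```

The prior-matching term has no trainable parameters when the encoder is fixed, so optimization in practice concentrates on the summed denoising terms.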

This review was generated by AI and checked by a human editor.