QUICK REVIEW

[Paper Review] Variational Diffusion Models

Diederik P. Kingma, Tim Salimans|arXiv (Cornell University)|2021. 07. 01.

Generative Adversarial Networks and Image Synthesis | References: 41 | Citations: 282

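One-Line Summary

This paper presents Variational Diffusion Models (VDMs), which learn the diffusion (noise) schedule jointly with the model and use Fourier features to achieve state-of-the-art log-likelihoods on the CIFAR-10 and ImageNet density estimation benchmarks, and it offers theoretical insight into the variational lower bound (VLB) and into the equivalence between diffusion processes.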

ABSTRACT

Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class. Using this insight, we prove an equivalence between several models proposed in the literature. In addition, we show that the continuous-time VLB is invariant to the noise schedule, except for the signal-to-noise ratio at its endpoints. This enables us to learn a noise schedule that minimizes the variance of the resulting VLB estimator, leading to faster optimization. Combining these advances with architectural improvements, we obtain state-of-the-art likelihoods on image density estimation benchmarks, outperforming autoregressive models that have dominated these benchmarks for many years, with often significantly faster optimization. In addition, we show how to use the model as part of a bits-back compression scheme, and demonstrate lossless compression rates close to the theoretical optimum. Code is available at https://github.com/google-research/vdm .

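Research Motivation and Objectives

Motivate likelihood-based image generation with diffusion models and close the gap to autoregressive models on density estimation benchmarks.
Introduce a flexible family of diffusion-based models (VDMs) with a learnable diffusion schedule and Fourier features to improve likelihood.
Provide a theoretical analysis of the variational lower bound (VLB) of diffusion models and establish equivalences between models in continuous time.
Demonstrate state-of-the-art log-likelihood results on CIFAR-10 and ImageNet, and show that lossless compression via bits-back coding is feasible.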

Proposed Method

Define a forward Gaussian diffusion process with z_t conditioned on x as q(z_t|x)=N(alpha_t x, sigma_t^2 I).
Learn the noise schedule through a monotonic neural network gamma_eta(t) that determines sigma_t^2, such that SNR(t) = alpha_t^2 / sigma_t^2 = exp(-gamma_eta(t)).
Use a reverse-time generative model with p(z_s|z_t) equal to q(z_s|z_t, x) but with x replaced by a denoised prediction x_hat_theta(z_t; t).
Parameterize the denoising model through a noise-prediction network epsilon_hat_theta(z_t; t) with x_hat_theta(z_t; t) = (z_t - sigma_t epsilon_hat_theta(z_t; t))/alpha_t.
Incorporate Fourier features (sin/cos of scaled z_t) into the denoiser to capture fine-scale details and improve likelihood.
Optimize the variational lower bound (VLB) on log p(x), with a diffusion loss L_T(x) that simplifies to a tractable, numerically stable form; extend it to the continuous-time loss L_infty(x) and show that this loss is invariant to the noise schedule except for the SNR at its endpoints.
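The method steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the fixed linear `gamma` stands in for the learned network gamma_eta(t), the endpoint constants are illustrative values, the variance-preserving choice alpha_t^2 = sigmoid(-gamma), sigma_t^2 = sigmoid(gamma) is an assumption that makes SNR(t) = exp(-gamma(t)) hold, and the Fourier exponents are hypothetical. All function names are made up for this example.

```python
import math
import random

# Illustrative endpoint values for gamma (assumed, not taken from the review).
GAMMA_MIN, GAMMA_MAX = -13.3, 5.0

def gamma(t):
    """Monotonically increasing stand-in for the learned schedule gamma_eta(t)."""
    return GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * t

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def alpha_sigma(t):
    """Assumed variance-preserving parameterization: alpha^2 + sigma^2 = 1."""
    g = gamma(t)
    return math.sqrt(sigmoid(-g)), math.sqrt(sigmoid(g))

def snr(t):
    """Signal-to-noise ratio alpha_t^2 / sigma_t^2, which equals exp(-gamma(t))."""
    a, s = alpha_sigma(t)
    return (a * a) / (s * s)

def diffuse(x, t, rng):
    """Sample z_t ~ q(z_t | x) = N(alpha_t x, sigma_t^2 I); also return the noise."""
    a, s = alpha_sigma(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    z = [a * xi + s * ei for xi, ei in zip(x, eps)]
    return z, eps

def x_from_eps(z, eps_hat, t):
    """Map a noise prediction to a data prediction: (z_t - sigma_t * eps_hat) / alpha_t."""
    a, s = alpha_sigma(t)
    return [(zi - s * ei) / a for zi, ei in zip(z, eps_hat)]

def fourier_features(z, exponents=(0, 1)):
    """sin/cos of scaled z_t, appended to the denoiser input (exponents are hypothetical)."""
    feats = []
    for n in exponents:
        scale = (2.0 ** n) * math.pi
        feats += [math.sin(scale * zi) for zi in z]
        feats += [math.cos(scale * zi) for zi in z]
    return feats

if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.5, -0.25, 1.0]
    z, eps = diffuse(x, 0.7, rng)
    # With the true noise plugged in as the "prediction", x is recovered up to rounding.
    print(x_from_eps(z, eps, 0.7))
    print(len(z + fourier_features(z)))
```

In a real model, `eps_hat` would come from a trained network that takes `z` (plus its Fourier features) and `t` as input; here the true noise is used only to check that the reparameterization inverts the forward process.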

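Experimental Results

Research Questions

RQ1: Can diffusion-based generative models reach state-of-the-art likelihoods on standard image density estimation benchmarks?
RQ2: Does jointly optimizing the diffusion process (noise schedule) with the model parameters improve performance over a fixed schedule?
RQ3: How does the continuous-time diffusion formulation affect the VLB and its invariance to the forward process?
RQ4: Which architectural innovations (e.g., Fourier features) and training objectives improve likelihood while keeping optimization computationally tractable?
RQ5: Can diffusion models be used effectively for lossless compression via bits-back coding?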

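Key Results

VDMs achieve state-of-the-art log-likelihoods on the CIFAR-10 and ImageNet density estimation benchmarks, outperforming autoregressive models.
A simple expression for the discrete-time diffusion loss and the continuous-time loss L_infty(x) are derived, clarifying how the VLB behaves.
In continuous time, the VLB does not depend on the shape of the diffusion schedule, only on the SNR at its endpoints, which makes it possible to optimize the schedule to minimize variance.
Adding Fourier features to the denoiser substantially improves likelihood, with a larger effect when the SNR is learned.
Learning the SNR endpoints and using a continuous-time, variance-aware schedule speeds up training and reduces the variance of the estimator.
In the experiments, using a weighted diffusion loss alongside likelihood optimization also gives competitive perceptual quality (FID), although the focus of this work is likelihood.
The model supports lossless compression via bits-back coding, achieving a competitive net codelength on CIFAR-10.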
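The schedule-invariance result can be checked numerically on a toy problem. The sketch below takes the continuous-time diffusion loss to be -1/2 E ∫ SNR'(t) ||x - x_hat(z_t; t)||^2 dt over t in [0, 1], and replaces the expectation with a closed form that is an assumption of this example, not an experiment from the paper: for scalar data x ~ N(0, 1) and an ideal denoiser, the expected squared error at a given SNR value v is 1 / (1 + v). Two hypothetical schedules with the same endpoints but different shapes should then give the same loss, equal to 1/2 ln((1 + SNR_max) / (1 + SNR_min)).

```python
import math

# Shared endpoints for both schedules (illustrative values, assumed).
G0, G1 = -6.0, 6.0

def gamma_linear(t):
    return G0 + (G1 - G0) * t

def gamma_quadratic(t):
    # Different shape, same endpoints, still non-decreasing on [0, 1].
    return G0 + (G1 - G0) * t * t

def snr(gamma, t):
    return math.exp(-gamma(t))

def toy_mse(v):
    """Expected squared denoising error at SNR v for x ~ N(0, 1) (toy assumption)."""
    return 1.0 / (1.0 + v)

def continuous_loss(gamma, steps=20000):
    """Approximate -1/2 * integral of SNR'(t) * mse(SNR(t)) dt with a midpoint rule."""
    total = 0.0
    for i in range(steps):
        t0, t1 = i / steps, (i + 1) / steps
        d_snr = snr(gamma, t1) - snr(gamma, t0)  # negative: SNR decreases in t
        v_mid = snr(gamma, 0.5 * (t0 + t1))
        total += -0.5 * d_snr * toy_mse(v_mid)
    return total

def closed_form():
    """1/2 * ln((1 + SNR_max) / (1 + SNR_min)), from integrating 1/(1+v) over v."""
    snr_max, snr_min = math.exp(-G0), math.exp(-G1)
    return 0.5 * math.log((1.0 + snr_max) / (1.0 + snr_min))

if __name__ == "__main__":
    print(continuous_loss(gamma_linear))
    print(continuous_loss(gamma_quadratic))
    print(closed_form())
```

The loss value is the same for both shapes, but a Monte Carlo estimate with random t would have a different variance under each schedule, which is the quantity the learned schedule is meant to reduce.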

This review was generated by AI and reviewed by a human editor.