QUICK REVIEW

[Paper Review] Learning to Efficiently Sample from Diffusion Probabilistic Models

Daniel Watson, Jonathan Ho|arXiv (Cornell University)|2021. 06. 07.

Gaussian Processes and Bayesian Inference | References: 26 | Citations: 49

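One-Line Summary

Uses dynamic programming to find the optimal inference schedule for a pre-trained DDPM, which allows high-quality sampling with as few as 32 refinement steps and no retraining. The timesteps are chosen to maximize the ELBO under a fixed computation budget.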

ABSTRACT

Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a powerful family of generative models that can yield high-fidelity samples and competitive log-likelihoods across a range of domains, including image and speech synthesis. Key advantages of DDPMs include ease of training, in contrast to generative adversarial networks, and speed of generation, in contrast to autoregressive models. However, DDPMs typically require hundreds-to-thousands of steps to generate a high fidelity sample, making them prohibitively expensive for high dimensional problems. Fortunately, DDPMs allow trading generation speed for sample quality through adjusting the number of refinement steps as a post process. Prior work has been successful in improving generation speed through handcrafting the time schedule by trial and error. We instead view the selection of the inference time schedules as an optimization problem, and introduce an exact dynamic programming algorithm that finds the optimal discrete time schedules for any pre-trained DDPM. Our method exploits the fact that ELBO can be decomposed into separate KL terms, and given any computation budget, discovers the time schedule that maximizes the training ELBO exactly. Our method is efficient, has no hyper-parameters of its own, and can be applied to any pre-trained DDPM with no retraining. We discover inference time schedules requiring as few as 32 refinement steps, while sacrificing less than 0.1 bits per dimension compared to the default 4,000 steps used on ImageNet 64x64 [Ho et al., 2020; Nichol and Dhariwal, 2021].

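Motivation and Objectives

Aims to reduce the computational cost of sampling from DDPMs without retraining.
Introduces an exact dynamic programming method that selects the optimal inference timesteps under a given budget of refinement steps.
Exploits the decomposition of the ELBO into separate terms, which makes memoization possible and allows exact optimization over inference paths.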
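The decomposition referred to above can be written out as follows. This is a sketch in the review's own L(t, s) notation, not a quotation from the paper: the argument order of L, the symbol q for the forward process, and the exact form of the prior term are assumptions made for illustration.

```latex
% For a path 0 = t_0 < t_1 < \dots < t_K = 1, the negative ELBO splits into
% a schedule-independent prior term plus one term per jump on the path:
-\mathrm{ELBO}(t_0, \dots, t_K)
  = \mathbb{E}_q\!\left[ D_{\mathrm{KL}}\!\big( q(x_1 \mid x_0) \,\|\, p(x_1) \big) \right]
  + \sum_{i=1}^{K} L(t_{i-1}, t_i)

% Choosing the schedule for a budget of K steps is then a constrained minimization:
(t_1^{*}, \dots, t_{K-1}^{*})
  = \operatorname*{arg\,min}_{0 = t_0 < t_1 < \dots < t_K = 1} \; \sum_{i=1}^{K} L(t_{i-1}, t_i)
```

Because each term depends only on one pair of neighbouring timesteps, the sum behaves like a path length, which is what makes an exact shortest-path search possible.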

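Proposed Method

Formulates inference schedule selection as a shortest-path problem over timesteps, using the ELBO decomposition.
Uses a fixed pre-trained DDPM to compute a table of KL-based ELBO terms L(t, s) over pairs of candidate timesteps.
Applies a shortest-path algorithm constrained to exactly K refinement steps (0 = t_0 < ... < t_K = 1) to find the exact optimal path.
Uses memoization so that only O(T) forward passes are needed to fill in the L(t, s) entries, where T is the number of timesteps on the grid.
Supports both time-discrete and time-continuous DDPMs by building valid ELBO paths from a sequence of increasing timesteps that starts at 0 and ends at 1.
Optionally estimates the ELBO terms by Monte Carlo sampling to reduce computation.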
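The shortest-path step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `dp_schedule`, the layout of the `cost` array, and the toy quadratic cost are all assumptions. In practice the table would hold the ELBO terms computed from a pre-trained model.

```python
import numpy as np

def dp_schedule(cost, K):
    """Find the cheapest path 0 = t_0 < ... < t_K = T using exactly K jumps.

    cost[s, t] (only s < t is read) is a hypothetical stand-in for the
    ELBO term of a single jump between grid indices s and t.
    """
    T = cost.shape[0] - 1
    if not 1 <= K <= T:
        raise ValueError("K must satisfy 1 <= K <= T")
    # best[k, t]: minimal total cost of reaching index t from 0 in exactly k jumps.
    best = np.full((K + 1, T + 1), np.inf)
    prev = np.zeros((K + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for t in range(k, T + 1):
            cand = best[k - 1, :t] + cost[:t, t]  # try every predecessor s < t
            s = int(np.argmin(cand))
            best[k, t] = cand[s]
            prev[k, t] = s
    # Backtrack from (K, T) to recover the chosen timesteps.
    path = [T]
    for k in range(K, 0, -1):
        path.append(int(prev[k, path[-1]]))
    path.reverse()
    return path, float(best[K, T])

# Toy cost (an assumption for illustration): jumping from s to t costs (t - s)^2,
# so the cheapest schedule spreads the steps evenly.
T = 8
idx = np.arange(T + 1)
toy_cost = (idx[None, :] - idx[:, None]).astype(float) ** 2
path, total = dp_schedule(toy_cost, K=4)
print(path, total)  # [0, 2, 4, 6, 8] 16.0
```

The table is filled once and can then be reused for any budget K, since the dynamic program only reads from it and never queries the model again.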

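Experimental Results

Research Questions

RQ1: Can inference schedule selection for DDPMs be optimized under a fixed computation budget without retraining?
RQ2: Does an exact dynamic programming formulation over DDPM timesteps yield a higher ELBO (lower negative ELBO) than hand-crafted sampling schedules when only a few steps are used?
RQ3: How many refinement steps are needed to stay close to the original model's log-likelihood while substantially reducing computation?
RQ4: Do DP-derived schedules carry over to different pre-trained DDPM variants (time-discrete and time-continuous) without retraining?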

Key Results

Model \ # refinement steps (bits/dim)	8	16	32	64	128	256	All
DistAug Transformer (Jun et al., 2020)	–	–	–	–	–	–	2.53
DDPM++ (deep, sub-VP) (Song et al., 2021)	–	–	–	–	–	–	2.99
L_simple (Even stride)	6.95	6.15	5.46	4.91	4.47	4.14	3.73
L_simple (Quadratic stride)	5.39	4.86	4.52	3.84	3.74	3.73	–
L_simple (DP stride)	4.59	3.99	3.79	3.74	3.73	3.72	–
L_vlb (Even stride)	6.20	5.48	4.89	4.42	4.03	3.73	2.94
L_vlb (Quadratic stride)	4.89	4.09	3.58	3.23	3.09	3.05	–
L_vlb (DP stride)	4.20	3.41	3.17	3.08	3.05	3.04	–
L_hybrid (Even stride)	6.14	5.39	4.77	4.29	3.92	3.66	3.17
L_hybrid (Quadratic stride)	4.91	4.15	3.71	3.42	3.30	3.26	–
L_hybrid (DP stride)	4.33	3.62	3.39	3.30	3.27	3.26	–
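One way to read the table is to ask at which budget the DP stride first comes within 0.1 bits/dim of the full-step value. The snippet below is a small sketch that does this with the numbers copied from the table; the helper name `smallest_budget_within` is made up for illustration, and the full-step baseline is assumed to be the "All" column of each Even stride row. Differences are rounded to two decimals because the table itself has only two-decimal precision, so entries sitting exactly at 0.10 count as within tolerance.

```python
# Values (bits/dim) copied from the DP stride rows of the table above.
BUDGETS = [8, 16, 32, 64, 128, 256]
DP_STRIDE = {
    "L_simple": [4.59, 3.99, 3.79, 3.74, 3.73, 3.72],
    "L_vlb": [4.20, 3.41, 3.17, 3.08, 3.05, 3.04],
    "L_hybrid": [4.33, 3.62, 3.39, 3.30, 3.27, 3.26],
}
# Full-step baselines: the "All" column of the matching Even stride rows.
FULL = {"L_simple": 3.73, "L_vlb": 2.94, "L_hybrid": 3.17}

def smallest_budget_within(model, tol=0.1):
    """Return the smallest step budget whose gap to the full-step value is <= tol."""
    for budget, value in zip(BUDGETS, DP_STRIDE[model]):
        # Round to the table's precision to avoid floating-point noise at the boundary.
        if round(value - FULL[model], 2) <= tol:
            return budget
    return None

result = {model: smallest_budget_within(model) for model in DP_STRIDE}
print(result)
```

On these numbers, the L_simple row reaches the tolerance at 32 steps, while the other two objectives need larger budgets.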

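For a budget of K steps, the DP-based method finds the optimal inference path and needs only O(T) forward passes to compute the ELBO terms.
With only 32 refinement steps, it comes within 0.1 bits/dim of the original 1,000- to 4,000-step models, for L_simple on CIFAR-10 and for L_hybrid on ImageNet 64x64.
In the few-step regime, DP-stride schedules achieve better log-likelihood (bits/dim) than the hand-crafted even and quadratic strides.
The DP approach yields strong log-likelihoods but does not always improve FID scores, which highlights the known gap between likelihood-based metrics and FID.
Estimating the ELBO terms by Monte Carlo with as few as 128 samples already gives substantial gains on CIFAR-10, while ImageNet benefits from more samples.
The method requires no training and is broadly applicable to pre-trained DDPMs.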
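For reference, the hand-crafted baselines that the DP stride is compared against can be sketched as below. The exact formulas used in the paper are not given in this review, so both functions are assumptions: the even stride spaces K jumps uniformly over a grid of T steps, and the quadratic stride places the i-th timestep proportionally to i squared, which concentrates steps near 0.

```python
def even_stride(T, K):
    """K jumps spaced uniformly over grid indices 0..T (illustrative definition)."""
    return [round(T * i / K) for i in range(K + 1)]

def quadratic_stride(T, K):
    """K jumps with the i-th index proportional to i^2 (illustrative definition).

    When K is close to T, rounding can produce repeated indices; this sketch
    does not handle that case.
    """
    return [round(T * (i / K) ** 2) for i in range(K + 1)]

even = even_stride(1000, 8)
quad = quadratic_stride(1000, 8)
print(even)
print(quad)
```

Either list can be scored against the same table of ELBO terms as a DP-derived schedule by summing the terms along the path, which is how the three stride types become directly comparable.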

This review was generated by AI and checked by a human editor.