QUICK REVIEW

[Paper Review] Learning to Efficiently Sample from Diffusion Probabilistic Models

Daniel Watson, Jonathan Ho|arXiv (Cornell University)|Jun 7, 2021

Gaussian Processes and Bayesian Inference|References: 26|Citations: 49

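One-sentence summary

Presents a dynamic programming approach that finds the optimal inference schedule for a pre-trained DDPM, giving high-quality sampling with as few as 32 refinement steps and no retraining. Timesteps are selected under a fixed computation budget by optimizing the ELBO.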

ABSTRACT

Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a powerful family of generative models that can yield high-fidelity samples and competitive log-likelihoods across a range of domains, including image and speech synthesis. Key advantages of DDPMs include ease of training, in contrast to generative adversarial networks, and speed of generation, in contrast to autoregressive models. However, DDPMs typically require hundreds-to-thousands of steps to generate a high fidelity sample, making them prohibitively expensive for high dimensional problems. Fortunately, DDPMs allow trading generation speed for sample quality through adjusting the number of refinement steps as a post process. Prior work has been successful in improving generation speed through handcrafting the time schedule by trial and error. We instead view the selection of the inference time schedules as an optimization problem, and introduce an exact dynamic programming algorithm that finds the optimal discrete time schedules for any pre-trained DDPM. Our method exploits the fact that ELBO can be decomposed into separate KL terms, and given any computation budget, discovers the time schedule that maximizes the training ELBO exactly. Our method is efficient, has no hyper-parameters of its own, and can be applied to any pre-trained DDPM with no retraining. We discover inference time schedules requiring as few as 32 refinement steps, while sacrificing less than 0.1 bits per dimension compared to the default 4,000 steps used on ImageNet 64x64 [Ho et al., 2020; Nichol and Dhariwal, 2021].

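Motivation and objectives

Reduce the computational cost of sampling from DDPMs without retraining.
Introduce a dynamic programming algorithm that exactly selects the optimal inference timesteps for a given refinement budget.
Exploit the decomposability of the ELBO to enable memoization and exact optimization of the inference path.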
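
The decomposition can be written out as a short math sketch. The notation below is an assumption based on the standard DDPM ELBO and on the path convention 0 = t_0 < ... < t_K = 1 used in this review; it is not quoted from the paper, and the exact form of the boundary terms may differ there.

```latex
-\mathcal{L}_{\mathrm{ELBO}}(t_0,\dots,t_K)
  = \mathbb{E}_q\!\left[ D_{\mathrm{KL}}\big(q(x_1 \mid x_0)\,\|\,p(x_1)\big) \right]
  + \sum_{i=1}^{K} L(t_i, t_{i-1}),
\qquad
L(t,s) =
\begin{cases}
  -\mathbb{E}_q\big[\log p_\theta(x_0 \mid x_t)\big], & s = 0,\\[2pt]
  \mathbb{E}_q\big[D_{\mathrm{KL}}\big(q(x_s \mid x_t, x_0)\,\|\,p_\theta(x_s \mid x_t)\big)\big], & 0 < s < t.
\end{cases}
```

Under this reading, the first term does not depend on the chosen path, and each L(t, s) depends only on the pair (t, s). That is what allows the terms to be tabulated once and reused for every candidate schedule and every budget K.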

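Proposed method

Formulate the choice of inference schedule as a shortest-path problem using the decomposition of the ELBO.
Using the fixed pre-trained DDPM, compute a table of KL-based ELBO terms L(t,s) for all candidate timesteps.
Apply a dynamic programming algorithm that finds the exact optimal path with exactly K refinement steps (0 = t_0 < ... < t_K = 1).
Use memoization so that filling in the L(t,s) terms takes only O(T) forward passes, where T is the number of grid timesteps.
Support both discrete-time and continuous-time DDPMs by constructing valid ELBO paths of consecutive timesteps that start at 0 and end at 1.
Optionally estimate the ELBO terms with Monte Carlo sampling to reduce computation.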
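
The shortest-path step above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `optimal_schedule`, the array layout `cost[t, s] = L(t, s)`, and the use of integer grid indices 0..T in place of times in [0, 1] are all assumptions made for the example.

```python
import numpy as np

def optimal_schedule(cost, K):
    """Exact-K-step shortest path over a table of ELBO terms.

    cost[t, s] is assumed to hold L(t, s) for grid indices s < t
    (entries with s >= t are never read). Returns (path, total), where
    path is an increasing list of K + 1 grid indices from 0 to T.
    """
    T = cost.shape[0] - 1
    if not 1 <= K <= T:
        raise ValueError("K must satisfy 1 <= K <= T")

    # D[k, t]: minimal summed cost of reaching index t from 0 in exactly k steps.
    D = np.full((K + 1, T + 1), np.inf)
    parent = np.full((K + 1, T + 1), -1, dtype=int)
    D[0, 0] = 0.0

    for k in range(1, K + 1):
        for t in range(k, T + 1):
            # The previous index s must itself be reachable in k - 1 steps.
            s_candidates = np.arange(k - 1, t)
            totals = D[k - 1, s_candidates] + cost[t, s_candidates]
            best = int(np.argmin(totals))
            D[k, t] = totals[best]
            parent[k, t] = s_candidates[best]

    # Walk the parent pointers back from (K, T) to (0, 0).
    path = [T]
    for k in range(K, 0, -1):
        path.append(int(parent[k, path[-1]]))
    path.reverse()
    return path, float(D[K, T])


# Toy usage: a made-up cost (t - s)^2, which favours evenly spaced steps.
idx = np.arange(9)
toy_cost = (idx[:, None] - idx[None, :]).astype(float) ** 2
print(optimal_schedule(toy_cost, 4))  # ([0, 2, 4, 6, 8], 16.0)
```

The recursion costs O(K T^2) arithmetic operations and never calls the network. Because row k of `D` is built from row k - 1, a single run up to the largest budget of interest also yields the optimal cost for every smaller K.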

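Experimental results

Research questions

RQ1: Can the choice of inference schedule for a DDPM be optimized under a fixed computation budget without retraining?
RQ2: Does an exact dynamic programming formulation over DDPM timesteps yield a higher ELBO (a lower negative ELBO) than hand-crafted sampling schedules in the few-step regime?
RQ3: How few refinement steps are needed to closely match the original model's log-likelihood while substantially reducing computation?
RQ4: Do DP-derived schedules transfer across variants of pre-trained DDPMs (discrete-time and continuous-time) without retraining?

Key findings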

Model ∖ # refinement steps	8	16	32	64	128	256	All
DistAug Transformer (Jun et al., 2020)	–	–	–	–	–	–	2.53
DDPM++ (deep, sub-VP) (Song et al., 2021)	–	–	–	–	–	–	2.99
L_simple (Even stride)	6.95	6.15	5.46	4.91	4.47	4.14	3.73
L_simple (Quadratic stride)	5.39	4.86	4.52	3.84	3.74	3.73	–
L_simple (DP stride)	4.59	3.99	3.79	3.74	3.73	3.72	–
L_vlb (Even stride)	6.20	5.48	4.89	4.42	4.03	3.73	2.94
L_vlb (Quadratic stride)	4.89	4.09	3.58	3.23	3.09	3.05	–
L_vlb (DP stride)	4.20	3.41	3.17	3.08	3.05	3.04	–
L_hybrid (Even stride)	6.14	5.39	4.77	4.29	3.92	3.66	3.17
L_hybrid (Quadratic stride)	4.91	4.15	3.71	3.42	3.30	3.26	–
L_hybrid (DP stride)	4.33	3.62	3.39	3.30	3.27	3.26	–
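
The "Even stride" and "Quadratic stride" rows are hand-crafted baselines. The sketch below shows one plausible way to build such schedules on an integer grid 0..T. The function names and the exact rounding rules are assumptions for illustration, and the definitions used in the evaluated baselines may differ.

```python
import numpy as np

def even_stride(T, K):
    """K + 1 grid indices from 0 to T, spaced as evenly as rounding allows."""
    if not 1 <= K <= T:
        raise ValueError("K must satisfy 1 <= K <= T")
    return np.round(np.linspace(0, T, K + 1)).astype(int).tolist()

def quadratic_stride(T, K):
    """K + 1 grid indices from 0 to T with t_i roughly proportional to i^2.

    This concentrates steps near index 0. Rounding can collapse the first
    few indices onto each other, so each t_i is lifted to at least i to
    keep the schedule strictly increasing (an assumption of this sketch).
    """
    if not 1 <= K <= T:
        raise ValueError("K must satisfy 1 <= K <= T")
    i = np.arange(K + 1)
    steps = np.round(T * (i / K) ** 2).astype(int)
    return np.maximum(steps, i).tolist()


print(even_stride(8, 4))        # [0, 2, 4, 6, 8]
print(quadratic_stride(16, 4))  # [0, 1, 4, 9, 16]
```

Either list has the same form as a DP-derived path (K + 1 increasing indices from 0 to T), so all three kinds of schedule can be scored against the same table of ELBO terms.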

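The DP-based method finds the optimal inference path for any budget K and needs only O(T) forward passes to compute the ELBO terms.
With L_simple on CIFAR-10 and L_hybrid on ImageNet 64x64, as few as 32 refinement steps match the original 1000–4000 step models to within 0.1 bits per dimension.
DP-stride schedules outperform hand-crafted even and quadratic strides in log-likelihood (bits/dim) in the few-step regime.
The DP approach yields strong log-likelihoods but does not necessarily improve FID scores, which underscores the known mismatch between likelihood metrics and FID.
In the Monte Carlo ablation, estimating the ELBO terms with as few as 128 samples still gives a notable improvement on CIFAR-10, while ImageNet benefits from more samples.
The method requires no training and is broadly applicable to pre-trained DDPMs.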
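
The O(T) forward-pass count and the Monte Carlo estimate mentioned above can be illustrated with a small table-filling sketch. Everything model-specific is passed in as a callable: `sample_xt`, `predict`, and `pair_cost` are hypothetical placeholders, not functions from the paper or from any library. The only point is that the network output at grid index t is computed once and reused for every s < t.

```python
import numpy as np

def fill_cost_table(T, sample_xt, predict, pair_cost):
    """Monte Carlo estimate of cost[t, s] ~ L(t, s) for all s < t.

    sample_xt(t)              -> batch of noisy inputs x_t (hypothetical)
    predict(x_t, t)           -> network output for that batch (hypothetical)
    pair_cost(pred, x_t, t, s) -> per-example cost of stepping from t to s
                                 (hypothetical)
    """
    cost = np.full((T + 1, T + 1), np.inf)  # entries with s >= t stay unused
    for t in range(1, T + 1):
        x_t = sample_xt(t)
        pred = predict(x_t, t)  # the single batched forward pass for this t
        for s in range(t):
            # Reuse the memoized prediction; average over the sampled batch.
            cost[t, s] = float(np.mean(pair_cost(pred, x_t, t, s)))
    return cost
```

With these placeholders the network is called T times in total, once per grid index, while the T(T + 1)/2 table entries are filled with cheap per-pair arithmetic. A larger batch from `sample_xt` reduces the variance of each entry at a proportional compute cost.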


This review was generated by AI and checked by a human editor.