QUICK REVIEW

[Paper Review] Cascaded Diffusion Models for High Fidelity Image Generation

Jonathan Ho, Chitwan Saharia|arXiv (Cornell University)|May 30, 2021

Generative Adversarial Networks and Image Synthesis|References: 35|Citations: 453

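One-line summary

This paper shows that cascaded diffusion models can generate high fidelity class-conditional ImageNet images without classifier guidance, and that conditioning augmentation in a multi-resolution cascade yields strong FID and CAS (classification accuracy score) results.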

ABSTRACT

We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep, and classification accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256x256, outperforming VQ-VAE-2.

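Motivation and objectives

Demonstrate high fidelity class-conditional ImageNet generation with cascaded diffusion models, without any auxiliary classifier.
Propose conditioning augmentation, which improves sample quality in a cascading pipeline.
Analyze how multi-resolution cascading and augmentation affect sample quality and training efficiency.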

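Proposed method

Build a pipeline of diffusion models across resolutions (e.g., 32×32 → 64×64 → 128×128 or 256×256).
Apply conditioning augmentation: Gaussian noise added to the low-resolution conditioning input at training time, and optionally blur at the higher resolutions.
Train a base diffusion model at the lowest resolution, and separate super-resolution models that upsample the image and refine detail.
Adopt a U-Net-based architecture that injects the conditioning inputs at multiple points.
Train with a simple or hybrid loss formulation, optimizing sample quality while keeping training practical.
Amortize conditioning augmentation over the full range of s (the truncation time), so that the augmentation strength can be searched as a hyperparameter after training.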
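
The Gaussian-noise form of conditioning augmentation described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the cosine schedule in `alpha_sigma`, the uniform sampling of `s`, the mean-pool downsampling, the nearest-neighbour upsampling, and all function names are assumptions made for this example. Images are plain nested lists of grayscale floats so the sketch needs only the standard library.

```python
import math
import random


def alpha_sigma(s):
    """Assumed variance-preserving schedule: alpha^2 + sigma^2 = 1 for s in [0, 1]."""
    return math.cos(0.5 * math.pi * s), math.sin(0.5 * math.pi * s)


def add_gaussian_noise(image, s, rng):
    """Noise an image to level s (0 = unchanged, 1 = pure Gaussian noise)."""
    alpha, sigma = alpha_sigma(s)
    return [[alpha * px + sigma * rng.gauss(0.0, 1.0) for px in row] for row in image]


def downsample_mean(image, factor):
    """Average-pool an H x W image by an integer factor."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]


def make_sr_training_example(high_res, factor, rng):
    """Build (conditioning input, target, s) for one super-resolution training step."""
    low_res = downsample_mean(high_res, factor)
    # Assumption: draw a fresh augmentation level per example, so one model
    # covers every s and the level can still be chosen after training.
    s = rng.uniform(0.0, 1.0)
    noisy_low_res = add_gaussian_noise(low_res, s, rng)
    cond = upsample_nearest(noisy_low_res, factor)
    return cond, high_res, s
```

In this sketch the super-resolution network would receive `cond` and `s` as conditioning inputs and `high_res` as the denoising target. Passing `s` to the network is what would let a single trained model be evaluated at several augmentation strengths.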

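Experimental results

Research questions

RQ1: Can a cascaded diffusion pipeline achieve competitive or superior sample quality on ImageNet without classifier guidance?
RQ2: How does conditioning augmentation affect the quality and stability of cascaded diffusion models?
RQ3: What effect do different resolutions and truncation strategies have on the FID and CAS metrics?
RQ4: Does conditioning augmentation generalize beyond ImageNet to other datasets such as LSUN?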

Key findings

Model	Resolution	FID (train)	FID (validation)	IS	Top-1 CAS	Top-5 CAS
CDM (ours)	32×32	1.11	1.99	26.01 ± 0.59
CDM (ours)	64×64	1.48	2.48	67.95 ± 1.97
CDM (ours)	128×128	3.52	3.76	128.80 ± 2.51	59.84%	81.79%
CDM (ours)	256×256	4.88	4.63	158.71 ± 2.26	63.02%	84.06%

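CDM reaches FID scores (against the training set) of 1.48 at 64×64, 3.52 at 128×128, and 4.88 at 256×256, outperforming BigGAN-deep at these resolutions.
CAS reaches 63.02% (Top-1) and 84.06% (Top-5) at 256×256, outperforming VQ-VAE-2 and BigGAN-deep.
Conditioning augmentation is critical for obtaining high fidelity samples from a cascading pipeline; it mitigates compounding error and exposure bias.
A cascade consisting of a 32×32 base model followed by two super-resolution stages (32×32 → 64×64, then 64×64 → 128×128 or 256×256), each trained with suitable augmentation, gives state-of-the-art ImageNet results at multiple resolutions without a classifier.
Non-truncated and truncated conditioning augmentation perform about equally well, which makes a hyperparameter search over augmentation strength practical.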
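
The two sampling variants compared in the last finding can be written out as follows. This is a sketch under assumed notation, not the paper's exact formulation: $x_0^{\mathrm{lo}}$ is a real low-resolution image, $\hat{x}_0^{\mathrm{lo}}$ a fully sampled one, $z_s$ a latent noised to time $s$, and $(\alpha_s, \sigma_s)$ a generic noise schedule.

```latex
% Training (both variants): noise the real low-resolution conditioning image.
z_s \sim q\!\left(z_s \mid x_0^{\mathrm{lo}}\right)
    = \mathcal{N}\!\left(\alpha_s x_0^{\mathrm{lo}},\ \sigma_s^2 I\right)

% Truncated sampling: stop the low-resolution reverse process at time s > 0
% and condition the super-resolution model on the still-noisy latent z_s.
x^{\mathrm{hi}} \sim p_\theta\!\left(x^{\mathrm{hi}} \mid z_s,\ s\right)

% Non-truncated sampling: run the reverse process to completion, then
% re-noise the finished sample to level s before conditioning on it.
z'_s \sim q\!\left(z_s \mid \hat{x}_0^{\mathrm{lo}}\right), \qquad
x^{\mathrm{hi}} \sim p_\theta\!\left(x^{\mathrm{hi}} \mid z'_s,\ s\right)
```

Under this reading, the non-truncated form lets a single set of finished low-resolution samples be re-noised at many values of $s$. That would be one way for a search over augmentation strength to stay inexpensive.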

This review was generated by AI and checked by a human editor.