QUICK REVIEW

[Paper Review] Diffusion models as plug-and-play priors

Alexandros Graikos, Nikolay Malkin|arXiv (Cornell University)|Jun 17, 2022

Bayesian Methods and Mixture Models | Citations: 42

One-sentence summary

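This paper shows that independently trained denoising diffusion probabilistic models (DDPMs) can serve as plug-and-play priors for inference under differentiable constraints, enabling conditional generation, image segmentation, and continuous relaxations of combinatorial problems.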

ABSTRACT

We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary differentiable constraint $c(\mathbf{x},\mathbf{y})$ on $\mathbf{x}$ given some additional information $\mathbf{y}$. In this paper, the prior is an independently trained denoising diffusion generative model. The auxiliary constraint is expected to have a differentiable form, but can come from diverse sources. The possibility of such inference turns diffusion models into plug-and-play modules, thereby allowing a range of potential applications in adapting models to new domains and tasks, such as conditional generation or image segmentation. The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise at each step. Considering many noised versions of $\mathbf{x}$ in evaluation of its fitness is a novel search mechanism that may lead to new algorithms for solving combinatorial optimization problems.

Motivation and objectives

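Introduce a framework in which a pre-trained DDPM serves as the prior p(x) in a model with a differentiable constraint c(x,y).
Demonstrate that inference can be carried out by gradient-based optimization over a single latent variable, using the DDPM's denoising network.
Show applications to conditional image generation, segmentation, and continuous relaxations of combinatorial problems.
Emphasize that these plug-and-play uses require no additional training or fine-tuning of the DDPM.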

Proposed method

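Formulates p(x|y) up to a normalizing constant via a free energy F, and approximates the posterior with a q(x) that is a delta or a Gaussian centered on a latent η.
Uses the DDPM forward-noising process to compute the expectation, derives a squared-error reconstruction loss in ε-space, and arrives at a practical optimization objective (Equation 12).
Runs gradient-based optimization over η (or over y in latent-based setups) and anneals the time steps t so that modes are searched coarse-to-fine.
Leverages the pre-trained DDPM without jointly training it with the constraint model, achieving off-the-shelf conditioning through the differentiable constraint c(x,y).
Also discusses alternative posterior approximations and latent-space formulations to broaden applicability.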
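
The steps above can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes the objective has the form of a squared noise-prediction error on x_t = sqrt(ᾱ_t)·η + sqrt(1−ᾱ_t)·ε plus −log c(η, y), and it omits any per-timestep weighting that Equation 12 may include. To stay runnable without a trained network, the denoiser is replaced by the closed-form optimal noise predictor for a 1-D Gaussian prior, and gradients are written out by hand rather than obtained by differentiating through a network. All names (`eps_gain`, `grad_prior_term`, `infer`), the noise schedule, and the Gaussian constraint are hypothetical choices made for this example.

```python
import math
import random

# Toy 1-D stand-in for a trained DDPM: data x0 ~ N(PRIOR_MEAN, PRIOR_STD^2).
# (Assumption for illustration; a real prior would be a neural denoiser.)
PRIOR_MEAN, PRIOR_STD = 2.0, 0.5
T = 50
BETAS = [0.02 + (0.2 - 0.02) * i / (T - 1) for i in range(T)]

# Cumulative products alpha_bar_t of (1 - beta_t); index 0 is the least noisy step.
ALPHA_BAR = []
_prod = 1.0
for b in BETAS:
    _prod *= 1.0 - b
    ALPHA_BAR.append(_prod)


def eps_gain(ab):
    """Slope k of the optimal predictor eps_hat(x_t) = k * (x_t - sqrt(ab) * PRIOR_MEAN)."""
    return math.sqrt(1.0 - ab) / (ab * PRIOR_STD ** 2 + 1.0 - ab)


def grad_prior_term(eta, ab, eps):
    """d/d_eta of (eps - eps_hat(x_t))^2 with x_t = sqrt(ab)*eta + sqrt(1-ab)*eps."""
    k = eps_gain(ab)
    x_t = math.sqrt(ab) * eta + math.sqrt(1.0 - ab) * eps
    residual = eps - k * (x_t - math.sqrt(ab) * PRIOR_MEAN)
    return 2.0 * residual * (-k * math.sqrt(ab))


def grad_neg_log_constraint(eta, y, sigma_c=1.0):
    """d/d_eta of -log c(eta, y) for a Gaussian constraint centered on y."""
    return (eta - y) / sigma_c ** 2


def infer(y, lr=0.05, steps_per_t=4, n_noise=16, seed=0):
    """Anneal from high noise to low noise while descending the objective in eta."""
    rng = random.Random(seed)
    eta = 0.0  # arbitrary starting point
    for ab in reversed(ALPHA_BAR):  # smallest alpha_bar (largest t) first: coarse to fine
        for _ in range(steps_per_t):
            # Monte Carlo estimate of the expectation over the injected noise eps.
            g_prior = sum(
                grad_prior_term(eta, ab, rng.gauss(0.0, 1.0)) for _ in range(n_noise)
            ) / n_noise
            eta -= lr * (g_prior + grad_neg_log_constraint(eta, y))
    return eta


if __name__ == "__main__":
    print(infer(y=0.5))
```

With the prior centered at 2.0 and the constraint pulling toward y = 0.5, the returned η should settle between the two values, which is the compromise between prior and constraint that the free-energy view describes.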

Experimental results

Research questions

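RQ1: Can independently trained DDPMs be used, without retraining, as plug-and-play priors in the presence of differentiable constraints?
RQ2: How does gradient-based inference over the DDPM's latent or pixel-space representation produce high-fidelity constrained samples on vision tasks?
RQ3: Which conditioning strategies (e.g., classifier-based, weak labels, color clustering) are effective for conditional generation and segmentation?
RQ4: Do DDPMs enable continuous relaxations of combinatorial problems such as the traveling salesman problem through latent-variable inference?
RQ5: Which practical inference strategies, such as annealing schedules and initialization choices, give robust results across domains?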

Key findings

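A pre-trained DDPM can be used as a prior to infer samples that satisfy differentiable constraints, enabling conditional generation and segmentation without retraining.
Optimization in the DDPM's latent/denoising space, guided by the constraint term log c(x,y), produces realistic samples that satisfy the desired conditions.
In MNIST and CelebA-based experiments, DDPM priors generate conditioned digits and faces with specified attributes while preserving image realism.
On semantic segmentation with EnviroAtlas, the approach demonstrates domain transfer across geographic regions and reaches competitive accuracy and IoU.
In the continuous-relaxation TSP setting, the diffusion model infers a latent adjacency structure and, after 2-opt refinement, yields tours within a small percentage of optimal, showing competitive performance without a discrete combinatorial solver.
The method operates entirely in image space, achieves sublinear scaling under a particular encoding scheme, and can leverage visual priors for structured inference.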
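
For readers unfamiliar with the 2-opt refinement mentioned in the TSP finding, the following is a generic textbook sketch, not the paper's code. It assumes an initial tour is already available (in the paper this would be derived from the diffusion-inferred adjacency structure, which is not shown here); the function names and the example points are hypothetical.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def two_opt(points, tour):
    """Repeatedly reverse segments while doing so shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a node; nothing to swap
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


if __name__ == "__main__":
    # Unit square visited in a self-crossing order.
    pts = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    start = [0, 1, 2, 3]
    better = two_opt(pts, start)
    print(tour_length(pts, start), tour_length(pts, better))
```

2-opt only removes locally improvable edge crossings, so the quality of the final tour still depends on how good the starting tour is.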

This review was generated by AI and checked by a human editor.