QUICK REVIEW

[Paper Review] Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

Yilun Du, Conor Durkan|arXiv (Cornell University)|Feb 22, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 14

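One-Sentence Summary

This paper shows how combining diffusion models with MCMC sampling and an energy-based parameterization enables accurate compositional generation on image and text-to-image tasks, outperforming standard reverse diffusion sampling in several settings.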

ABSTRACT

Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. In particular, we investigate why certain types of composition fail using current techniques and present a number of solutions. We conclude that the sampler (not the model) is responsible for this failure and propose new samplers, inspired by MCMC, which enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers. Intriguingly we find these samplers lead to notable improvements in compositional generation across a wide set of problems such as classifier-guided ImageNet modeling and compositional text-to-image generation.

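Motivation and Objectives

Explain why naive composition of diffusion models fails, and identify the limitations of the sampler as the root cause.
Propose MCMC-based sampling and an energy-based parameterization that enable correct compositional generation without retraining.
Demonstrate improved compositional generation on 2D densities, CLEVR-like shapes, classifier-guided ImageNet generation, and text-to-image synthesis.
Show how energy-based diffusion models enable more flexible and fine-grained compositional operators such as products, mixtures, and negation.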

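Proposed Method

Frame diffusion models from the score-based (denoising score matching) perspective, and discuss conditional guidance via Bayes' rule and a guidance scale.
Introduce and analyze the compositional operators (product, mixture, and negation), and show why naive score-based composition can fail.
Propose annealed MCMC samplers (ULA and HMC variants) for sampling from composed distributions, including Metropolis-corrected variants (MALA, MALA-like, and HMC).
Adopt an energy-based parameterization f_theta(x,t) with epsilon_theta(x,t) = -∇_x f_theta(x,t), which yields an explicit unnormalized log-density and thereby enables Metropolis adjustment and richer compositions.
Show how the energy-based parameterization lets MCMC sampling draw faithful samples from the composed distribution.
Apply the method to 2D densities, CLEVR-like cube placement, classifier-guided ImageNet generation, and compositional text-to-image generation, including tapestry-style multi-scale content.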
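The annealed MCMC idea in the steps above can be sketched on a toy problem. The example below is a minimal illustration, not the paper's sampler: it assumes two 1D Gaussian components whose noised scores are known in closed form, a simple additive-noise schedule, and hand-picked values for `sigmas`, `steps_per_level`, and `step_size`. At each noise level it runs unadjusted Langevin (ULA) steps on the summed scores, so the final level targets the product of the two clean components.

```python
import math
import random

# Hypothetical component models: (mean, std) of two 1D Gaussians.
COMPONENTS = [(-1.0, 1.0), (2.0, 0.5)]


def component_score(x, mean, std, sigma):
    """Score of N(mean, std^2) after adding N(0, sigma^2) noise."""
    return -(x - mean) / (std ** 2 + sigma ** 2)


def product_score(x, sigma):
    """Score of the product of the noised components: the sum of their scores."""
    return sum(component_score(x, m, s, sigma) for m, s in COMPONENTS)


def annealed_ula(n_samples=2000, sigmas=(2.0, 1.0, 0.5, 0.25, 0.0),
                 steps_per_level=50, step_size=0.02, seed=0):
    """Run ULA at each noise level, from the noisiest level down to sigma = 0."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 3.0) for _ in range(n_samples)]
    noise_scale = math.sqrt(2.0 * step_size)
    for sigma in sigmas:
        for _ in range(steps_per_level):
            samples = [
                x + step_size * product_score(x, sigma)
                + noise_scale * rng.gauss(0.0, 1.0)
                for x in samples
            ]
    return samples


def analytic_product():
    """Mean and variance of the product of the clean Gaussian components."""
    precision = sum(1.0 / s ** 2 for _, s in COMPONENTS)
    mean = sum(m / s ** 2 for m, s in COMPONENTS) / precision
    return mean, 1.0 / precision
```

Calling `annealed_ula()` and comparing the sample mean and variance with `analytic_product()` gives a quick check that the chain lands near the product distribution. ULA has no accept/reject step, so a small step-size bias remains.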

Figure 1: Creating new models through composition. Simple operators enable diffusion models to be composed without retraining in settings such as (a) products, (b) classifier conditioning, (c) compositional text-to-image generation with products and mixtures, (d) image tapestries with different conten

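Experimental Results

Research Questions

RQ1: Can standard reverse diffusion produce correct samples from composed diffusion models without retraining?
RQ2: Can MCMC-based sampling (ULA, HMC, with or without Metropolis correction) produce samples faithful to the composed distribution, and how does the energy-based parameterization affect this?
RQ3: What practical gains in sample quality and fidelity result from applying compositional operators such as product, mixture, and negation across domains (2D, 3D-style shapes, ImageNet, text-to-image)?
RQ4: Compared with the score-based parameterization, how does an energy-based diffusion model enable more sophisticated samplers and compositions?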

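Key Findings

Naive reverse diffusion sampling over a collection of diffusion models fails to faithfully realize the composed distribution (product or mixture).
Annealed MCMC sampling (ULA, HMC) improves samples from composed models, and Metropolis adjustment yields further gains.
The energy-based parameterization, which makes an explicit log-density available, supports valid Metropolis-corrected samplers (MALA, HMC) and brings notable improvements on compositional tasks.
Across 2D densities, CLEVR-like cube conditioning, classifier-guided ImageNet generation, and compositional text-to-image generation, MCMC sampling with the energy-based parameterization achieves higher fidelity and better quantitative metrics (e.g., RAISE/LL/MMD-based evaluation; improved Inception Score and FID on classifier-guided ImageNet).
The method enables compositional text-to-image generation and image tapestry generation; the results indicate that the sampling method, not just the model, drives successful composition.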
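The link between an explicit log-density and Metropolis correction can be made concrete with one MALA chain. This is a minimal sketch under stated assumptions, not the paper's implementation: `log_density` is a hypothetical closed-form stand-in for a composed energy-based model (the product of N(-1, 1) and N(2, 0.5^2)), and `step_size`, `n_steps`, and `burn_in` are illustrative values. The accept/reject test needs differences of log-density values, which a score-only model does not provide directly.

```python
import math
import random


def log_density(x):
    """Hypothetical unnormalized log-density: product of N(-1, 1) and N(2, 0.5^2)."""
    return -0.5 * (x + 1.0) ** 2 - 0.5 * ((x - 2.0) / 0.5) ** 2


def grad_log_density(x):
    """Gradient of log_density with respect to x."""
    return -(x + 1.0) - (x - 2.0) / 0.25


def mala_step(x, step_size, rng):
    """One Langevin proposal followed by a Metropolis-Hastings accept/reject."""
    mean_fwd = x + step_size * grad_log_density(x)
    proposal = mean_fwd + math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
    mean_rev = proposal + step_size * grad_log_density(proposal)
    # Log proposal densities, up to a shared constant that cancels.
    log_q_fwd = -((proposal - mean_fwd) ** 2) / (4.0 * step_size)
    log_q_rev = -((x - mean_rev) ** 2) / (4.0 * step_size)
    log_accept = log_density(proposal) - log_density(x) + log_q_rev - log_q_fwd
    if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
        return proposal, True
    return x, False


def run_mala(n_steps=20000, step_size=0.1, burn_in=1000, seed=0):
    """Run one chain; return post-burn-in samples and the acceptance rate."""
    rng = random.Random(seed)
    x, accepted, samples = 0.0, 0, []
    for i in range(n_steps):
        x, ok = mala_step(x, step_size, rng)
        accepted += ok
        if i >= burn_in:
            samples.append(x)
    return samples, accepted / n_steps
```

For this target the exact mean is 1.4 and the exact variance is 0.2, so `run_mala()` can be checked against those values. Unlike an unadjusted Langevin chain, the accept/reject step removes the discretization bias at the cost of evaluating the log-density at every proposal.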

Figure 2 : An illustration of product and mixture compositional models, and the improved sampling performance of MCMC in both cases. Left to right: Component distributions, ground truth composed distribution, reverse diffusion samples, HMC samples. Top: product, bottom: mixture. Reverse diffusion fa

This review was generated by AI and checked by a human editor.