QUICK REVIEW

[Paper Review] Structured Denoising Diffusion Models in Discrete State-Spaces

Jacob Austin|arXiv (Cornell University)|Jul 7, 2021

Generative Adversarial Networks and Image Synthesis|References 51|Cited by 82

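One-Sentence Summary

This paper introduces Discrete Denoising Diffusion Probabilistic Models (D3PMs), which apply diffusion-like corruption to discrete data using structured transition matrices, improve training with a new auxiliary loss, and achieve strong results on text and image tasks.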

ABSTRACT

Denoising diffusion probabilistic models (DDPMs) (Ho et al. 2020) have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models for discrete data that generalize the multinomial diffusion model of Hoogeboom et al. 2021, by going beyond corruption processes with uniform transition probabilities. This includes corruption with transition matrices that mimic Gaussian kernels in continuous space, matrices based on nearest neighbors in embedding space, and matrices that introduce absorbing states. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition matrix is an important design decision that leads to improved results in image and text domains. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation while scaling to large vocabularies on LM1B. On the image dataset CIFAR-10, our models approach the sample quality and exceed the log-likelihood of the continuous-space DDPM model.

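Research Motivation and Goals

Advance generative modeling of discrete data (text and images) by building on diffusion concepts.
Generalize discrete diffusion beyond uniform corruption to diffusion processes with structured transitions.
Develop a learnable reverse process and a stabilizing auxiliary loss to improve performance.
Demonstrate scalability to text with large vocabularies and long sequences, as well as to image data.
Compare against non-autoregressive baselines, showing competitive log-likelihoods and sample quality.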

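Proposed Method

Define a general diffusion framework for discrete variables with K categories, where the forward transition q(x_t|x_{t-1}) is represented by a matrix Q_t, with [Q_t]_{ij} = q(x_t = j | x_{t-1} = i).
Use the closed-form t-step marginal of the forward process, q(x_t|x_0) = Cat(x_t; p = x_0 Q̄_t), where x_0 is a one-hot row vector and Q̄_t = Q_1 Q_2 ... Q_t.
Parameterize the reverse process p_θ(x_{t-1}|x_t) through predicted log-probabilities over x_0, so that it aligns with q(x_{t-1}|x_t,x_0) and preserves the sparsity pattern prescribed by Q_t.
Introduce the loss L_λ = L_vb + λ E_{q(x_0)} E_{q(x_t|x_0)} [-log p̃_θ(x_0|x_t)], where the second term is an auxiliary denoising objective that encourages accurate prediction of x_0 at every step.
Explore structured forward matrices (uniform, absorbing/mask, discretized Gaussian, embedding-based similarity) and their corresponding noise schedules.
Present the x_0-parameterization together with options such as a truncated discretized logistic distribution for ordinal data and inference that skips k steps at a time.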
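
The forward process described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the toy sizes (K = 5, T = 10), and the choice of β_t = 1/(T − t + 1) as the absorbing-state schedule are assumptions made for the example. It builds uniform and absorbing transition matrices, forms the cumulative products Q̄_t, samples x_t from q(x_t|x_0), and evaluates the posterior q(x_{t-1}|x_t,x_0) ∝ q(x_t|x_{t-1}) q(x_{t-1}|x_0) that the reverse process is trained to match.

```python
import numpy as np

def uniform_matrix(K, beta):
    # With probability beta resample uniformly over the K classes, else stay put.
    return (1.0 - beta) * np.eye(K) + beta * np.ones((K, K)) / K

def absorbing_matrix(K, beta, mask_id):
    # With probability beta jump to the absorbing [MASK] class, else stay put.
    Q = (1.0 - beta) * np.eye(K)
    Q[:, mask_id] += beta
    return Q

def cumulative_products(Qs):
    # Returns [Qbar_1, ..., Qbar_T] with Qbar_t = Q_1 Q_2 ... Q_t.
    Qbars, acc = [], np.eye(Qs[0].shape[0])
    for Q in Qs:
        acc = acc @ Q
        Qbars.append(acc)
    return Qbars

def q_xt_given_x0(x0, Qbar_t):
    # x0 holds integer class ids; row x0[n] of Qbar_t equals onehot(x0[n]) @ Qbar_t.
    return Qbar_t[x0]

def sample_xt(x0, Qbar_t, rng):
    # Draw one x_t per position from the categorical q(x_t | x_0).
    probs = q_xt_given_x0(x0, Qbar_t)
    K = probs.shape[-1]
    return np.array([rng.choice(K, p=p) for p in probs])

def q_posterior(x0, xt, Q_t, Qbar_prev):
    # q(x_{t-1} | x_t, x_0) is proportional to q(x_t | x_{t-1}) * q(x_{t-1} | x_0).
    # Q_t[:, xt].T has entry [n, j] = q(x_t = xt[n] | x_{t-1} = j).
    unnorm = Q_t[:, xt].T * Qbar_prev[x0]
    return unnorm / unnorm.sum(axis=-1, keepdims=True)

# Toy setup (assumed values): 4 ordinary classes plus one [MASK] class.
K, T, mask_id = 5, 10, 4
betas = [1.0 / (T - t + 1) for t in range(1, T + 1)]  # assumed schedule
Qs = [absorbing_matrix(K, b, mask_id) for b in betas]
Qbars = cumulative_products(Qs)

rng = np.random.default_rng(0)
x0 = np.array([0, 1, 2, 3])
t = 5  # 1-indexed timestep, so Q_t is Qs[t - 1] and Qbar_{t-1} is Qbars[t - 2]
xt = sample_xt(x0, Qbars[t - 1], rng)
post = q_posterior(x0, xt, Qs[t - 1], Qbars[t - 2])
```

With this assumed schedule, each token is masked by step t with probability t/T, so every position is masked at t = T. Swapping `absorbing_matrix` for `uniform_matrix` in the list comprehension gives the uniform-corruption variant without touching the rest of the code.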

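Experimental Results

Research Questions

RQ1: Can discrete diffusion models with structured corruption outperform prior discrete diffusion methods on text and image tasks?
RQ2: How do different forward transition matrices (uniform, absorbing/mask, discretized Gaussian, embedding-based) affect sample quality and log-likelihood?
RQ3: Does the auxiliary loss L_λ improve training stability and generation quality across domains?
RQ4: How well do D3PMs scale to large vocabularies and long sequences in text, and to standard image datasets?
RQ5: What connections exist between D3PMs and autoregressive or masked language models?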

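Key Findings

D3PM with absorbing ([MASK]) transitions achieves strong text generation results on text8, outperforming the uniform and nearest-neighbor (NN) variants.
On LM1B, D3PM absorbing scales to a large vocabulary and reaches competitive perplexity with relatively few inference steps.
On CIFAR-10, D3PM Gauss (discretized Gaussian) trained with the L_vb objective gives the best IS, FID, and NLL among the transition-matrix variants tested; L_λ improves performance further when combined with the truncation-based reverse model.
D3PM absorbing models show strong text results when trained with the L_λ loss, which illustrates the benefit of the auxiliary denoising objective.
For text, D3PM absorbing scales to an 8k-token vocabulary and sequences of length 128, approaches autoregressive models in some settings, and offers faster sampling.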

This review was generated by AI and checked by human editors.