QUICK REVIEW

[Paper Review] SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers

Hongyi Yuan, Zheng Yuan|arXiv (Cornell University)|Dec 20, 2022

Music and Audio Processing | Cited by 20

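One-Sentence Summary

SeqDiffuSeq extends diffusion models to sequence-to-sequence text generation with an encoder-decoder Transformer, adds self-conditioning and a token-wise adaptive noise schedule, and achieves quality comparable to DiffuSeq across several tasks with faster inference.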

ABSTRACT

Diffusion model, a new generative modelling paradigm, has achieved great success in image, audio, and video generation. However, considering the discrete categorical nature of text, it is not trivial to extend continuous diffusion models to natural language, and text diffusion models are less studied. Sequence-to-sequence text generation is one of the essential natural language processing topics. In this work, we apply diffusion models to approach sequence-to-sequence text generation, and explore whether the superiority generation performance of diffusion model can transfer to natural language domain. We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation. SeqDiffuSeq uses an encoder-decoder Transformers architecture to model denoising function. In order to improve generation quality, SeqDiffuSeq combines the self-conditioning technique and a newly proposed adaptive noise schedule technique. The adaptive noise schedule has the difficulty of denoising evenly distributed across time steps, and considers exclusive noise schedules for tokens at different positional order. Experiment results illustrate the good performance on sequence-to-sequence generation in terms of text quality and inference time.

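Motivation and Objectives

Extend continuous diffusion to sequence-to-sequence text generation with an encoder-decoder Transformer architecture.
Improve generation quality through self-conditioning and a token-wise adaptive noise schedule.
Demonstrate performance competitive with autoregressive (AR) and non-autoregressive (NAR) baselines and with DiffuSeq on multiple tasks.
Demonstrate faster inference and analyze the effect of the proposed techniques.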

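Proposed Method

Model the forward diffusion of output-sequence tokens in continuous embedding space, independently of the input, using Gaussian diffusion steps parameterized as in DiffusionLM.
Use an encoder-decoder Transformer as the denoising function: the encoder processes the input sequence, and the decoder models the noisy output sequence conditioned on the time step.
Implement self-conditioning by feeding the previous denoising output into the current denoising step, reusing information from earlier predictions.
Introduce a token-wise adaptive noise schedule that uses linear interpolation to map per-token denoising difficulty (measured by training loss) to per-time-step noise levels.
Train with a variational-bound objective; derive a simplified loss that encourages the denoising prediction to recover the original embeddings while also promoting accurate decoding.
At inference time, explore minimum Bayes risk (MBR) decoding to improve generation quality, and analyze the trade-off with diversity.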
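
The forward diffusion and the adaptive noise schedule described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Gaussian parameterization z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * noise, one schedule per token position, and a recorded per-timestep training loss for each position. The function names `forward_diffuse` and `adapt_schedule` are hypothetical, and NumPy stands in for a real training framework.

```python
import numpy as np

def forward_diffuse(z0, alpha_bar_t, rng):
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I).

    z0:          (seq_len, dim) embeddings of the target tokens.
    alpha_bar_t: (seq_len,) signal level of each position at step t
                 (assumption: each position has its own schedule).
    """
    a = alpha_bar_t[:, None]
    noise = rng.standard_normal(z0.shape)
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise

def adapt_schedule(alpha_bar, losses):
    """Re-space one position's schedule so its loss grows linearly in t.

    alpha_bar: (T,) current schedule, decreasing in t.
    losses:    (T,) training loss recorded at each timestep for this position.
    """
    T = len(alpha_bar)
    # Make the loss curve strictly increasing so it can be inverted.
    mono = np.maximum.accumulate(losses) + 1e-9 * np.arange(T)
    # Evenly spaced loss levels between the easiest and hardest step.
    targets = np.linspace(mono[0], mono[-1], T)
    # Piecewise-linear inverse map: loss level -> alpha_bar.
    return np.interp(targets, mono, alpha_bar)
```

Under these assumptions, `adapt_schedule` would be called once per token position, with `losses` refreshed periodically during training, so that positions that are harder to denoise receive a different noise progression than easier ones.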

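Experimental Results

Research Questions

RQ1: Can a diffusion-based sequence-to-sequence model with an encoder-decoder structure reach quality competitive with AR and NAR baselines on text generation tasks?
RQ2: Does self-conditioning improve the use of previous predictions during diffusion-based text generation?
RQ3: Does the token-wise adaptive noise schedule outperform a fixed schedule in aligning denoising difficulty and in generation quality?
RQ4: Across tasks, how does SeqDiffuSeq compare with DiffuSeq and other baselines in speed and diversity?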

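Key Findings

SeqDiffuSeq is competitive with AR and NAR baselines in generation quality and diversity on five sequence-to-sequence tasks.
Self-conditioning and the adaptive noise schedule each improve performance and are complementary.
Thanks to encoder reuse and sequence-level denoising, SeqDiffuSeq runs inference much faster than DiffuSeq.
With MBR inference, SeqDiffuSeq can further improve quality on several tasks, at some cost in diversity.
On translation tasks, SeqDiffuSeq generally trails the autoregressive Transformer but outperforms several non-autoregressive methods and also outperforms DiffuSeq.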
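
The MBR step referred to in the findings can be illustrated with a small sketch: sample several candidate outputs, then keep the one that agrees most with the rest. The code below is an assumption-laden illustration rather than the paper's procedure. The helper names `unigram_f1` and `mbr_select` are hypothetical, and a simple token-overlap F1 is used as a stand-in utility where a real setup would more likely use a task metric.

```python
from collections import Counter

def unigram_f1(hyp, ref):
    """Token-overlap F1 between two token lists (stand-in utility)."""
    if not hyp or not ref:
        return 0.0
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest mean utility against the others."""
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]
```

Because the selected output is the one closest to the consensus of the samples, this kind of selection tends to favor typical outputs, which is consistent with the quality-versus-diversity trade-off noted above.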

This review was generated by AI and reviewed by human editors.