QUICK REVIEW

[Paper Review] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Shansan Gong, Mukai Li | arXiv (Cornell University) | Oct 17, 2022
Topic Modeling | Cited by 94
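ONE-SENTENCE SUMMARY

DiffuSeq introduces a classifier-free diffusion model for Seq2Seq text generation that decodes in parallel (non-autoregressively), delivers strong quality and notably high diversity, and connects diffusion to the autoregressive and iterative-NAR frameworks.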

ABSTRACT

Recently, diffusion models have emerged as a new paradigm for generative models. Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is under-explored due to the discrete nature of texts, especially for conditional generation. We tackle this challenge by proposing DiffuSeq: a diffusion model designed for sequence-to-sequence (Seq2Seq) text generation tasks. Upon extensive evaluation over a wide range of Seq2Seq tasks, we find DiffuSeq achieving comparable or even better performance than six established baselines, including a state-of-the-art model that is based on pre-trained language models. Apart from quality, an intriguing property of DiffuSeq is its high diversity during generation, which is desired in many Seq2Seq tasks. We further include a theoretical analysis revealing the connection between DiffuSeq and autoregressive/non-autoregressive models. Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks. Code is available at https://github.com/Shark-NLP/DiffuSeq

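MOTIVATION AND OBJECTIVES

  • Advance diffusion models for discrete, conditional text generation in Seq2Seq tasks.
  • Develop a classifier-free diffusion model that conditions on the source sequence without relying on an external classifier.
  • Enable non-autoregressive, parallel decoding that increases diversity while maintaining quality.
  • Establish theoretical connections between DiffuSeq and autoregressive, iterative-NAR, and fully-NAR models.
  • Demonstrate empirical effectiveness across multiple Seq2Seq tasks.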

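PROPOSED METHOD

  • Embed discrete text pairs (source and target) into a shared continuous space and apply a partial-noising forward process that perturbs only the target part.
  • Model reverse denoising with a Transformer-based network that learns p_θ(z_{t−1} | z_t), without an auxiliary classifier (classifier-free).
  • Use a unified embedding Emb(w^x ⊕ w^y) to train the source and target representations jointly.
  • Derive and minimize the variational lower bound L_VLB via a simplified objective that emphasizes reconstruction of y_0 and embedding consistency.
  • Apply importance sampling over diffusion steps to stabilize training, and use MBR (Minimum Bayes Risk) decoding to improve final output quality.
  • Establish connections to autoregressive, iterative-NAR, and fully-NAR models, arguing that DiffuSeq extends iterative-NAR.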
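
The partial-noising step in the first bullet can be illustrated with a minimal sketch. It assumes the standard DDPM closed form z_t = sqrt(ᾱ_t) · z_0 + sqrt(1 − ᾱ_t) · ε for the forward process; the function name `partial_noise`, the toy vocabulary, and the value of `alpha_bar_t` are hypothetical and are not taken from the DiffuSeq codebase.

```python
import math
import random


def partial_noise(z0, is_target, alpha_bar_t, rng):
    """Noise only the target positions of z0; source positions pass through.

    Assumes the standard DDPM closed form
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    z_t = []
    for vec, noisy in zip(z0, is_target):
        if noisy:
            vec = [signal * v + sigma * rng.gauss(0.0, 1.0) for v in vec]
        z_t.append(list(vec))  # copy so z0 is never mutated
    return z_t


rng = random.Random(0)
dim = 4

# Toy shared embedding table used for both source and target tokens.
words = ["how", "are", "you", "fine", "thanks"]
emb = {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in words}

w_x = ["how", "are", "you"]  # source sequence
w_y = ["fine", "thanks"]     # target sequence

# z_0 = Emb(w_x ⊕ w_y): embeddings of the concatenated pair.
z0 = [emb[w] for w in w_x + w_y]
is_target = [False] * len(w_x) + [True] * len(w_y)

z_t = partial_noise(z0, is_target, alpha_bar_t=0.5, rng=rng)
```

Because the source rows are left untouched at every step, the denoising network always sees a clean condition, which is how the model can condition on the source without a separate classifier.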

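EXPERIMENTAL RESULTS

Research Questions

  • RQ1: Can diffusion models be adapted effectively to conditional Seq2Seq text generation without a classifier?
  • RQ2: How does the partial-noising forward process affect conditional generation and the modeling of dependencies between the source w^x and the target w^y?
  • RQ3: How does DiffuSeq relate to autoregressive, iterative-NAR, and fully-NAR models, and does it offer advantages in quality and diversity?
  • RQ4: Does jointly training a shared embedding for w^x and w^y improve on decoupled or pre-extracted representations?
  • RQ5: Is a diffusion-based Seq2Seq model competitive in quality, while offering greater diversity, on standard Seq2Seq tasks?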

Key Findings

| Task | Method | BLEU ↑ | R-L ↑ | Score ↑ | dist-1 ↑ | selfB ↓ | div-4 ↑ | Len |
|---|---|---|---|---|---|---|---|---|
| Open Domain Dialogue | GRU-attention ⋄ | 0.0068 | 0.1054 | 0.4128 | 0.8998 | 0.8008 | 0.1824 | 4.46 |
| Open Domain Dialogue | Transformer-base ⋄ | 0.0189 | 0.1039 | 0.4781 | 0.7493 | 0.3698 | 0.6472 | 19.5 |
| Open Domain Dialogue | GPT2-base FT ∙ | 0.0108 | 0.1508 | 0.5279 | 0.9194 | 0.0182 | 0.9919 | 16.8 |
| Open Domain Dialogue | GPT2-large FT ∙ | 0.0125 | 0.1002 | 0.5293 | 0.9244 | 0.0213 | 0.9938 | 16.8 |
| Open Domain Dialogue | GPVAE-T5 ∙ | 0.0110 | 0.1009 | 0.4317 | 0.5625 | 0.3560 | 0.5551 | 20.1 |
| Open Domain Dialogue | NAR-LevT ‡ | 0.0158 | 0.0550 | 0.4760 | 0.9726 | 0.7103 | 0.1416 | 4.11 |
| Open Domain Dialogue | DiffuSeq (Ours) ‡ | 0.0139 | 0.1056 | 0.5131 | 0.9467 | 0.0144 | 0.9971 | 13.6 |
| Question Generation | GRU-attention ⋄ | 0.0651 | 0.2617 | 0.5222 | 0.7930 | 0.9999 | 0.3178 | 10.1 |
| Question Generation | Transformer-base ⋄ | 0.1663 | 0.3441 | 0.6307 | 0.9309 | 0.3265 | 0.7720 | 10.3 |
| Question Generation | GPT2-base FT ∙ | 0.0741 | 0.2714 | 0.6052 | 0.9602 | 0.1403 | 0.9216 | 10.0 |
| Question Generation | GPT2-large FT ∙ | 0.1110 | 0.3215 | 0.6346 | 0.9670 | 0.2910 | 0.8062 | 9.96 |
| Question Generation | GPVAE-T5 ∙ | 0.1251 | 0.3390 | 0.6308 | 0.9381 | 0.3567 | 0.7282 | 11.4 |
| Question Generation | NAR-LevT ‡ | 0.0930 | 0.2893 | 0.5491 | 0.8914 | 0.9830 | 0.4776 | 6.93 |
| Question Generation | DiffuSeq (Ours) ‡ | 0.1731 | 0.3665 | 0.6123 | 0.9056 | 0.2789 | 0.8103 | 11.5 |
| Text Simplification | GRU-attention ⋄ | 0.3256 | 0.5602 | 0.7871 | 0.8883 | 0.9998 | 0.3313 | 18.9 |
| Text Simplification | Transformer-base ⋄ | 0.2693 | 0.4907 | 0.7381 | 0.8886 | 0.6924 | 0.5095 | 18.5 |
| Text Simplification | GPT2-base FT ∙ | 0.3083 | 0.5461 | 0.8021 | 0.9439 | 0.5444 | 0.6047 | 16.1 |
| Text Simplification | GPT2-large FT ∙ | 0.2693 | 0.5111 | 0.7882 | 0.9464 | 0.6042 | 0.5876 | 15.4 |
| Text Simplification | GPVAE-T5 ∙ | 0.3392 | 0.5828 | 0.8166 | 0.9308 | 0.8147 | 0.4355 | 18.5 |
| Text Simplification | NAR-LevT ‡ | 0.2052 | 0.4402 | 0.7254 | 0.9715 | 0.9907 | 0.3271 | 8.31 |
| Text Simplification | DiffuSeq (Ours) ‡ | 0.3622 | 0.5849 | 0.8126 | 0.9264 | 0.4642 | 0.6604 | 17.7 |
| Paraphrase | GRU-attention ⋄ | 0.1894 | 0.5129 | 0.7763 | 0.9423 | 0.9958 | 0.3287 | 8.30 |
| Paraphrase | Transformer-base ⋄ | 0.2722 | 0.5748 | 0.8381 | 0.9748 | 0.4483 | 0.7345 | 11.2 |
| Paraphrase | GPT2-base FT ∙ | 0.1980 | 0.5212 | 0.8246 | 0.9798 | 0.5480 | 0.6245 | 9.67 |
| Paraphrase | GPT2-large FT ∙ | 0.2059 | 0.5415 | 0.8363 | 0.9819 | 0.7325 | 0.5020 | 9.53 |
| Paraphrase | GPVAE-T5 ∙ | 0.2409 | 0.5886 | 0.8466 | 0.9688 | 0.5604 | 0.6169 | 9.60 |
| Paraphrase | NAR-LevT ‡ | 0.2268 | 0.5795 | 0.8344 | 0.9790 | 0.9995 | 0.3329 | 8.85 |
| Paraphrase | DiffuSeq (Ours) ‡ | 0.2413 | 0.5880 | 0.8365 | 0.9807 | 0.2732 | 0.8641 | 11.2 |

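
As a reading aid for the diversity columns, the sketch below computes a distinct n-gram ratio. It assumes whitespace tokenization and the commonly used definitions: dist-1 as the share of distinct unigrams within one output, and div-4 as the share of distinct 4-grams across several outputs for the same source. The function name `distinct_ngram_ratio` and the sample sentences are hypothetical, and the paper's own evaluation script may differ in detail.

```python
def distinct_ngram_ratio(sentences, n):
    """Unique n-grams divided by total n-grams over a list of sentences."""
    ngrams = []
    for sentence in sentences:
        tokens = sentence.split()  # assumption: whitespace tokenization
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# Two hypothetical outputs sampled for the same source sentence.
samples = ["the cat sat on the mat", "a dog slept on the rug"]

dist_1 = distinct_ngram_ratio(samples[:1], 1)  # within a single output
div_4 = distinct_ngram_ratio(samples, 4)       # across the sampled set

# Identical samples share every 4-gram, so the ratio drops.
div_4_repeated = distinct_ngram_ratio([samples[0], samples[0]], 4)
```

Higher values mean less repetition. Self-BLEU points the other way: it scores each output against the other outputs for the same source, so lower is more diverse.
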
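  • Across four Seq2Seq tasks, DiffuSeq matches or exceeds the quality of six strong baselines, including a state-of-the-art PLM-based model.
  • DiffuSeq attains the lowest self-BLEU and the highest div-4 on three of the four tasks (Question Generation is the exception, where GPT2-base FT is more diverse) while staying competitive on BLEU, ROUGE, and BERTScore.
  • The model shows strong sentence-level diversity, and when that diversity is exploited (for example, with a larger candidate set for MBR) it can surpass autoregressive baselines.
  • Jointly training the shared embedding of w^x and w^y matters for performance; decoupled training strategies degrade results.
  • DiffuSeq provides a theoretical and empirical bridge between autoregressive, iterative-NAR, and diffusion approaches, establishing diffusion as a viable extension for conditional language generation.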
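
MBR decoding, mentioned above as the way sample diversity is turned into quality, can be sketched as follows: generate several candidates, then keep the one that agrees most with the rest on average. The unigram-F1 utility here is a stand-in chosen so the example has no dependencies, not the utility used in the paper's experiments. The names `unigram_f1` and `mbr_select` and the candidate strings are hypothetical.

```python
from collections import Counter


def unigram_f1(a, b):
    """Stand-in utility: F1 overlap of whitespace-separated unigrams."""
    ca, cb = Counter(a.split()), Counter(b.split())
    overlap = sum((ca & cb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ca.values())
    recall = overlap / sum(cb.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest mean utility against the others."""
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Hypothetical samples drawn for one source sentence.
candidates = [
    "how can i learn python quickly",
    "how can i learn python",
    "how do i learn python fast",
    "what is the best pizza topping",
]

selected = mbr_select(candidates)
```

The off-topic candidate shares no words with the others and is never selected. A larger, more diverse candidate pool gives the selection step more to work with, which is consistent with the finding that DiffuSeq benefits from bigger MBR candidate sets.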

This review was generated by AI and checked by a human editor.