QUICK REVIEW

[Paper Review] The Curious Case of Neural Text Degeneration

Ari Holtzman, Jan Buys|arXiv (Cornell University)|Apr 22, 2019

Topic Modeling | References: 40 | Cited by: 1,100

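One-Sentence Summary

This paper analyzes decoding strategies for open-ended text generation and introduces Nucleus Sampling, which truncates the unreliable tail of the distribution and produces text that is higher in quality and more diverse than existing methods.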

ABSTRACT

Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators. The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, using likelihood as a decoding objective leads to text that is bland and strangely repetitive. In this paper, we reveal surprising distributional differences between human text and machine text. In addition, we find that decoding strategies alone can dramatically affect the quality of machine text, even when generated from exactly the same neural language model. Our findings motivate Nucleus Sampling, a simple but effective method to draw the best out of neural generation. By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.

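Motivation and Objectives

Expose neural text degeneration in open-ended generation.
Compare decoding strategies on distributional, perplexity, and human-evaluation criteria.
Propose and validate Nucleus Sampling as the preferred decoding method for long-form text.
Provide practical guidance on when and why to use nucleus sampling over alternative methods.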

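Proposed Method

Define the top-p (nucleus) vocabulary as the smallest set of tokens whose cumulative probability reaches at least p.
Renormalize the distribution over the nucleus and sample from it.
Compare nucleus sampling against top-k sampling, temperature sampling, beam search, and pure sampling, using distributional metrics and human evaluation (HUSE).
Evaluate on GPT-2 Large (762M parameters), a Generatively Pre-trained Transformer, with WebText data.
Analyze perplexity, Zipf coefficient, Self-BLEU, repetition rate, and HUSE to assess quality and diversity.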
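The first two steps above (build the nucleus, then renormalize and sample) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names `nucleus` and `nucleus_sample` and the 4-token `toy_probs` distribution are assumptions made up for the example, and a real decoder would apply the same logic to the model's full next-token distribution at every step.

```python
import random


def nucleus(probs, p=0.95):
    """Return (token indices, renormalized weights) for the top-p nucleus."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Grow the set until its cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    weights = [probs[i] / cumulative for i in kept]
    return kept, weights


def nucleus_sample(probs, p=0.95, rng=random):
    """Draw one token index from the renormalized nucleus."""
    kept, weights = nucleus(probs, p)
    return rng.choices(kept, weights=weights, k=1)[0]


# Hypothetical next-token distribution over a 4-token vocabulary.
toy_probs = [0.5, 0.3, 0.15, 0.05]
kept, weights = nucleus(toy_probs, p=0.9)
print(kept, weights)
print(nucleus_sample(toy_probs, p=0.9, rng=random.Random(0)))
```

With p=0.9 the toy nucleus keeps the three most probable tokens (cumulative mass 0.95) and drops the 0.05 tail, so the sampler can never return the last token.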

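Experimental Results

Research Questions

RQ1: Does maximization-based decoding (e.g. beam search) produce degenerate, repetitive text in open-ended generation?
RQ2: Does sampling from the model distribution with its unreliable tail truncated (nucleus sampling) produce text that is both high-quality and diverse?
RQ3: How do different decoding strategies compare with human text under distributional, statistical, and human-evaluation criteria?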

Key Findings

Method	Perplexity	Self-BLEU	Zipf Coefficient	Repetition %	HUSE
Human	12.38	0.31	0.93	0.28	-
Greedy	1.50	0.50	1.00	73.66	-
Beam, b=16	1.48	0.44	0.94	28.94	-
Stochastic Beam, b=16	19.20	0.28	0.91	0.32	-
Pure Sampling	22.73	0.28	0.93	0.22	0.67
Sampling, t=0.9	10.25	0.35	0.96	0.66	0.79
Top-k=40	6.88	0.39	0.96	0.78	0.19
Top-k=640	13.82	0.32	0.96	0.28	0.94
Top-k=40, t=0.7	3.48	0.44	1.00	8.86	0.08
Nucleus p=0.95	13.13	0.32	0.95	0.36	0.97
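Two of the columns above have simple textbook definitions that can be sketched in standard-library Python. This is an illustrative sketch only: the summary does not state the paper's exact estimation procedures, so the log-log least-squares fit used for the Zipf coefficient is an assumption, and the function names and toy inputs are hypothetical.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """exp of the mean negative log-probability assigned to each token."""
    nll = -sum(math.log(q) for q in token_probs) / len(token_probs)
    return math.exp(nll)


def zipf_coefficient(tokens):
    """Negated least-squares slope of log frequency against log rank.

    Assumption: a plain log-log regression; needs >= 2 distinct tokens.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


# Toy inputs (hypothetical): a model that gives every token probability 0.5,
# and a corpus whose token counts fall off exactly as 12 / rank.
print(perplexity([0.5, 0.5, 0.5]))
print(zipf_coefficient(["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3))
```

Under this definition, text made only of tokens the model finds highly probable has perplexity near 1, which is consistent with the very low perplexity of Greedy (1.50) and Beam (1.48) relative to human text (12.38) in the table.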

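Maximization-based decoding often produces repetitive or generic text in open-ended generation (repetition rate of 73.66% for Greedy and 28.94% for Beam, versus 0.28% for human text).
The tail of the model's distribution is unreliable and should be truncated during generation.
Nucleus Sampling (p=0.95) is close to human text in perplexity (13.13 vs. 12.38) and Self-BLEU (0.32 vs. 0.31), and achieves the highest HUSE score in the table (0.97), i.e. the best combined quality-diversity trade-off.
Nucleus Sampling is also close to human text on the Zipf coefficient (0.95 vs. 0.93) while avoiding repetition (0.36% vs. 0.28%).
Top-k sampling and temperature scaling have context-dependent shortcomings, while pure sampling can produce incoherent text.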
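The context dependence of top-k can be illustrated with a small standard-library Python sketch. The two distributions below are hypothetical toy examples, not data from the paper: one is flat (many equally plausible continuations) and one is peaked (a single obvious continuation). A fixed k keeps the same number of tokens in both cases, while a top-p cutoff adapts the size of the kept set to the shape of the distribution.

```python
def ranked_indices(probs):
    """Token indices ordered from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_indices(probs, k):
    """Keep the k most probable tokens, whatever the distribution looks like."""
    return ranked_indices(probs)[:k]


def top_p_indices(probs, p):
    """Keep the smallest prefix of ranked tokens whose mass reaches p."""
    kept, cumulative = [], 0.0
    for i in ranked_indices(probs):
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


def kept_mass(probs, indices):
    """Total probability mass of the kept tokens."""
    return sum(probs[i] for i in indices)


# Hypothetical next-token distributions, chosen only for illustration.
flat = [0.1] * 10                          # many plausible continuations
peaked = [0.96, 0.01, 0.01, 0.01, 0.01]    # one obvious continuation

for name, probs in (("flat", flat), ("peaked", peaked)):
    k_kept = top_k_indices(probs, k=3)
    p_kept = top_p_indices(probs, p=0.95)
    print(name, "top-k keeps", len(k_kept), "tokens, mass", kept_mass(probs, k_kept))
    print(name, "top-p keeps", len(p_kept), "tokens, mass", kept_mass(probs, p_kept))
```

On the flat distribution, k=3 discards seven equally plausible tokens, while p=0.95 keeps all ten. On the peaked distribution, k=3 still admits two 0.01-probability tokens, while p=0.95 keeps only the dominant one.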


This review was generated by AI and reviewed by a human editor.