QUICK REVIEW

[Paper Review] Unsupervised and Distributional Detection of Machine-Generated Text

Matthias Gallé, Jos Rozen|arXiv (Cornell University)|Nov 4, 2021

Topic Modeling|Cited by 10

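One-Sentence Summary

This paper proposes an unsupervised, distribution-based method for detecting machine-generated text by identifying higher-order n-grams (in particular, supermaximal repeats) that are over-represented in a large collection of text. Using these repeated phrases as a weak signal, the method applies a self-training strategy with an ensemble of classifiers to rank suspicious documents. For GPT2-large, precision at m=5,000 is over 90% under top-k sampling and over 80% under nucleus sampling.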

ABSTRACT

The power of natural language generation models has provoked a flurry of interest in automatic methods to detect if a piece of text is human or machine-authored. The problem so far has been framed in a standard supervised way and consists in training a classifier on annotated data to predict the origin of one given new document. In this paper, we frame the problem in an unsupervised and distributional way: we assume that we have access to a large collection of unannotated documents, a big fraction of which is machine-generated. We propose a method to detect those machine-generated documents leveraging repeated higher-order n-grams, which we show over-appear in machine-generated text as compared to human ones. That weak signal is the starting point of a self-training setting where pseudo-labelled documents are used to train an ensemble of classifiers. Our experiments show that leveraging that signal allows us to rank suspicious documents accurately. Precision at 5000 is over 90% for top-k sampling strategies, and over 80% for nucleus sampling for the largest model we used (GPT2-large). The drop with increased size of model is small, which could indicate that the results hold for other current and future large language models.

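Motivation and Objectives

Address the limitations of supervised detection methods, which depend on annotated data and are sensitive to distribution shift.
Detect machine-generated text in corpora where a large share of the text may be synthetic, without prior annotation.
Identify distributional patterns, in particular repeated higher-order n-grams, that distinguish machine-generated from human-written text.
Develop a scalable self-training framework that uses pseudo-labelled data to improve detection without a manually annotated training set.
Evaluate the method across decoding strategies (top-k and nucleus) and model sizes to test its robustness and generalisation.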

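Proposed Method

Detect supermaximal repeats in a large corpus of unlabelled documents, that is, repeated substrings that are not contained in any longer repeat.
Treat the presence of such repeats as a weak signal that a document may be machine-generated, on the assumption that they occur more often in model-generated text.
Apply a self-training pipeline: pseudo-label documents with notable repeats as machine-generated, then train an ensemble of classifiers on this pseudo-labelled data.
Train binary classifiers (fine-tuned distilled BERT) on the pseudo-labelled data, with genuine human-written text as positive examples and repeat-rich documents as negative examples.
Combine several classifiers through majority voting to improve robustness and ranking quality.
Evaluate with precision at m (prec@m), the fraction of the top m ranked documents that are truly machine-generated.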
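The first steps of the method (finding supermaximal repeats and turning them into pseudo-labels) can be illustrated with a brute-force sketch. It is not the authors' implementation: the whitespace tokenisation, the `min_len` and `max_len` bounds, and the function names are all assumptions made for illustration, and a real corpus would call for a suffix array or suffix tree rather than enumerating n-grams. Because of the `max_len` cap, a repeat of exactly `max_len` tokens is reported even if it sits inside a longer repeat that was never enumerated.

```python
from collections import Counter


def repeats(tokens, min_len=2, max_len=8):
    """Return every n-gram (as a tuple) that occurs at least twice."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(tokens) - n + 1)
    )
    return {gram for gram, count in counts.items() if count >= 2}


def contains(longer, shorter):
    """True if `shorter` is a contiguous sub-sequence of `longer` (both tuples)."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def supermaximal_repeats(tokens, min_len=2, max_len=8):
    """Keep only the repeats that are not contained in a longer repeat."""
    reps = repeats(tokens, min_len, max_len)
    return {
        r for r in reps
        if not any(len(o) > len(r) and contains(o, r) for o in reps)
    }


def pseudo_label(docs, min_len=3, max_len=8):
    """Flag each document that contains a supermaximal repeat of the collection."""
    # Join documents with unique sentinel tokens so no repeat spans two documents.
    tokens = []
    for i, doc in enumerate(docs):
        tokens.extend(doc.split())
        tokens.append(f"<doc-sep-{i}>")
    smr = supermaximal_repeats(tokens, min_len, max_len)
    return [any(contains(tuple(doc.split()), r) for r in smr) for doc in docs]


docs = [
    "the cat sat on the mat today",
    "a dog sat on the mat quietly",
    "nothing shared here at all",
]
flags = pseudo_label(docs)  # hypothetical toy corpus, not data from the paper
```

In this toy corpus the only supermaximal repeat of at least three tokens is "sat on the mat", so the first two documents are flagged and the third is not. In the pipeline described above, the flagged documents would serve as pseudo-labelled training data for the classifiers.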

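Experimental Results

Research Questions

RQ1: Can over-frequent higher-order n-grams serve as a reliable, unsupervised signal for detecting machine-generated text in a large unlabelled corpus?
RQ2: How effective is self-training on documents pseudo-labelled by repeat frequency at improving detection performance?
RQ3: Does the method generalise across decoding strategies (top-k and nucleus) and model sizes (small, medium, and large GPT2)?
RQ4: How much does performance degrade, relative to supervised baselines, when the generation model or decoding strategy changes?
RQ5: Can the method still detect machine-generated content when n-gram-level statistics are indistinguishable from those of human text?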
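Several of these questions turn on the difference between top-k and nucleus sampling, so a small sketch of the two truncation rules may help. It works on a hand-written next-token distribution rather than a real language model; the function names and the toy probabilities are assumptions for illustration only.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalise their probabilities."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    kept, mass = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        mass += prob
        if mass >= p:
            break
    return {tok: prob / mass for tok, prob in kept}


# Hypothetical next-token distribution, not taken from the paper.
next_token = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
top_k_dist = top_k_filter(next_token, k=2)
nucleus_dist = nucleus_filter(next_token, p=0.9)
```

Top-k always keeps a fixed number of candidates, whereas the nucleus rule lets the candidate set grow or shrink with the shape of the distribution. Here `top_k_dist` keeps two tokens and `nucleus_dist` keeps three.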

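Key Findings

Supermaximal repeats are significantly more frequent in machine-generated text than in human-written text, which makes them a reliable weak signal for detection.
Pseudo-labelling with this signal enables a self-training pipeline that reaches a precision of over 90% at m=5,000 under top-k sampling for GPT2-large.
Under nucleus sampling, precision for the same model is still over 80% at m=5,000, indicating that the method remains effective across decoding strategies.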
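A performance gap remains between the supervised and semi-supervised settings, yet the method exploits the weak signal well enough to approach supervised performance.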
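The method remains effective when the generation model or decoding strategy changes. The drop with increasing model size is small, which could indicate that the results hold for other current and future large language models.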
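Detection performance is tied to the diversity of the generated text: top-k sampling, whose output is less diverse and more repetitive, is easier to detect than nucleus sampling.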
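The prec@m metric behind these figures, together with the vote-based ranking it is applied to, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the use of the vote fraction as the ranking score, and the toy votes and labels are hypothetical and not taken from the paper.

```python
def vote_fraction(votes):
    """Per-document share of classifiers voting 'machine-generated'.

    `votes[c][d]` is True when classifier c flags document d.
    """
    n_docs = len(votes[0])
    return [sum(v[d] for v in votes) / len(votes) for d in range(n_docs)]


def precision_at_m(scores, is_machine, m):
    """Fraction of the m highest-scored documents that are machine-generated."""
    if not 0 < m <= len(scores):
        raise ValueError("m must be between 1 and the number of documents")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(is_machine[i] for i in ranked[:m]) / m


# Three hypothetical classifiers voting on three documents.
votes = [
    [True, False, True],
    [True, False, True],
    [True, False, False],
]
is_machine = [True, False, False]  # hypothetical ground truth, used for evaluation only
scores = vote_fraction(votes)
p_at_1 = precision_at_m(scores, is_machine, m=1)
p_at_2 = precision_at_m(scores, is_machine, m=2)
```

The ground-truth labels enter only at evaluation time. The ranking itself is built from classifier votes, which matches the unsupervised setting described above.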

This review was generated by AI and reviewed by human editors.