QUICK REVIEW

[Paper Review] Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Abhimanyu Hans, Avi Schwarzschild|arXiv (Cornell University)|Jan 22, 2024

Natural Language Processing Techniques|Cited by 16

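One-Sentence Summary

Binoculars is a zero-shot detector that scores a text by contrasting two closely related LLMs, which enables model-agnostic detection of machine-generated text without any training data. It achieves state-of-the-art performance across domains and outperforms several baselines at very low false positive rates.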

ABSTRACT

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.

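Motivation and Objectives

Motivate and develop a zero-shot, model-agnostic detector for machine-generated text.
Propose a simple, training-free score that uses two language models to separate human-written text from machine-generated text.
Evaluate robustness across datasets, languages, and model families.
Compare against open-source and commercial detectors in out-of-domain settings.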

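Proposed Method

Define log-perplexity as the average negative log-likelihood that a model assigns to the tokens of a text.
Define log-cross-perplexity between two models on the same text, provided the two models share a tokenizer.
Propose the Binoculars score B = log PPL_M1(s) / log X-PPL_M1,M2(s).
Use Falcon-7b (M1) and Falcon-7b-instruct (M2) as the primary model pair for scoring.
Evaluate zero-shot detection on diverse datasets, including News, Creative Writing, Student Essays, CCNews, CNN, PubMed, and Orca-derived prompts.
Report the true positive rate at a fixed false positive rate (TPR at FPR = 0.01%).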
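The score defined above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: random toy logits stand in for the outputs of Falcon-7b and Falcon-7b-instruct, the function names are hypothetical, and the cross-perplexity is assumed to be the average cross-entropy between the two models' next-token distributions at each position.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def log_perplexity(logits, token_ids):
    """Average negative log-likelihood of the observed tokens.

    logits[i] holds the model's scores for the token at position i,
    i.e. it is assumed to be already aligned with token_ids[i].
    """
    logp = log_softmax(logits)
    picked = logp[np.arange(len(token_ids)), token_ids]
    return float(-picked.mean())


def log_cross_perplexity(logits_m1, logits_m2):
    """Average cross-entropy between M1's and M2's next-token distributions.

    Both models must share a tokenizer so the vocabulary axes line up.
    """
    p1 = np.exp(log_softmax(logits_m1))
    logp2 = log_softmax(logits_m2)
    return float(-(p1 * logp2).sum(axis=-1).mean())


def binoculars_score(logits_m1, logits_m2, token_ids):
    """B = log PPL_M1(s) / log X-PPL_M1,M2(s)."""
    return log_perplexity(logits_m1, token_ids) / log_cross_perplexity(
        logits_m1, logits_m2
    )


# Toy stand-ins (assumption): M2 is a slightly perturbed copy of M1,
# mimicking "two closely related models" over a 50-token vocabulary.
rng = np.random.default_rng(0)
vocab_size, length = 50, 12
logits_m1 = rng.normal(size=(length, vocab_size))
logits_m2 = logits_m1 + 0.1 * rng.normal(size=(length, vocab_size))
token_ids = rng.integers(0, vocab_size, size=length)

print(f"Binoculars score: {binoculars_score(logits_m1, logits_m2, token_ids):.3f}")
```

In a real setting the logits would come from a forward pass of each model over the same tokenized text, and the resulting score would be compared against a threshold chosen on held-out human text.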

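Experimental Results

Research Questions

RQ1: Can a zero-shot detector distinguish human text from machine text without using training data from the target LLM?
RQ2: Does contrasting two closely related LLMs yield a robust, model-agnostic detector that works across domains and languages?
RQ3: How does Binoculars perform in out-of-domain and multilingual settings compared with existing detectors?
RQ4: How reliable is Binoculars, and how does it behave in edge cases, under prompting, prompt modification, and random-text scenarios?
RQ5: How does document length affect detection performance?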

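Key Findings

Binoculars achieves high accuracy across domains: on ChatGPT output, the true positive rate exceeds 90% at a false positive rate of 0.01%.
The detector remains effective across diverse datasets and out-of-domain settings, outperforming baselines such as Ghostbuster and GPTZero without any supervised training.
Performance improves as the amount of context (number of tokens) grows, and Binoculars generalizes to LLaMA- and Falcon-generated text.
In multilingual and low-resource-language settings, Binoculars maintains a low false positive rate, but recall drops, consistent with the limitations of the underlying models in these languages.
The method is robust to modified prompting strategies, although edge cases remain, such as memorized text.
The method requires no model-specific training data and no tuning for ChatGPT.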
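The headline metric above, TPR at a fixed FPR, can be computed as in the following sketch. It is an illustration under stated assumptions rather than the paper's evaluation code: the function name is hypothetical, the scores are synthetic, and lower scores are assumed to indicate machine-generated text (a text is flagged when its score falls strictly below the threshold).

```python
import numpy as np


def tpr_at_fpr(human_scores, machine_scores, target_fpr):
    """Return (tpr, achieved_fpr, threshold) for a target false positive rate.

    Assumption: a text is flagged as machine-generated when its score is
    strictly below the threshold. The threshold is taken from the sorted
    human scores so that at most floor(target_fpr * n_human) human texts
    are flagged.
    """
    human = np.sort(np.asarray(human_scores, dtype=float))
    machine = np.asarray(machine_scores, dtype=float)
    allowed = min(int(np.floor(target_fpr * len(human))), len(human) - 1)
    threshold = human[allowed]
    tpr = float((machine < threshold).mean())
    achieved_fpr = float((human < threshold).mean())
    return tpr, achieved_fpr, threshold


# Synthetic scores (assumption): two overlapping normal distributions.
rng = np.random.default_rng(0)
human_scores = rng.normal(loc=1.0, scale=0.08, size=20_000)
machine_scores = rng.normal(loc=0.7, scale=0.08, size=20_000)

tpr, fpr, thr = tpr_at_fpr(human_scores, machine_scores, target_fpr=1e-4)
print(f"threshold={thr:.3f}  TPR={tpr:.3f}  FPR={fpr:.5f}")
```

An FPR of 0.01% corresponds to one false alarm per 10,000 human-written documents, so the human reference set needs at least that many samples before the threshold permits even a single false positive.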


This review was generated by AI and reviewed by human editors.