QUICK REVIEW

[Paper Review] Large Language Models can be Guided to Evade AI-Generated Text Detection

Ning Lü, Shengcai Liu|arXiv (Cornell University)|May 18, 2023
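Topic Modeling | Cited by 19
One-Sentence Summary

The paper introduces SICO, a substitution-based in-context example optimization method that constructs task-specific prompts enabling large language models to evade multiple detectors across several tasks, at low cost and with broad applicability.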

ABSTRACT

Large language models (LLMs) have shown remarkable performance in various tasks and have been extensively utilized by the public. However, the increasing concerns regarding the misuse of LLMs, such as plagiarism and spamming, have led to the development of multiple detectors, including fine-tuned classifiers and statistical methods. In this study, we equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors. We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically construct prompts for evading the detectors. SICO is cost-efficient as it requires only 40 human-written examples and a limited number of LLM inferences to generate a prompt. Moreover, once a task-specific prompt has been constructed, it can be universally used against a wide range of detectors. Extensive experiments across three real-world tasks demonstrate that SICO significantly outperforms the paraphraser baselines and enables GPT-3.5 to successfully evade six detectors, decreasing their AUC by 0.5 on average. Furthermore, a comprehensive human evaluation shows that the SICO-generated text achieves human-level readability and task completion rates, while preserving high imperceptibility. Finally, we propose an ensemble approach to enhance the robustness of detectors against SICO attack. The code is publicly available at https://github.com/ColinLu50/Evade-GPT-Detector.

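Motivation and Objectives

  • Evaluate how robust AI-generated text detectors are to prompt-guided evasion.
  • Develop a low-cost method that automatically constructs prompts which lower detector AUC.
  • Demonstrate SICO's effectiveness across three real-world tasks and multiple detectors.
  • Assess the human readability of SICO-generated text and its applicability in real-world settings.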

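Proposed Method

  • Define a prompt utility function U(p) to be maximized, where higher utility means stronger detector evasion.
  • Collect a dataset D of AI-written and human-written outputs from which linguistic features are extracted.
  • Iteratively substitute words and sentences in the in-context demonstrations to optimize the prompt (GreedyOPT).
  • Use WordNet-based word-level substitution and paraphrase-based sentence-level substitution guided by a proxy detector.
  • Construct candidate task prompts and select the best one, p*, by comparing their utility.
  • Provide two variants: SICO-Gen (direct generation) and SICO-Para (paraphrasing).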
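
The substitution loop in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the hand-written `SYNONYMS` table stands in for WordNet, `toy_detector` stands in for a trained proxy detector, and `utility` scores the demonstrations directly rather than text generated by an LLM under the prompt. Sentence-level paraphrase substitution is left out.

```python
from typing import Callable, Dict, List

# ASSUMPTION: a tiny hand-written synonym table stands in for WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "utilize": ["employ", "use"],
    "furthermore": ["also", "besides"],
    "demonstrate": ["show", "prove"],
}

# ASSUMPTION: a toy proxy detector; a real one would be a trained classifier.
AI_MARKERS = {"utilize", "employ", "furthermore", "demonstrate"}


def toy_detector(text: str) -> float:
    """Return the share of words that are 'AI-sounding' markers."""
    words = text.lower().split()
    return sum(w in AI_MARKERS for w in words) / max(len(words), 1)


def utility(texts: List[str], detector: Callable[[str], float]) -> float:
    """Simplified utility: 1 minus the mean detector score (higher is better)."""
    return 1.0 - sum(detector(t) for t in texts) / len(texts)


def greedy_word_substitution(
    text: str,
    synonyms: Dict[str, List[str]],
    detector: Callable[[str], float],
) -> str:
    """Visit each word once and keep the synonym that lowers the score most."""
    words = text.split()
    for i, word in enumerate(words):
        best_word, best_score = word, detector(" ".join(words))
        for candidate in synonyms.get(word.lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            score = detector(" ".join(trial))
            if score < best_score:
                best_word, best_score = candidate, score
        words[i] = best_word
    return " ".join(words)


def optimize_demonstrations(
    demos: List[str],
    synonyms: Dict[str, List[str]],
    detector: Callable[[str], float],
    rounds: int = 3,
) -> List[str]:
    """Accept a substituted set of demonstrations only if utility improves."""
    best = list(demos)
    for _ in range(rounds):
        candidate = [greedy_word_substitution(d, synonyms, detector) for d in best]
        if utility(candidate, detector) > utility(best, detector):
            best = candidate
    return best


demos = ["We utilize prompts and furthermore demonstrate gains"]
optimized = optimize_demonstrations(demos, SYNONYMS, toy_detector)
print(optimized[0])  # We use prompts and also show gains
```

The accept-only-if-better check mirrors the idea of selecting the best prompt by utility comparison. In a real setting the utility would be measured on text produced by the LLM when given the candidate prompt.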

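Experimental Results

Research Questions

  • RQ1: Can prompt-guided in-context learning outperform external paraphrasing tools at evading detectors?
  • RQ2: What are SICO's cost, robustness, and generality across detectors and tasks?
  • RQ3: Do human evaluators find SICO-generated text readable and task-oriented?
  • RQ4: How does SICO perform in real-world scenarios such as Reddit?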

Key Findings

Detector AUC by task and method (lower means stronger evasion):

Task     Method        GPT3-D*  GPT2-D  GPTzero  OpenAI-D  DetectGPT  Log-Rank
Writing  Parrot        0.666    0.645   0.632    0.744     0.502      0.577
Writing  DIPPER        0.736    0.907   0.689    0.750     0.550      0.684
Writing  GPT-Para      0.879    0.623   0.631    0.690     0.569      0.713
Writing  Human Prompt  0.852    0.560   0.491    0.655     0.676      0.759
Writing  SICO-Para     0.239    0.332   0.290    0.488     0.149      0.147
Writing  SICO-Gen      0.242    0.099   0.184    0.311     0.441      0.318
QA       Parrot        0.922    0.837   0.849    0.698     0.689      0.806
QA       DIPPER        0.888    0.962   0.869    0.722     0.604      0.782
QA       GPT-Para      0.956    0.797   0.811    0.699     0.640      0.782
QA       Human Prompt  0.912    0.625   0.791    0.656     0.662      0.757
QA       SICO-Para     0.407    0.576   0.572    0.541     0.178      0.183
QA       SICO-Gen      0.668    0.489   0.494    0.524     0.497      0.535
Review   Parrot        0.871    0.934   0.913    0.882     0.654      0.893
Review   DIPPER        0.875    0.984   0.888    0.824     0.515      0.814
Review   GPT-Para      0.899    0.851   0.833    0.925     0.542      0.864
Review   Human Prompt  0.839    0.610   0.856    0.858     0.619      0.851
Review   SICO-Para     0.465    0.264   0.599    0.540     0.270      0.300
Review   SICO-Gen      0.455    0.619   0.399    0.607     0.485      0.583

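  • SICO consistently lowers detector AUC across six detectors and three tasks, often to below 0.5.
  • SICO-Para generally outperforms SICO-Gen on the statistical detectors; both achieve strong evasion.
  • Human evaluation indicates that SICO text is highly readable, with task completion rates close to those of human-written text.
  • A real-world test on Reddit shows that SICO-generated replies received upvotes and engagement.
  • SICO requires only 40 human-written examples and a modest number of LLM inferences, and the resulting prompts generalize across detectors.
  • SICO can serve as a standard evaluation tool for future AI-generated text detectors.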
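
The first finding can be checked against the results table with a short script. This is only a sketch: it assumes the table entries are detector AUC scores, as the findings imply, and the helper names `mean_auc` and `count_below` are introduced here for illustration. The numbers are copied from the table, in the column order GPT3-D*, GPT2-D, GPTzero, OpenAI-D, DetectGPT, Log-Rank.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Values copied from the results table; assumed to be detector AUC scores.
AUC: Dict[str, Dict[str, List[float]]] = {
    "Writing": {
        "Parrot": [0.666, 0.645, 0.632, 0.744, 0.502, 0.577],
        "DIPPER": [0.736, 0.907, 0.689, 0.750, 0.550, 0.684],
        "GPT-Para": [0.879, 0.623, 0.631, 0.690, 0.569, 0.713],
        "Human Prompt": [0.852, 0.560, 0.491, 0.655, 0.676, 0.759],
        "SICO-Para": [0.239, 0.332, 0.290, 0.488, 0.149, 0.147],
        "SICO-Gen": [0.242, 0.099, 0.184, 0.311, 0.441, 0.318],
    },
    "QA": {
        "Parrot": [0.922, 0.837, 0.849, 0.698, 0.689, 0.806],
        "DIPPER": [0.888, 0.962, 0.869, 0.722, 0.604, 0.782],
        "GPT-Para": [0.956, 0.797, 0.811, 0.699, 0.640, 0.782],
        "Human Prompt": [0.912, 0.625, 0.791, 0.656, 0.662, 0.757],
        "SICO-Para": [0.407, 0.576, 0.572, 0.541, 0.178, 0.183],
        "SICO-Gen": [0.668, 0.489, 0.494, 0.524, 0.497, 0.535],
    },
    "Review": {
        "Parrot": [0.871, 0.934, 0.913, 0.882, 0.654, 0.893],
        "DIPPER": [0.875, 0.984, 0.888, 0.824, 0.515, 0.814],
        "GPT-Para": [0.899, 0.851, 0.833, 0.925, 0.542, 0.864],
        "Human Prompt": [0.839, 0.610, 0.856, 0.858, 0.619, 0.851],
        "SICO-Para": [0.465, 0.264, 0.599, 0.540, 0.270, 0.300],
        "SICO-Gen": [0.455, 0.619, 0.399, 0.607, 0.485, 0.583],
    },
}
SICO_METHODS = ("SICO-Para", "SICO-Gen")


def mean_auc(task: str, method: str) -> float:
    """Average AUC of one method over the six detectors for one task."""
    return mean(AUC[task][method])


def count_below(threshold: float = 0.5) -> Tuple[int, int]:
    """Count SICO cells under the threshold, and the total number of SICO cells."""
    cells = [v for rows in AUC.values() for m in SICO_METHODS for v in rows[m]]
    return sum(v < threshold for v in cells), len(cells)


for task, rows in AUC.items():
    summary = ", ".join(f"{m}: {mean_auc(task, m):.3f}" for m in rows)
    print(f"{task}: {summary}")
print(count_below())  # (25, 36)
```

Under that assumption, 25 of the 36 SICO cells fall below 0.5, and in every task both SICO variants have a lower mean AUC than any of the four baselines.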

This review was generated by AI and reviewed by human editors.