QUICK REVIEW

[Paper Review] The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD

Kadhim Hayawi, Sakib Shahriar|arXiv (Cornell University)|Jul 22, 2023

Topic Modeling|Cited by 8

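One-Sentence Summary

The paper introduces a new cross-genre dataset of human-written and LLM-generated texts and evaluates several machine learning models on telling the two apart; the models perform better on binary detection than on multiclass classification.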

ABSTRACT

The potential of artificial intelligence (AI)-based large language models (LLMs) holds considerable promise in revolutionizing education, research, and practice. However, distinguishing between human-written and AI-generated text has become a significant task. This paper presents a comparative study, introducing a novel dataset of human-written and LLM-generated texts in different genres: essays, stories, poetry, and Python code. We employ several machine learning models to classify the texts. Results demonstrate the efficacy of these models in discerning between human and AI-generated text, despite the dataset's limited sample size. However, the task becomes more challenging when classifying GPT-generated text, particularly in story writing. The results indicate that the models exhibit superior performance in binary classification tasks, such as distinguishing human-generated text from a specific LLM, compared to the more complex multiclass tasks that involve discerning among human-generated and multiple LLMs. Our findings provide insightful implications for AI text detection while our dataset paves the way for future research in this evolving area.

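Motivation and Objectives

Establish why distinguishing human-written from AI-generated text is necessary in education, research, and practice.
Introduce a new dataset of human-written and LLM-generated texts spanning multiple genres.
Evaluate several machine learning models on their ability to detect AI-generated content.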

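Proposed Method

Build a dataset of texts in four genres: essays, stories, poetry, and Python code.
Apply several machine learning classifiers to separate human-written from AI-generated text.
Analyze classification performance, noting differences across genres and model types.
Compare the binary setting (human vs. a single LLM) with the multiclass setting (human plus multiple LLMs).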
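
The binary and multiclass settings in the list above differ only in how labels are assigned to the same samples. The following is a minimal sketch of that relabeling, not the authors' pipeline: the `Sample` record, its `text`, `genre`, and `source` fields, and the source names `human`, `gpt`, and `bard` are all assumptions made for illustration, since this summary does not describe the dataset's actual schema.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical record layout; the real dataset schema is not given here."""
    text: str
    genre: str   # e.g. "essay", "story", "poetry", "code"
    source: str  # "human" or an illustrative LLM name


def binary_task(samples, llm):
    """Keep only human and `llm` samples; label human as 0 and the LLM as 1."""
    return [
        (s.text, 0 if s.source == "human" else 1)
        for s in samples
        if s.source in ("human", llm)
    ]


def multiclass_task(samples):
    """Keep every sample and give each source its own integer label."""
    sources = sorted({s.source for s in samples})
    label_of = {src: i for i, src in enumerate(sources)}
    return [(s.text, label_of[s.source]) for s in samples], label_of


# Toy records invented for illustration; not taken from the paper's dataset.
samples = [
    Sample("The tide came in slowly.", "story", "human"),
    Sample("Once upon a time, a robot dreamed.", "story", "gpt"),
    Sample("Roses bloom in silent code.", "poetry", "bard"),
    Sample("def add(a, b): return a + b", "code", "human"),
]

human_vs_gpt = binary_task(samples, "gpt")
all_sources, label_of = multiclass_task(samples)
```

In the binary task the classifier only has to separate two sources, while the multiclass task must also tell the LLMs apart from one another, which is one plausible reason the multiclass setting is reported as harder.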

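Experimental Results

Research Questions

RQ1: Can ML models reliably distinguish human-written from AI-generated text across different genres?
RQ2: How does performance at separating human text from one specific LLM compare with performance at separating human text from several LLMs at once?
RQ3: Is GPT-generated text harder to classify, particularly in story writing?
RQ4: How do dataset size and genre affect AI text detection performance?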

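Key Findings

ML models effectively distinguish human-written from AI-generated text across genres.
Performance is robust in binary tasks (human vs. one specific LLM) but declines in multiclass settings that include several LLMs.
GPT-generated text, especially stories, is harder to classify than the other cases.
Despite the limited sample size, the dataset still supports meaningful discrimination between human and AI text and among different LLMs.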
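
Findings such as "stories are harder" come from breaking a classifier's accuracy down by genre rather than reporting one overall number. The sketch below shows one simple way to do that, assuming parallel lists of genres, true labels, and predictions; the function name `accuracy_by_genre` and the toy values are hypothetical and do not reproduce any result from the paper.

```python
from collections import defaultdict


def accuracy_by_genre(genres, y_true, y_pred):
    """Return {genre: fraction of correct predictions within that genre}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for genre, truth, pred in zip(genres, y_true, y_pred):
        total[genre] += 1
        correct[genre] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Made-up labels and predictions, only to show the shape of the output.
genres = ["story", "story", "essay", "essay", "poetry"]
y_true = [1, 0, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1]
per_genre = accuracy_by_genre(genres, y_true, y_pred)
```

With a small dataset, each per-genre figure rests on few samples, so differences between genres should be read with caution.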


This review was generated by AI and reviewed by a human editor.