QUICK REVIEW

[Paper Review] Distinguishing Human Generated Text From ChatGPT Generated Text Using Machine Learning

Niful Islam, Debopom Sutradhar|arXiv (Cornell University)|May 26, 2023

Topic Modeling | Cited by 14

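One-Sentence Summary

This paper proposes an ML method based on TF-IDF features for distinguishing human-written text from ChatGPT-generated text, evaluates 11 classifiers, and finds that Extremely Randomized Trees achieves 77% accuracy on GPT-3.5 data.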

ABSTRACT

ChatGPT is a conversational artificial intelligence that is a member of the generative pre-trained transformer of the large language model family. This text generative model was fine-tuned by both supervised learning and reinforcement learning so that it can produce text documents that seem to be written by natural intelligence. Although there are numerous advantages of this generative model, it comes with some reasonable concerns as well. This paper presents a machine learning-based solution that can identify the ChatGPT delivered text from the human written text along with the comparative analysis of a total of 11 machine learning and deep learning algorithms in the classification process. We have tested the proposed model on a Kaggle dataset consisting of 10,000 texts out of which 5,204 texts were written by humans and collected from news and social media. On the corpus generated by GPT-3.5, the proposed algorithm presents an accuracy of 77%.

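Motivation and Objectives

Given concerns about misinformation and ethics, the paper stresses the need to distinguish human-written text from AI-generated text.
Propose a machine learning pipeline that uses TF-IDF vectorization to classify text as human-written or ChatGPT-generated.
Evaluate a set of traditional and deep learning classifiers to identify which detectors are effective on a GPT-3.5-based dataset.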

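Proposed Method

Balance the dataset by downsampling.
Vectorize the text with TF-IDF to capture term importance.
Train and evaluate 11 classifiers, including an MLP and an LSTM, on an 80/20 train/test split.
Combine the predictions of individual trees by majority voting in the tree-based ensemble models.
Report accuracy, precision, recall, F1-score, and MCC.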
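
The steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy corpus, the `downsample` helper, the choice to fit TF-IDF on the training split only, and all hyperparameters are hypothetical, since the summary does not specify them.

```python
import random

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split


def downsample(texts, labels, seed=0):
    """Randomly drop samples from larger classes until every class has the same size."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    n = min(len(group) for group in by_label.values())
    pairs = []
    for label, group in by_label.items():
        pairs.extend((text, label) for text in rng.sample(group, n))
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]


# Toy corpus (hypothetical, NOT the Kaggle dataset): 0 = human, 1 = ChatGPT.
human = [f"honestly the game last night was wild, clip {i}" for i in range(30)]
chatgpt = [f"In conclusion, it is important to note that topic {i} matters" for i in range(20)]
texts = human + chatgpt
labels = [0] * len(human) + [1] * len(chatgpt)

# Step 1: balance the classes by downsampling the larger one.
texts_bal, labels_bal = downsample(texts, labels)

# Step 2: 80/20 train/test split (stratified so both splits stay balanced).
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts_bal, labels_bal, test_size=0.2, stratify=labels_bal, random_state=0
)

# Step 3: TF-IDF features. Fitting on the training split only is an assumption
# made here to avoid leaking test vocabulary statistics into training.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

# Step 4: Extremely Randomized Trees (hyperparameters are illustrative).
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Step 5: report two of the metrics used in the paper.
accuracy = accuracy_score(y_test, pred)
mcc = matthews_corrcoef(y_test, pred)
print("accuracy:", accuracy)
print("MCC:", mcc)
```

Note that scikit-learn's `ExtraTreesClassifier` combines trees by averaging their predicted class probabilities rather than by counting hard votes, so it only approximates the majority voting described above.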

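Experimental Results

Research Questions

RQ1: Can a model reliably distinguish human-written text from ChatGPT-generated text on a GPT-3.5-based corpus?
RQ2: Which machine learning or deep learning algorithm is most effective for this detection task when using TF-IDF features?
RQ3: How do preprocessing choices (such as stop-word removal) and data balancing affect classification performance?
RQ4: How do traditional ML and neural network approaches compare on the given dataset?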

Key Findings

Model	Accuracy	Precision	Recall	F1-score	MCC
Logistic Regression	0.74	0.73	0.73	0.73	0.48
Support Vector Machines	0.75	0.75	0.71	0.73	0.50
Decision Tree	0.63	0.75	0.79	0.67	0.29
K-Nearest Neighbor	0.69	0.67	0.68	0.67	0.37
Random Forest	0.76	0.73	0.81	0.76	0.53
AdaBoost	0.71	0.68	0.74	0.71	0.43
Bagging Classifier	0.74	0.71	0.75	0.73	0.47
Gradient Boosting	0.71	0.66	0.78	0.72	0.42
Multi-layer Perceptron	0.72	0.73	0.72	0.72	0.43
Long Short-Term Memory	0.73	0.73	0.77	0.75	0.46
Extremely Randomized Trees	0.77	0.74	0.78	0.76	0.54
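
For readers less familiar with the columns above, the following sketch shows how the five metrics are derived from a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells; it is 0 for chance-level predictions and 1 for perfect ones.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for a 200-sample test set.
m = binary_metrics(tp=80, fp=20, tn=70, fn=30)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Because MCC accounts for all four cells of the confusion matrix, it can sit well below accuracy, which is why an accuracy of 0.77 in the table corresponds to an MCC of only 0.54.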

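The Extremely Randomized Trees classifier achieved the highest accuracy (0.77) and the highest MCC (0.54).
Random Forest (accuracy 0.76) and SVM (accuracy 0.75) also performed well.
K-Nearest Neighbor (0.69) and Decision Tree (0.63) had the lowest accuracy on this dataset.
The deep learning models (MLP and LSTM) reached high accuracy on the training set but lower accuracy on the test set.
TF-IDF features combined with downsampling and an 80:20 split distinguished human-written text from ChatGPT text with up to 77% accuracy.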
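
The gap between training and test accuracy noted for the deep learning models can be checked for any classifier by scoring it on both splits. The sketch below does this for a few scikit-learn models on a synthetic corpus. The word lists, model choices, and hyperparameters are all assumptions for illustration, and the numbers it prints say nothing about the paper's dataset.

```python
import random

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = random.Random(0)

# Synthetic vocabulary (hypothetical): shared filler plus class-specific marker words.
shared = ["the", "of", "and", "news", "today", "people", "said", "report", "market"]
human_words = ["lol", "gonna", "honestly", "yeah"]
ai_words = ["furthermore", "overall", "additionally", "conclusion"]


def make_text(marker_words):
    """Build a 10-word pseudo-document: 8 shared words and 2 class markers."""
    words = rng.choices(shared, k=8) + rng.choices(marker_words, k=2)
    rng.shuffle(words)
    return " ".join(words)


texts = [make_text(human_words) for _ in range(50)] + [make_text(ai_words) for _ in range(50)]
labels = [0] * 50 + [1] * 50  # 0 = human, 1 = ChatGPT

train_txt, test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0
)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_txt)
X_test = vectorizer.transform(test_txt)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Extremely Randomized Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "Multi-layer Perceptron": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}

# Map each model name to (training accuracy, test accuracy).
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```

A large positive gap for a model is a sign of overfitting, which is consistent with the behavior reported above for the MLP and LSTM.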

This review was generated by AI and reviewed by human editors.