QUICK REVIEW

[Paper Review] Distinguishing Human Generated Text From ChatGPT Generated Text Using Machine Learning

Niful Islam, Debopom Sutradhar|arXiv (Cornell University)|May 26, 2023

Topic Modeling | Citations: 14

One-Line Summary

This paper presents an ML-based method that uses TF-IDF features to distinguish human-written text from ChatGPT-generated text. It evaluates 11 classifiers and reports that Extremely Randomized Trees performs best, with 77% accuracy on GPT-3.5 data.

ABSTRACT

ChatGPT is a conversational artificial intelligence that is a member of the generative pre-trained transformer of the large language model family. This text generative model was fine-tuned by both supervised learning and reinforcement learning so that it can produce text documents that seem to be written by natural intelligence. Although there are numerous advantages of this generative model, it comes with some reasonable concerns as well. This paper presents a machine learning-based solution that can identify the ChatGPT delivered text from the human written text along with the comparative analysis of a total of 11 machine learning and deep learning algorithms in the classification process. We have tested the proposed model on a Kaggle dataset consisting of 10,000 texts out of which 5,204 texts were written by humans and collected from news and social media. On the corpus generated by GPT-3.5, the proposed algorithm presents an accuracy of 77%.

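Motivation and Objectives

Motivates the need to distinguish human-written text from AI-generated text, citing misinformation and ethical concerns.
Proposes a machine learning pipeline that uses TF-IDF vectorization to classify text as human-generated or ChatGPT-generated.
Evaluates a broad set of traditional machine learning and deep learning classifiers to identify an effective detector on a GPT-3.5-based dataset.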

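Proposed Method

Balance the dataset by undersampling.
Vectorize the text with TF-IDF to capture term importance.
Train and evaluate 11 classifiers, including MLP and LSTM, using an 80/20 train/test split.
Use majority voting to obtain ensemble-like behavior in the tree-based models.
Report metrics including accuracy, precision, recall, F1-score, and MCC.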
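
The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, the `tf * (ln(N/df) + 1)` weighting, and the toy data are all hypothetical, and the paper's actual vectorizer settings are not given in this review.

```python
import math
import random
from collections import Counter


def undersample(texts, labels, seed=0):
    """Randomly drop examples so every class matches the smallest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    smallest = min(len(group) for group in by_label.values())
    pairs = [(text, label)
             for label, group in by_label.items()
             for text in rng.sample(group, smallest)]
    rng.shuffle(pairs)  # mix the classes before splitting
    return [t for t, _ in pairs], [lab for _, lab in pairs]


def split_80_20(items, labels):
    """Hold out the last 20% of (already shuffled) data for testing."""
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:], labels[:cut], labels[cut:]


def fit_idf(train_docs):
    """Inverse document frequency, estimated on the training documents only."""
    n_docs = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc.lower().split()))
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector as a {term: weight} dict; unseen terms are skipped."""
    tokens = doc.lower().split()
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


def majority_vote(predictions):
    """Most frequent label; ties go to the label that appears first."""
    return Counter(predictions).most_common(1)[0][0]


# Toy, made-up data: 6 "human" texts and 4 "chatgpt" texts.
texts = [f"human sample {i}" for i in range(6)] + [f"generated sample {i}" for i in range(4)]
labels = ["human"] * 6 + ["chatgpt"] * 4

bal_texts, bal_labels = undersample(texts, labels)
train_x, test_x, train_y, test_y = split_80_20(bal_texts, bal_labels)
idf = fit_idf(train_x)
train_vectors = [tfidf(doc, idf) for doc in train_x]
```

Fitting the IDF weights on the training portion only keeps test documents from influencing the features. Any classifier that accepts sparse vectors could then be trained on `train_vectors`, and `majority_vote` shows how individual tree predictions might be combined.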

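Experimental Results

Research Questions

RQ1: Can machine learning models reliably distinguish human-written text from ChatGPT-generated text on a GPT-3.5-based corpus?
RQ2: Which machine learning or deep learning algorithm is most effective for this detection task when using TF-IDF features?
RQ3: How do preprocessing choices (e.g., stop-word removal) and data balancing affect classification performance?
RQ4: How do traditional ML and neural network approaches compare on the given dataset?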

Key Findings

Model	Accuracy	Precision	Recall	F1-score	MCC
Logistic Regression	0.74	0.73	0.73	0.73	0.48
Support Vector Machines	0.75	0.75	0.71	0.73	0.50
Decision Tree	0.63	0.75	0.79	0.67	0.29
K-Nearest Neighbor	0.69	0.67	0.68	0.67	0.37
Random Forest	0.76	0.73	0.81	0.76	0.53
AdaBoost	0.71	0.68	0.74	0.71	0.43
Bagging Classifier	0.74	0.71	0.75	0.73	0.47
Gradient Boosting	0.71	0.66	0.78	0.72	0.42
Multi-layer Perceptron	0.72	0.73	0.72	0.72	0.43
Long Short-Term Memory	0.73	0.73	0.77	0.75	0.46
Extremely Randomized Trees	0.77	0.74	0.78	0.76	0.54
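
For readers unfamiliar with the columns in the table above, the sketch below shows how each metric follows from binary confusion-matrix counts. It is an illustration only: the function name and the example counts are hypothetical and are not taken from the paper, and treating ChatGPT-generated text as the positive class is an assumption.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute the table's metrics from binary confusion-matrix counts.

    Assumption: the positive class is "ChatGPT-generated".
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient: ranges from -1 to 1, 0 is chance level.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, chosen only to exercise the function.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it can rank models differently from accuracy.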

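The Extra Trees (Extremely Randomized Trees) classifier achieved the highest accuracy (0.77) and the highest MCC (0.54).
Random Forest and SVM also performed relatively well, with accuracies of 0.76 and 0.75, respectively.
K-Nearest Neighbor and Decision Tree had the lowest accuracy on this dataset (0.69 and 0.63).
The deep learning models (MLP and LSTM) reached high training accuracy but lower test performance (test accuracy 0.72 and 0.73).
Combined with an 80:20 split and undersampling-based balancing, TF-IDF features separate human-written text from ChatGPT-generated text with up to 77% accuracy.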


This review was generated by AI and checked by a human editor.