QUICK REVIEW

[Paper Review] Stylometry Analysis of Human and Machine Text for Academic Integrity

Hezam Albaqami, Muhammad Asif Ayub|arXiv (Cornell University)|Jan 3, 2026

Academic integrity and plagiarism|Citations: 0

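ONE-LINE SUMMARY

This paper presents a stylometry-based NLP framework that embeds Gemini-generated machine text into human-written data and addresses four tasks: distinguishing machine from human text, classifying single- vs. multi-authored documents, detecting author changes within multi-authored documents, and recognizing authors. The datasets and code are publicly available.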

ABSTRACT

This work addresses critical challenges to academic integrity, including plagiarism, fabrication, and verification of authorship of educational content, by proposing a Natural Language Processing (NLP)-based framework for authenticating students' content through author attribution and style change detection. Despite some initial efforts, several aspects of the topic are yet to be explored. In contrast to existing solutions, the paper provides a comprehensive analysis of the topic by targeting four relevant tasks, including (i) classification of human and machine text, (ii) differentiating in single and multi-authored documents, (iii) author change detection within multi-authored documents, and (iv) author recognition in collaboratively produced documents. The solutions proposed for the tasks are evaluated on two datasets generated with Gemini using two different prompts, including a normal and a strict set of instructions. During experiments, some reduction in the performance of the proposed solutions is observed on the dataset generated through the strict prompt, demonstrating the complexities involved in detecting machine-generated text with cleverly crafted prompts. The generated datasets, code, and other relevant materials are made publicly available on GitHub, which are expected to provide a baseline for future research in the domain.

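RESEARCH MOTIVATION AND OBJECTIVES

Address the academic-integrity challenges raised by AI-generated content, such as plagiarism and authorship verification.
Propose a stylometry-driven framework for four tasks: machine vs. human text, single- vs. multi-authored classification, author change detection, and author recognition.
Build large-scale benchmark datasets by embedding Gemini-generated machine text into human-authored documents.
Release the datasets and code publicly as a baseline for future stylometry research in education.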

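PROPOSED METHOD

Create the datasets by generating machine text with Gemini under two prompts (normal and strict) and embedding it into human-authored documents.
Preprocess the data, including null and duplicate checks, and balance classes using class weights.
Perform text classification by fine-tuning four transformer models (BERT-base, ALBERT, DistilBERT, RoBERTa) with dropout configurations.
Evaluate on four tasks: machine vs. human text, single- vs. multi-authored documents, author change detection, and author recognition (multi-label).
Hyperparameters and training setup: batch size 32, gradient accumulation ×2, learning rate 1e-5, cosine decay with warmup, weight decay 0.01, 5 epochs, fp16, and a 70/15/15 train/validation/test split.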
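The training setup listed above can be made concrete with a small, dependency-free sketch. It is an illustration under stated assumptions, not the authors' code: the review does not say how the class weights are computed or how long the warmup lasts, so the inverse-frequency formula (the same heuristic scikit-learn calls "balanced") and the `warmup_steps` argument are assumptions, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

PEAK_LR = 1e-5          # learning rate reported in the review
BATCH_SIZE = 32         # batch size reported in the review
GRAD_ACCUM_STEPS = 2    # gradient accumulation reported in the review
# Samples contributing to each optimizer update: 32 * 2 = 64
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM_STEPS


def split_70_15_15(items, seed=0):
    """Shuffle and split into train/validation/test (70/15/15)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def balanced_class_weights(labels):
    """Assumed weighting: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def lr_at_step(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these weights a class that makes up a quarter of a two-class dataset gets weight 2.0, so its errors count more heavily in a weighted loss. The schedule starts at zero, reaches 1e-5 at the end of warmup, and decays back to zero by the final step.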

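EXPERIMENTAL RESULTS

Research Questions

RQ1: Can machine-generated text be reliably distinguished from human-written text across prompts and datasets?
RQ2: Can single- vs. multi-authored documents be classified accurately when machine text is embedded in human work?
RQ3: In multi-authored documents, can the paragraphs that contain an author change be detected?
RQ4: In multi-authored documents, can the individual authors be identified (with the AI counted as an author)?
RQ5: How do prompt design and embedding strategy affect stylometric detection performance?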
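One way to read RQ1 to RQ4 is as four label types derived from the same per-paragraph author annotation. The sketch below illustrates that reading only; the review does not describe the dataset's actual label format, so the `"AI"` tag, the author names, and the `derive_labels` helper are all hypothetical.

```python
AI_AUTHOR = "AI"  # hypothetical tag for Gemini-written paragraphs


def derive_labels(paragraph_authors, all_authors):
    """Derive one label set per task from per-paragraph author tags."""
    present = set(paragraph_authors)
    return {
        # RQ1: per-paragraph machine (1) vs. human (0) flag
        "is_machine": [int(a == AI_AUTHOR) for a in paragraph_authors],
        # RQ2: single- vs. multi-authored document
        "multi_authored": len(present) > 1,
        # RQ3: 1 where a paragraph's author differs from the previous one
        "author_changes": [
            int(prev != cur)
            for prev, cur in zip(paragraph_authors, paragraph_authors[1:])
        ],
        # RQ4: multi-label indicator vector over the known authors
        "author_vector": [int(a in present) for a in all_authors],
    }


labels = derive_labels(["A1", "A1", "AI", "A2"], ["A1", "A2", "A3", "AI"])
```

For this toy document the change flags are `[0, 1, 1]`: no change between the first two paragraphs, then a change into and out of the machine-written paragraph.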

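Key Findings

Machine vs. human text classification reaches near-perfect accuracy (approaching 0.999) regardless of model or prompt.
Single- vs. multi-authored classification also achieves high accuracy and is robust to embedded machine text.
Author change detection remains challenging, with F1 scores of roughly 0.68–0.70 across the normal and strict prompts.
Author recognition in multi-authored documents shows low overall F1 scores and a marked imbalance across authors: some (e.g., Author 1) are detected consistently, while others (e.g., Authors 2–4) are detected unreliably.
Prompt design (normal vs. strict) strongly affects the detectability of machine text and the intra-/inter-class separability of the embeddings.
The public release of the datasets and code on GitHub provides a baseline for stylometry research on preventing misuse in education.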
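The per-author imbalance reported for the multi-label author recognition task is easiest to see when F1 is computed per label rather than only as an average. The following is a minimal sketch of that computation; the toy predictions are made up for illustration and are not results from the paper, and the helper names are hypothetical.

```python
def per_label_f1(y_true, y_pred, n_labels):
    """F1 for each label, given rows of 0/1 indicators per document."""
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-label F1 scores."""
    return sum(scores) / len(scores)


# Toy data (invented): rows are documents, columns are three authors.
y_true = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
scores = per_label_f1(y_true, y_pred, n_labels=3)
```

Here the first author is always recovered (F1 of 1.0) while the third is never matched (F1 of 0.0), so the macro average hides a large spread between authors.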

Figure 2: Flowchart of the Data generation process. The same flowchart is used for both datasets, with only differences in the instruction sets.


This review was generated by AI and checked by a human editor.