QUICK REVIEW

[Paper Review] Game of Tones: Faculty detection of GPT-4 generated content in university assessments

Mike Perkins, Jasper Roe|arXiv (Cornell University)|May 29, 2023

Academic integrity and plagiarism|Cited by 25

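One-sentence summary

This study examines GPT-4 generated content in university assessments and evaluates how well faculty can detect it when supported by the Turnitin AI detection tool. It exposes gaps in detection and points to the need for assessment reform.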

ABSTRACT

This study explores the robustness of university assessments against the use of Open AI's Generative Pre-Trained Transformer 4 (GPT-4) generated content and evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool. The research involved twenty-two GPT-4 generated submissions being created and included in the assessment process to be marked by fifteen different faculty members. The study reveals that although the detection tool identified 91% of the experimental submissions as containing some AI-generated content, the total detected content was only 54.8%. This suggests that the use of adversarial techniques regarding prompt engineering is an effective method in evading AI detection tools and highlights that improvements to AI detection software are needed. Using the Turnitin AI detect tool, faculty reported 54.5% of the experimental submissions to the academic misconduct process, suggesting the need for increased awareness and training into these tools. Genuine submissions received a mean score of 54.4, whereas AI-generated content scored 52.3, indicating the comparable performance of GPT-4 in real-life situations. Recommendations include adjusting assessment strategies to make them more resistant to the use of AI tools, using AI-inclusive assessment where possible, and providing comprehensive training programs for faculty and students. This research contributes to understanding the relationship between AI-generated content and academic assessment, urging further investigation to preserve academic integrity.

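Motivation and objectives

Evaluate the robustness of university assessments against GPT-4 generated content.
Evaluate faculty members' ability to detect such content when supported by the Turnitin AI detection tool.
Quantify detection rates and the impact on academic integrity indicators.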

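Proposed method

Generate 22 GPT-4 submissions and include them in the assessment process.
Have 15 faculty members mark the submissions.
Use the Turnitin AI detect tool to identify AI-generated content.
Compare the detected content with the actual AI content to assess evasion.
Analyze submission outcomes: academic misconduct reports and grades.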

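Experimental results

Research questions

RQ1: How effective is Turnitin's AI detection at identifying GPT-4 content in actual submissions?
RQ2: To what extent can adversarial prompt engineering evade AI detection tools in academic assessment?
RQ3: What effect does AI-generated content have on actual assessment grades and academic misconduct reports?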

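Key findings

The AI detection tool identified 91% of the experimental AI-generated submissions as containing AI content.
The total AI content detected was only 54.8%, showing that much of it evaded detection even though the tool flagged most submissions.
Faculty reported 54.5% of the experimental submissions to the academic misconduct process.
Genuine submissions received a mean score of 54.4 and AI-generated content a mean of 52.3, indicating comparable performance.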
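
The percentages above can be related back to the 22 experimental submissions with a little arithmetic. The sketch below is illustrative only: the abstract reports percentages, not counts, so the whole-number counts it derives (and the helper names `implied_count` and `as_percentage`) are assumptions introduced here, not figures taken from the paper.

```python
# Number of GPT-4 generated submissions, as stated in the abstract.
TOTAL_EXPERIMENTAL = 22


def implied_count(percentage: float, total: int = TOTAL_EXPERIMENTAL) -> int:
    """Whole number of submissions closest to `percentage` of `total` (an inference)."""
    return round(percentage / 100 * total)


def as_percentage(count: int, total: int = TOTAL_EXPERIMENTAL) -> float:
    """Share of `total` represented by `count`, rounded to one decimal place."""
    return round(count / total * 100, 1)


# Reported: 91% of experimental submissions flagged as containing AI content.
flagged = implied_count(91.0)
# Reported: 54.5% of experimental submissions sent to the misconduct process.
reported = implied_count(54.5)
# Reported mean scores: genuine 54.4, AI-generated 52.3.
score_gap = round(54.4 - 52.3, 1)

print(f"Implied flagged submissions:  {flagged} of {TOTAL_EXPERIMENTAL} "
      f"({as_percentage(flagged)}%)")
print(f"Implied reported submissions: {reported} of {TOTAL_EXPERIMENTAL} "
      f"({as_percentage(reported)}%)")
print(f"Mean score gap (genuine minus AI): {score_gap} points")
```

Under these assumptions, the counts reproduce the reported percentages after rounding, which suggests the figures are consistent with a base of 22 submissions. The 54.8% figure is left out on purpose: it describes the share of content detected rather than a share of submissions, so it cannot be converted to a count in the same way.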


This review was generated by AI and checked by a human editor.