QUICK REVIEW

[Paper Summary] Game of Tones: Faculty detection of GPT-4 generated content in university assessments

Mike Perkins, Jasper Roe | arXiv (Cornell University) | May 29, 2023

Academic integrity and plagiarism | Cited by 25

One-Sentence Summary

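This study inserts GPT-4-generated submissions into real university assessments, measures how well faculty detect them when supported by Turnitin's AI detection tool, exposes gaps in detection, and recommends assessment reforms.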

ABSTRACT

This study explores the robustness of university assessments against the use of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4) generated content and evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool. The research involved twenty-two GPT-4 generated submissions being created and included in the assessment process to be marked by fifteen different faculty members. The study reveals that although the detection tool identified 91% of the experimental submissions as containing some AI-generated content, the total detected content was only 54.8%. This suggests that the use of adversarial techniques regarding prompt engineering is an effective method in evading AI detection tools and highlights that improvements to AI detection software are needed. Using the Turnitin AI detect tool, faculty reported 54.5% of the experimental submissions to the academic misconduct process, suggesting the need for increased awareness and training into these tools. Genuine submissions received a mean score of 54.4, whereas AI-generated content scored 52.3, indicating the comparable performance of GPT-4 in real-life situations. Recommendations include adjusting assessment strategies to make them more resistant to the use of AI tools, using AI-inclusive assessment where possible, and providing comprehensive training programs for faculty and students. This research contributes to understanding the relationship between AI-generated content and academic assessment, urging further investigation to preserve academic integrity.

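Motivation and Objectives

Evaluate how robust university assessments are against GPT-4-generated content.
Evaluate how well faculty detect such content when supported by Turnitin's AI detection tool.
Quantify detection rates and their effect on academic integrity outcomes.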

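Method

Create 22 GPT-4-generated submissions and embed them in the regular assessment process.
Have 15 faculty members mark the submissions.
Use the Turnitin AI detection tool to identify AI-generated content.
Compare the content the tool detected with the content that was actually AI-generated to measure evasion.
Analyze submission outcomes: misconduct referrals and grades.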

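Experimental Results

Research Questions

RQ1: How effective is Turnitin's AI detection at identifying GPT-4-generated content in real assessment submissions?
RQ2: To what extent can adversarial prompt engineering evade AI detection tools in academic assessments?
RQ3: What effect does AI-generated content have on actual assessment grades and misconduct referrals?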

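Key Findings

The AI detection tool flagged 91% of the experimental (GPT-4-generated) submissions as containing some AI-generated content.
Only 54.8% of the total AI-generated content was detected, indicating substantial evasion even though most submissions were flagged.
With the detection tool available, faculty referred 54.5% of the experimental submissions to the academic misconduct process.
Genuine submissions received a mean score of 54.4 versus 52.3 for AI-generated submissions, indicating comparable performance.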
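This summary does not state the raw counts behind the submission-level percentages, but with 22 experimental submissions they can be recovered by rounding. The sketch below is a minimal check under the assumption that both the 91% and 54.5% figures use all 22 experimental submissions as the denominator; the helper `implied_count` is hypothetical and not from the paper. The 54.8% figure is left out because it measures the share of AI-generated text detected, not a number of submissions.

```python
# Assumption: both submission-level percentages are out of the
# 22 experimental (GPT-4-generated) submissions.
TOTAL_SUBMISSIONS = 22


def implied_count(percent: float, total: int = TOTAL_SUBMISSIONS) -> int:
    """Hypothetical helper: whole number of submissions closest to `percent` of `total`."""
    return round(percent / 100 * total)


flagged = implied_count(91.0)   # flagged as containing some AI-generated content
referred = implied_count(54.5)  # referred to the academic misconduct process

print(f"flagged:  {flagged}/{TOTAL_SUBMISSIONS} = {flagged / TOTAL_SUBMISSIONS:.1%}")
print(f"referred: {referred}/{TOTAL_SUBMISSIONS} = {referred / TOTAL_SUBMISSIONS:.1%}")

# Gap between the mean scores of genuine and AI-generated submissions, in marks.
score_gap = round(54.4 - 52.3, 1)
print(f"score gap: {score_gap}")
```

Under that assumption, the reported rates imply 20 of 22 submissions flagged (90.9%, reported as 91%) and 12 of 22 referred (54.5%); no other whole-number count rounds to either figure. The mean-score gap is 2.1 marks.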


This summary was generated by AI and reviewed by a human editor.