QUICK REVIEW

[Paper Review] Better by you, better than me, chatgpt3 as writing assistance in students essays

Željana Bašić, Ana Banovac|arXiv (Cornell University)|Feb 9, 2023

Artificial Intelligence in Healthcare and Education | References: 16 | Cited by: 36

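One-sentence summary

This study compares the essay quality of students who used ChatGPT-3 as a writing assistant with that of students who did not; the tool brought no measurable improvement, and the experimental group even scored somewhat lower on average (2.00 vs. 2.39), although the group difference was not statistically significant (P=0.184).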

ABSTRACT

Aim: To compare students' essay writing performance with or without employing ChatGPT-3 as a writing assistant tool. Materials and methods: Eighteen students participated in the study (nine in control and nine in the experimental group that used ChatGPT-3). We scored essay elements with grades (A-D) and corresponding numerical values (4-1). We compared essay scores to students' GPAs, writing time, authenticity, and content similarity. Results: Average grade was C for both groups; for control (2.39, SD=0.71) and for experimental (2.00, SD=0.73). None of the predictors affected essay scores: group (P=0.184), writing duration (P=0.669), module (P=0.388), and GPA (P=0.532). The text unauthenticity was slightly higher in the experimental group (11.87%, SD=13.45 vs. 9.96%, SD=9.81 in the control group), but the similarity among essays was generally low in the overall sample (the Jaccard similarity index ranging from 0 to 0.054). In the experimental group, AI classifier recognized more potential AI-generated texts. Conclusions: This study found no evidence that using GPT as a writing tool improves essay quality since the control group outperformed the experimental group in most parameters.

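Motivation and objectives

Assess whether ChatGPT-3, used as a writing assistant, improves the quality of students' essays.
Compare writing time, authenticity, and content similarity with and without AI assistance.
Assess how detectable AI-generated text is and how detection relates to essay scores.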

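Proposed method

Randomly assign 18 students to a control group (no ChatGPT-3) or an experimental group (with ChatGPT-3), nine students each.
Grade essay elements on an A-D scale mapped to numerical values (4-1).
Compare essay scores, writing duration, authenticity, and content similarity between the two groups.
Compute content similarity across essays with the Jaccard similarity index.
Use an AI text classifier to flag potentially AI-generated text in the experimental group.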
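
The grade-to-number mapping in the scoring step can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: the summary only states that grades A-D correspond to 4-1, so the function names, the plain averaging of element grades, and the nearest-grade conversion back to a letter are all assumptions made here for clarity.

```python
# Letter-to-number mapping as stated in the summary (A-D -> 4-1).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}


def essay_score(element_grades):
    """Average the numeric values of the letter grades given to essay elements.

    Assumption: every element carries equal weight.
    """
    if not element_grades:
        raise ValueError("at least one graded element is required")
    return sum(GRADE_POINTS[g] for g in element_grades) / len(element_grades)


def score_to_letter(score):
    """Return the letter whose numeric value is closest to the score.

    Assumption: nearest-grade conversion. On an exact tie the higher grade
    wins, because it comes first in GRADE_POINTS.
    """
    return min(GRADE_POINTS, key=lambda g: abs(GRADE_POINTS[g] - score))


if __name__ == "__main__":
    # Hypothetical element grades for a single essay.
    print(essay_score(["B", "C", "C", "D"]))  # 2.0
    # Under this conversion, both reported group means fall on C.
    print(score_to_letter(2.39), score_to_letter(2.00))  # C C
```

Under these assumptions, the reported group means of 2.39 and 2.00 both sit closest to 2, which is consistent with the average grade of C reported for both groups.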

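Experimental results

Research questions

RQ1: Does using ChatGPT-3 as a writing assistant improve overall essay grades?
RQ2: How does writing duration affect essay quality with and without AI assistance?
RQ3: Does AI-assisted writing affect the authenticity and content similarity of students' essays?
RQ4: Can AI detection tools reliably identify AI-assisted writing in students' essays?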

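Key findings

The average grade was C in both groups (control: 2.39, SD=0.71; experimental: 2.00, SD=0.73).
Group, writing duration, module, and GPA had no significant effect on essay scores (P=0.184, 0.669, 0.388, and 0.532, respectively).
Text unauthenticity was slightly higher in the experimental group (11.87%, SD=13.45) than in the control group (9.96%, SD=9.81).
Content similarity among essays was generally low across the whole sample (Jaccard similarity index from 0 to 0.054).
In the experimental group, the AI classifier flagged more texts as potentially AI-generated.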
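
To give a sense of scale for the reported Jaccard range, the index for two texts is the size of the intersection of their word sets divided by the size of their union, so 0 means no shared words and 1 means identical vocabularies. The sketch below is illustrative only: the summary does not say how the essays were tokenized or which tool computed the index, so the regex-based word tokenizer and the function names are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens.

    Assumption: simple regex tokenization, no stemming or stop-word removal.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(text_a, text_b):
    """Jaccard index |A ∩ B| / |A ∪ B| over the word sets of two texts."""
    set_a, set_b = tokenize(text_a), tokenize(text_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty texts: treat similarity as 0 by convention
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Toy sentences, not data from the study.
    a = "the cat sat on the mat"
    b = "the dog sat on the log"
    # Shared words: the, sat, on (3); distinct words overall: 7.
    print(jaccard(a, b))  # 3/7, roughly 0.43
```

Against this toy value of about 0.43, the maximum of 0.054 reported for the essays indicates very little overlap between any pair of texts.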

This review was generated by AI and checked by a human editor.