QUICK REVIEW

[Paper Review] Better by you, better than me, chatgpt3 as writing assistance in students essays

Željana Bašić, Ana Banovac|arXiv (Cornell University)|2023. 02. 09.

Artificial Intelligence in Healthcare and Education|References: 16|Citations: 36

One-Line Summary

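The study compared the quality of student essays written with and without ChatGPT-3 as a writing assistant. It found no improvement from the tool: the experimental group scored slightly lower on average, although the group difference was not statistically significant (P=0.184).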

ABSTRACT

Aim: To compare students' essay writing performance with or without employing ChatGPT-3 as a writing assistant tool. Materials and methods: Eighteen students participated in the study (nine in the control group and nine in the experimental group that used ChatGPT-3). We scored essay elements with grades (A-D) and corresponding numerical values (4-1). We compared essay scores to students' GPAs, writing time, authenticity, and content similarity. Results: The average grade was C for both groups: control (2.39, SD=0.71) and experimental (2.00, SD=0.73). None of the predictors significantly affected essay scores: group (P=0.184), writing duration (P=0.669), module (P=0.388), and GPA (P=0.532). Text unauthenticity was slightly higher in the experimental group (11.87%, SD=13.45%, vs. 9.96%, SD=9.81%), but similarity among essays was generally low in the overall sample (the Jaccard similarity index ranged from 0 to 0.054). In the experimental group, the AI classifier recognized more potentially AI-generated texts. Conclusions: This study found no evidence that using GPT as a writing tool improves essay quality, since the control group outperformed the experimental group in most parameters.

Motivation and Objectives

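Assess whether using ChatGPT-3 as a writing assistant improves the quality of student essays.
Compare writing time, authenticity, and content similarity between students who wrote with AI assistance and those who did not.
Evaluate how detectable AI-generated text is and how detection relates to essay scores.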

Method

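Eighteen students are randomly assigned to a control group (no ChatGPT-3) or an experimental group (with ChatGPT-3), nine per group.
Essays are scored with letter grades A-D, mapped to numerical values 4-1 (A=4, B=3, C=2, D=1).
Essay scores, writing time, authenticity, and content similarity are compared between the groups.
Content similarity between essays is measured with the Jaccard similarity index.
An AI-text classifier is used to assess potentially AI-generated text in the experimental group.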
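The Jaccard similarity index treats two texts as sets and divides the size of their intersection by the size of their union, so 0 means nothing is shared and 1 means the sets are identical. The minimal Python sketch below illustrates the idea. The review does not say how the essays were tokenized, so splitting them into lowercase word sets is an assumption, and the helper names (`tokenize`, `jaccard`, `pairwise_jaccard`) are hypothetical.

```python
import itertools
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set. The paper's actual preprocessing is not stated (assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over word sets; defined as 0.0 when both texts are empty."""
    sa, sb = tokenize(a), tokenize(b)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


def pairwise_jaccard(essays: list[str]) -> list[float]:
    """Jaccard index for every unordered pair of essays (n * (n - 1) / 2 values)."""
    return [jaccard(x, y) for x, y in itertools.combinations(essays, 2)]


print(jaccard("the cat sat", "the dog sat"))  # 0.5
```

For example, "the cat sat" and "the dog sat" share 2 of 4 distinct words, giving 0.5. Under any set-based tokenization, the reported maximum of 0.054 means that no pair of essays shared more than about 5% of their combined distinct items.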

Experimental Results

Research Questions

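RQ1: Does using ChatGPT-3 as a writing assistant improve overall essay grades?
RQ2: How does writing time relate to essay quality, with and without AI assistance?
RQ3: Does AI-assisted writing affect the authenticity and content similarity of students' essays?
RQ4: Can AI detection tools reliably recognize AI-assisted writing in student essays?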

Key Findings

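The average grade was C in both groups (control 2.39, SD=0.71; experimental 2.00, SD=0.73).
Group, writing time, module, and GPA had no significant effect on essay scores (P = 0.184, 0.669, 0.388, and 0.532, respectively).
Text unauthenticity was slightly higher in the experimental group (11.87%, SD=13.45%) than in the control group (9.96%, SD=9.81%).
The Jaccard similarity index indicated low content similarity overall (range 0 to 0.054).
The AI classifier identified more potentially AI-generated texts in the experimental group.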
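The group averages come from the A-D to 4-1 grade mapping. The small sketch below shows how both means (2.39 and 2.00) fall on C. It assumes that a mean is converted back to a letter by rounding to the nearest whole point. The review does not state the paper's conversion rule, and the names below are hypothetical.

```python
from statistics import mean

# Grade-to-number mapping described in the review: A=4, B=3, C=2, D=1.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}
POINT_GRADES = {v: k for k, v in GRADE_POINTS.items()}


def average_points(grades: list[str]) -> float:
    """Mean numeric score for a list of letter grades."""
    return mean(GRADE_POINTS[g] for g in grades)


def to_letter(score: float) -> str:
    """Map a mean score to a letter by rounding to the nearest point (assumed rule).

    Python's round() uses round-half-to-even, so an exact 2.5 maps to 2 (C);
    the result is clamped to the 1-4 range of the scale.
    """
    nearest = min(max(round(score), 1), 4)
    return POINT_GRADES[nearest]


print(to_letter(2.39), to_letter(2.00))  # C C
```

Under this rule, any group mean from 1.5 up to 2.5 reads as C. The 0.39-point gap between the groups therefore stays within a single letter grade.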

This review was generated by AI and reviewed by a human editor.