QUICK REVIEW

[Paper Review] The Death of the Short-Form Physics Essay in the Coming AI Revolution

Will Yeadon, O. Inyang|arXiv (Cornell University)|Dec 22, 2022

Artificial Intelligence in Healthcare and Education|25 citations

TL;DR

The paper shows that OpenAI's GPT-3-based models can generate submissions of five 300-word physics essays that score about 71% on a Durham University module assessment, suggesting that AI-written text threatens short-form essays as an assessment method.

ABSTRACT

The latest AI language modules can produce original, high quality full short-form ($300$-word) Physics essays within seconds. These technologies such as ChatGPT and davinci-003 are freely available to anyone with an internet connection. In this work, we present evidence of AI generated short-form essays achieving first-class grades on an essay writing assessment from an accredited, current university Physics module. The assessment requires students answer five open-ended questions with a short, $300$-word essay each. Fifty AI answers were generated to create ten submissions that were independently marked by five separate markers. The AI generated submissions achieved an average mark of $71 \pm 2 \%$, in strong agreement with the current module average of $71 \pm 5 \%$. A typical AI submission would therefore most-likely be awarded a First Class, the highest classification available at UK universities. Plagiarism detection software returned a plagiarism score between $2 \pm 1 \%$ (Grammarly) and $7 \pm 2 \%$ (Turnitin). We argue that these results indicate that current AI MLPs represent a significant threat to the fidelity of short-form essays as an assessment method in Physics courses.

Motivation & Objective

Raise the concern that AI text generation threatens the fidelity of short-form physics essays as an assessment method.
Assess whether AI-generated short-form essays can reach first-class performance on a real university module.
Characterize the consistency and detectability of AI-generated essays compared with human submissions.
Discuss implications for assessment design and potential mitigations in higher education.

Proposed method

Use five open-ended physics questions (five 300-word essays) from Durham University’s Physics in Society module as the assessment basis.
Generate ten AI-written submissions of five essays each (fifty AI answers in total) using the OpenAI davinci-003 playground, with prompts based on the assessment questions.
Have five independent markers score the AI submissions, compare to module averages, and analyze plagiarism scores from Grammarly and Turnitin.
Present examples of AI outputs and discuss prompt engineering to obtain discursive, original responses.
Evaluate inter-marker agreement and the potential future role of AI as tutor or feedback provider.
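
The bookkeeping behind this design can be sketched in a few lines of Python. This is only an illustration of the counts quoted in the abstract (five questions, fifty AI answers, ten submissions, five markers). The placeholder answer strings, the helper name build_submissions, and the rule that consecutive answers form one submission are assumptions made for the example, not the authors' procedure, and the sketch assumes every marker scores every submission.

```python
from itertools import product

N_QUESTIONS = 5     # open-ended questions per submission
N_SUBMISSIONS = 10  # AI-written submissions
N_MARKERS = 5       # independent markers


def build_submissions(answers, n_questions=N_QUESTIONS):
    """Group a flat list of answers into submissions of n_questions each.

    Hypothetical helper: the grouping rule (consecutive answers form one
    submission) is an assumption for illustration only.
    """
    if len(answers) % n_questions != 0:
        raise ValueError("number of answers must be a multiple of n_questions")
    return [
        answers[i:i + n_questions]
        for i in range(0, len(answers), n_questions)
    ]


# Placeholder strings stand in for the generated 300-word essays.
answers = [
    f"essay for submission {s}, question {q}"
    for s in range(N_SUBMISSIONS)
    for q in range(N_QUESTIONS)
]

submissions = build_submissions(answers)

# Assumption: every marker scores every submission.
marking_jobs = list(product(range(N_MARKERS), range(len(submissions))))

print(len(answers), "answers,", len(submissions), "submissions,",
      len(marking_jobs), "marker-submission pairs")
```

Under these assumptions the fifty answers yield ten submissions and fifty marker-submission pairs, which is the volume of marking behind the averages reported in the key findings.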

Experimental results

Research questions

RQ1: Can AI language models produce short-form physics essays that achieve high marks on an accredited university assessment?
RQ2: How do AI-generated essays compare with human student performance in terms of average scores and marking consistency?
RQ3: Are AI-written essays detectable by standard plagiarism tools, and what are their qualities in terms of originality and style?
RQ4: What implications do AI capabilities have for assessment design and academic integrity in higher education?

Key findings

Ten AI-generated submissions (five questions each) averaged 71±2% across five markers.
This AI average aligns with the Physics in Society module average (71±5%) and with Durham second-year physics module averages (72±3%).
AI essays were consistently scored across markers, with marker averages 73.0±1.6, 72.6±2.0, 69±2, 70±2, and 70.6±1.9, indicating strong inter-marker agreement.
AI plagiarism scores averaged 2±1% (Grammarly) and 7±2% (Turnitin), suggesting that, beyond the supplied questions, AI-written text can appear original enough to pass typical university plagiarism checks.
The results imply that current AI models can generate high-quality short-form physics essays at a First Class level, challenging the validity of short-form essays as an assessment method.
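
As a rough sanity check on the figures above, the five per-marker averages can be combined and compared with the reported overall mark. The sketch below is not the authors' analysis: it takes a plain mean and sample standard deviation of the quoted per-marker averages, treats each ± value as one standard uncertainty (an assumption), and uses 70% as the First Class boundary (the conventional UK threshold, assumed here rather than taken from the review). The helper agrees is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Per-marker average marks (%) as quoted in the key findings.
marker_means = [73.0, 72.6, 69.0, 70.0, 70.6]

overall = mean(marker_means)   # mean of the five marker averages
spread = stdev(marker_means)   # sample standard deviation across markers

# Assumption: conventional UK boundary for a First Class mark.
FIRST_CLASS_THRESHOLD = 70.0


def agrees(a, da, b, db):
    """Return True if a ± da and b ± db differ by no more than their
    combined uncertainty (added in quadrature).

    Hypothetical helper: treats each ± value as one standard uncertainty.
    """
    return abs(a - b) <= sqrt(da ** 2 + db ** 2)


print(f"mean of marker averages: {overall:.1f}%")
print(f"spread across markers:   {spread:.1f} percentage points")
print("at or above First Class boundary:", overall >= FIRST_CLASS_THRESHOLD)
print("AI vs module average:", agrees(71, 2, 71, 5))
print("AI vs second-year average:", agrees(71, 2, 72, 3))
```

With the quoted numbers, the mean of the marker averages comes out at about 71.0% with a spread of about 1.7 percentage points, which is in line with the reported 71±2%, and both comparisons with the human averages fall within the combined uncertainties.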


This review was created by AI and reviewed by human editors.