QUICK REVIEW

[Paper Review] The Death of the Short-Form Physics Essay in the Coming AI Revolution

Will Yeadon, O. Inyang|arXiv (Cornell University)|Dec 22, 2022
Artificial Intelligence in Healthcare and Education | 25 citations
TL;DR

The paper shows that OpenAI's GPT-3-based models (davinci-003) can generate 300-word physics essays that score around 71% on a five-question Durham University module assessment, in line with the student average, suggesting that AI-written short-form essays threaten traditional assessment methods.

ABSTRACT

The latest AI language models can produce original, high quality full short-form ($300$-word) Physics essays within seconds. These technologies such as ChatGPT and davinci-003 are freely available to anyone with an internet connection. In this work, we present evidence of AI generated short-form essays achieving first-class grades on an essay writing assessment from an accredited, current university Physics module. The assessment requires students to answer five open-ended questions with a short, $300$-word essay each. Fifty AI answers were generated to create ten submissions that were independently marked by five separate markers. The AI generated submissions achieved an average mark of $71 \pm 2 \%$, in strong agreement with the current module average of $71 \pm 5 \%$. A typical AI submission would therefore most likely be awarded a First Class, the highest classification available at UK universities. Plagiarism detection software returned a plagiarism score between $2 \pm 1 \%$ (Grammarly) and $7 \pm 2 \%$ (Turnitin). We argue that these results indicate that current AI MLPs represent a significant threat to the fidelity of short-form essays as an assessment method in Physics courses.

Motivation & Objective

  • Raise the concern that AI text generation threatens the fidelity of short-form physics essays as an assessment method.
  • Assess whether AI-generated short-form essays can reach first-class performance on a real university module.
  • Characterize the consistency and detectability of AI-generated essays compared with human submissions.
  • Discuss implications for assessment design and potential mitigations in higher education.

Proposed method

  • Use five open-ended physics questions (five 300-word essays) from Durham University’s Physics in Society module as the assessment basis.
  • Generate ten AI-written submissions (five answers each, fifty answers in total) using the OpenAI davinci-003 playground, with prompts based on the questions.
  • Have five independent markers score the AI submissions, compare to module averages, and analyze plagiarism scores from Grammarly and Turnitin.
  • Present examples of AI outputs and discuss prompt engineering to obtain discursive, original responses.
  • Evaluate inter-marker agreement and the potential future role of AI as tutor or feedback provider.

Experimental results

Research questions

  • RQ1: Can AI language models produce short-form physics essays that achieve high marks on an accredited university assessment?
  • RQ2: How do AI-generated essays compare with human student performance in terms of average scores and marking consistency?
  • RQ3: Are AI-written essays detectable by standard plagiarism tools, and what are their qualities in terms of originality and style?
  • RQ4: What implications do AI capabilities have for assessment design and academic integrity in higher education?

Key findings

  • Ten AI-generated submissions (five questions each) averaged 71±2% across five markers.
  • This AI average aligns with the Physics in Society module average (71±5%) and with Durham second-year physics module averages (72±3%).
  • AI essays were consistently scored across markers, with marker averages 73.0±1.6, 72.6±2.0, 69±2, 70±2, and 70.6±1.9, indicating strong inter-marker agreement.
  • AI plagiarism scores averaged 2±1% (Grammarly) and 7±2% (Turnitin), suggesting that, beyond any overlap with the supplied questions, AI-written text can appear original enough to pass typical university plagiarism checks.
  • The results imply that current AI models can generate high-quality short-form physics essays at a First Class level, challenging the validity of short-form essays as an assessment method.
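
The figures above can be cross-checked with a little arithmetic. The sketch below is not the paper's analysis: it assumes an unweighted mean over the five marker averages and, for the comparison with the module averages, assumes independent uncertainties combined in quadrature. The names `marker_means` and `z_score` are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Per-marker average marks (%) for the AI submissions, as listed in the findings.
marker_means = [73.0, 72.6, 69.0, 70.0, 70.6]

# Unweighted mean of the five marker averages (an assumption; the paper may
# have averaged over individual marks instead).
overall = mean(marker_means)

# Sample standard deviation between markers, as a rough measure of agreement.
spread = stdev(marker_means)


def z_score(a, sigma_a, b, sigma_b):
    """Difference a - b in units of the combined uncertainty.

    Assumes the two uncertainties are independent, so they add in quadrature.
    """
    return (a - b) / sqrt(sigma_a**2 + sigma_b**2)


# (value, uncertainty) pairs in percent, taken from the findings above.
ai = (71.0, 2.0)
module = (71.0, 5.0)
second_year = (72.0, 3.0)

print(f"mean of marker averages: {overall:.2f}%")
print(f"spread between markers:  {spread:.2f} percentage points")
print(f"AI vs module average:      z = {z_score(*ai, *module):.2f}")
print(f"AI vs second-year average: z = {z_score(*ai, *second_year):.2f}")
```

Under these assumptions the mean of the marker averages is about 71.0%, the markers differ from one another by under two percentage points, and the AI average sits well within one combined standard deviation of both human averages, which is consistent with the "strong agreement" reported.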


This review was created by AI and reviewed by human editors.