QUICK REVIEW

[Paper Review] Towards Adaptive Feedback with AI: Comparing the Feedback Quality of LLMs and Teachers on Experimentation Protocols

Kathrin Seßler, Arne Bewersdorff | ArXiv.org | Feb 18, 2025

Intelligent Tutoring Systems and Adaptive Learning | 4 citations

TL;DR

The study compares LLM-generated feedback with teacher and science-education expert feedback on student experimentation protocols and finds similar overall quality, with LLMs lagging in contextual error feedback.

ABSTRACT

Effective feedback is essential for fostering students' success in scientific inquiry. With advancements in artificial intelligence, large language models (LLMs) offer new possibilities for delivering instant and adaptive feedback. However, this feedback often lacks the pedagogical validation provided by real-world practitioners. To address this limitation, our study evaluates and compares the feedback quality of LLM agents with that of human teachers and science education experts on student-written experimentation protocols. Four blinded raters, all professionals in scientific inquiry and science education, evaluated the feedback texts generated by 1) the LLM agent, 2) the teachers and 3) the science education experts using a five-point Likert scale based on six criteria of effective feedback: Feed Up, Feed Back, Feed Forward, Constructive Tone, Linguistic Clarity, and Technical Terminology. Our results indicate that LLM-generated feedback shows no significant difference to that of teachers and experts in overall quality. However, the LLM agent's performance lags in the Feed Back dimension, which involves identifying and explaining errors within the student's work context. Qualitative analysis highlighted the LLM agent's limitations in contextual understanding and in the clear communication of specific errors. Our findings suggest that combining LLM-generated feedback with human expertise can enhance educational practices by leveraging the efficiency of LLMs and the nuanced understanding of educators.

Motivation & Objective

Develop an LLM feedback agent to detect errors in students’ experimentation protocols and provide adaptive feedback.
Evaluate the quality of LLM-generated feedback against feedback from practicing teachers and science education experts.
Investigate six dimensions of feedback quality (content-related and language-related) using real student data.

Proposed method

Developed an LLM feedback agent using a zero-shot prompt to detect errors and provide adaptive feedback in a step-by-step format.
Collected 40 student protocols with 109 errors from 37 students across grades 6–8.
Collected two human feedback texts per error from 11 teachers and 5 science-education experts as benchmarks.
Evaluated feedback texts with four blinded raters on six criteria: Feed Up, Feed Back, Feed Forward, Constructive Tone, Linguistic Clarity, and Technical Terminology.
Compared group means across feedback sources with independent t-tests, examined group variances, analyzed word counts, and computed Spearman correlations between sources.
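The rating and comparison steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' analysis code: it assumes each feedback text's score on a criterion is the mean of the four raters' 1–5 Likert ratings, and it computes Student's pooled-variance t statistic for two independent groups. The names `aggregate` and `independent_t`, and all numbers, are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# The six criteria named in the review; each is rated on a 1-5 Likert scale.
CRITERIA = ["Feed Up", "Feed Back", "Feed Forward",
            "Constructive Tone", "Linguistic Clarity", "Technical Terminology"]


def aggregate(ratings_by_rater):
    """Average one feedback text's scores over raters, per criterion (assumed aggregation)."""
    return {c: mean(r[c] for r in ratings_by_rater) for c in CRITERIA}


def independent_t(a, b):
    """Student's independent-samples t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical "Feed Back" scores, already averaged over raters, one per feedback text.
llm = [3.0, 3.5, 2.5, 3.0]
teacher = [4.0, 3.5, 4.5, 4.0]
t, df = independent_t(llm, teacher)
print(f"t = {t:.3f}, df = {df}")  # prints "t = -3.464, df = 6"
```

The standard library has no t distribution, so the sketch stops at the statistic and degrees of freedom; with SciPy installed, `scipy.stats.ttest_ind(llm, teacher)` returns the same pooled-variance statistic together with a p-value.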

Experimental results

Research questions

RQ1: Can an LLM-based feedback agent match the quality of teacher and expert feedback on student experimentation protocols?
RQ2: In which dimensions of feedback quality do LLMs align with or diverge from human feedback?
RQ3: What are the length characteristics and correlations between feedback types across sources?

Key findings

LLM-generated feedback did not differ significantly from teacher or expert feedback in overall quality.
A significant difference was found in the Feed Back dimension, where human feedback outperformed the LLM in identifying and explaining errors in context.
LLM feedback generally scored well on language-related dimensions (Tone, Clarity, Terminology) but lagged on content-related feedback, particularly in contextual error identification.
LLM feedback clustered around 50 words in length, similar to teacher feedback, while experts wrote longer feedback.
Correlations between human and LLM ratings were low for content-related aspects but higher for language-related aspects, indicating different strengths across sources.
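The length and correlation analyses (RQ3) can be made concrete with a minimal Python sketch. It assumes whitespace tokenization for word counts and that scores are paired by error (one score per source for the same student error); neither detail is confirmed by the review. The helpers `word_count`, `ranks`, and `spearman` and the example data are hypothetical.

```python
from statistics import mean


def word_count(text):
    """Count whitespace-delimited words (a simplifying tokenization assumption)."""
    return len(text.split())


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-error "Feed Back" scores for the LLM and for teachers.
llm_scores = [3.0, 4.0, 2.5, 3.5, 4.5]
teacher_scores = [4.0, 3.0, 4.5, 3.5, 4.0]
print(word_count("Your hypothesis does not name the dependent variable."))
print(round(spearman(llm_scores, teacher_scores), 3))
```

Computing rho on ranks rather than raw scores suits ordinal Likert data; a value near zero on content criteria would match the low cross-source agreement reported above.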

This review was created by AI and reviewed by human editors.