QUICK REVIEW

[Paper Review] Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition__Taking TOEFL Independent Writing Task for Example

Wei Xia, Shaoguang Mao|arXiv (Cornell University)|2024. 01. 07.

Educational Technology Systems|Citations: 5

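One-Line Summary

This paper investigates ChatGPT as an automated scorer for English essays using the TOEFL Independent Writing Task criteria. It finds that ChatGPT is operationally usable but exhibits a regression effect, and it stresses the need for domain-informed prompts.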

ABSTRACT

Large language models have demonstrated exceptional capabilities in tasks involving natural language generation, reasoning, and comprehension. This study aims to construct prompts and comments grounded in the diverse scoring criteria delineated within the official TOEFL guide. The primary objective is to assess the capabilities and constraints of ChatGPT, a prominent representative of large language models, within the context of automated essay scoring. The prevailing methodologies for automated essay scoring involve the utilization of deep neural networks, statistical machine learning techniques, and fine-tuning pre-trained models. However, these techniques face challenges when applied to different contexts or subjects, primarily due to their substantial data requirements and limited adaptability to small sample sizes. In contrast, this study employs ChatGPT to conduct an automated evaluation of English essays, even with a small sample size, employing an experimental approach. The empirical findings indicate that ChatGPT can provide operational functionality for automated essay scoring, although the results exhibit a regression effect. It is imperative to underscore that the effective design and implementation of ChatGPT prompts necessitate a profound domain expertise and technical proficiency, as these prompts are subject to specific threshold criteria. Keywords: ChatGPT, Automated Essay Scoring, Prompt Learning, TOEFL Independent Writing Task

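Research Motivation and Objectives

Facilitate automated scoring of English essays according to the TOEFL criteria.
Assess the capabilities and limitations of ChatGPT when scoring English essays with a small sample size.
Investigate how prompt design aligned with the TOEFL scoring criteria affects scoring quality.
Highlight the role of domain expertise in writing effective prompts for scoring tasks.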

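Proposed Method

Construct prompts and comments grounded in the official TOEFL scoring criteria.
Use ChatGPT to evaluate English essays automatically with a small sample size.
Analyze the scoring results for operational functionality and identify the regression effect.
Argue that prompt design requires substantial domain knowledge and technical proficiency.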
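
The rubric-grounded prompting step above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual prompt: the criterion descriptions in `RUBRIC` are paraphrased placeholders rather than official TOEFL wording, the `Score: <integer>` reply format is an assumption, and `build_prompt` and `parse_score` are hypothetical helper names. The 0 to 5 range follows the TOEFL Independent Writing scale. No model call is shown, since the review does not say how ChatGPT was accessed.

```python
import re

# Hypothetical, paraphrased criteria for illustration only;
# this is NOT the official TOEFL rubric wording.
RUBRIC = {
    "task response": "addresses the topic and task",
    "organization": "is well organized and developed with explanations and examples",
    "coherence": "shows unity, progression, and coherence",
    "language use": "uses varied syntax and appropriate word choice with few errors",
}
MIN_SCORE, MAX_SCORE = 0, 5  # TOEFL Independent Writing score range


def build_prompt(essay_topic: str, essay_text: str) -> str:
    """Assemble a scoring prompt that lists every rubric criterion."""
    criteria = "\n".join(
        f"- {name}: the essay {desc}" for name, desc in RUBRIC.items()
    )
    return (
        "You are a rater for the TOEFL Independent Writing Task.\n"
        f"Score the essay on an integer scale from {MIN_SCORE} to {MAX_SCORE} "
        "using these criteria:\n"
        f"{criteria}\n\n"
        f"Topic: {essay_topic}\n"
        f"Essay:\n{essay_text}\n\n"
        "Reply with a line of the form 'Score: <integer>' "
        "followed by a short comment."
    )


def parse_score(reply: str):
    """Extract the integer score from a reply; return None if absent or out of range."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if MIN_SCORE <= score <= MAX_SCORE else None
```

Rejecting out-of-range or missing scores in `parse_score` is one simple way to keep malformed model replies out of the analysis. Whether the paper applied a comparable check is not stated in this review.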

Experimental Results

Research Questions

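RQ1: Can ChatGPT provide functional automated scoring that conforms to the TOEFL Independent Writing Task criteria?
RQ2: What limitations or regression effects are observed when ChatGPT is used for essay scoring?
RQ3: In this context, how does prompt design affect the quality and reliability of automated scoring?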
RQ4: What level of domain expertise is required for automated essay scoring (AES) tasks?

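Key Findings

ChatGPT can operationally perform automated essay scoring for TOEFL-style tasks.
A regression effect is observed in the scoring output of this approach.
Effective prompt design for this task requires substantial domain expertise and technical proficiency.
Careful prompt construction makes small-sample evaluation possible, but the results depend on the prompts' threshold criteria.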
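
One way to check for the regression effect mentioned above is sketched below. It assumes the term carries its usual meaning of regression toward the mean, that is, model scores pulled toward the middle of the scale relative to human scores. This review does not describe the paper's own analysis, so the function `regression_summary` is hypothetical and the score lists are made-up numbers for illustration, not data from the study.

```python
from statistics import mean, pstdev, pvariance


def regression_summary(human, model):
    """Compare model scores against human scores for the same essays.

    A least-squares slope below 1, together with a smaller spread in the
    model scores, is consistent with regression toward the mean.
    """
    if len(human) != len(model) or len(human) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    var_h = pvariance(human)
    if var_h == 0:
        raise ValueError("human scores must not all be identical")
    h_mean, m_mean = mean(human), mean(model)
    # Population covariance between human and model scores.
    cov = sum((h - h_mean) * (m - m_mean) for h, m in zip(human, model)) / len(human)
    return {
        "slope": cov / var_h,  # slope of model score regressed on human score
        "human_sd": pstdev(human),
        "model_sd": pstdev(model),
    }


# Made-up scores for illustration only; NOT data from the paper.
human_scores = [1, 2, 3, 4, 5]
model_scores = [2, 3, 3, 3, 4]
summary = regression_summary(human_scores, model_scores)
print(summary)
```

With these illustrative numbers the slope is below 1 and the model scores have the smaller standard deviation, which is the pattern such a check looks for.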

This review was generated by AI and reviewed by a human editor.