QUICK REVIEW

[Paper Review] AI and the FCI: Can ChatGPT Project an Understanding of Introductory Physics?

Colin G. West | arXiv.org | 2023. 03. 02.

Explainable Artificial Intelligence (XAI) | Citations: 37

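One-line summary

This paper uses a modified Force Concept Inventory to assess whether ChatGPT can project a conceptual understanding of introductory physics. It finds that ChatGPT3.5 performs near the level of a typical student who has completed one semester of college physics, while ChatGPT4 approaches expert-level performance on introductory mechanics questions.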

ABSTRACT

ChatGPT is a groundbreaking "chatbot"--an AI interface built on a large language model that was trained on an enormous corpus of human text to emulate human conversation. Beyond its ability to converse in a plausible way, it has attracted attention for its ability to competently answer questions from the bar exam and from MBA coursework, and to provide useful assistance in writing computer code. These apparent abilities have prompted discussion of ChatGPT as both a threat to the integrity of higher education and conversely as a powerful teaching tool. In this work we present a preliminary analysis of how two versions of ChatGPT (ChatGPT3.5 and ChatGPT4) fare in the field of first-semester university physics, using a modified version of the Force Concept Inventory (FCI) to assess whether it can give correct responses to conceptual physics questions about kinematics and Newtonian dynamics. We demonstrate that, by some measures, ChatGPT3.5 can match or exceed the median performance of a university student who has completed one semester of college physics, though its performance is notably uneven and the results are nuanced. By these same measures, we find that ChatGPT4's performance is approaching the point of being indistinguishable from that of an expert physicist when it comes to introductory mechanics topics. After the completion of our work we became aware of Ref [1], which preceded us to publication and which completes an extensive analysis of the abilities of ChatGPT3.5 in a physics class, including a different modified version of the FCI. We view this work as confirming that portion of their results, and extending the analysis to ChatGPT4, which shows rapid and notable improvement in most, but not all respects.

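Research motivation and goals

Assess whether ChatGPT can demonstrate conceptual understanding of introductory physics, as measured by the FCI.
Compare the performance of ChatGPT3.5 and ChatGPT4 with that of human students and experts.
Explore how prompt engineering and question modification affect the models' responses.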

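Proposed method

Test ChatGPT on a modified, text-only version of the 30-item Force Concept Inventory (FCI).
Convert figure-dependent items into text-description prompts so that ChatGPT3.5 and ChatGPT4 can process them.
Present the questions in BASIC and NOVICE prompt styles to assess the stability of the reasoning and the responses.
Analyze both multiple-choice accuracy and the qualitative explanations to compare apparent understanding against answer correctness.
Compare the model results with the historical post-test score distribution of students in a large introductory physics course.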

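Experimental results

Research questions

RQ1: Can ChatGPT produce correct responses to the conceptual kinematics and Newtonian dynamics questions measured by the FCI?
RQ2: How do ChatGPT3.5 and ChatGPT4 compare in accuracy and depth of reasoning on introductory physics concepts?
RQ3: How much do prompt framing (BASIC vs. NOVICE) and question modification (text descriptions of figures) affect performance?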

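Key results

With BASIC prompting, ChatGPT3.5 answered 15 of the 23 usable FCI items correctly.
With BASIC prompting, ChatGPT4 answered 22 of the 23 usable FCI items correctly, missing only item 26 under a particular assumption (neglecting air resistance).
Under BASIC prompting, ChatGPT4's performance on introductory mechanics topics approaches that of an expert physicist.
In its free-response explanations, ChatGPT3.5 was fully correct in 10 of 23 cases; in the others it was broadly correct but contained errors.
ChatGPT3.5 showed considerable weakness on spatial-reasoning items that involve figures, whereas ChatGPT4 eliminated most of these problems.
The results agree with prior work: ChatGPT can give the impression of understanding, and it improves rapidly from version 3.5 to version 4.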
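
For readers who want the counts above as percentages, the arithmetic can be sketched as follows. This is only an illustrative calculation based on the figures quoted in this review (15, 22, and 10 out of 23 usable items); the variable and function names are hypothetical, and the paper itself may present or round these numbers differently.

```python
# Convert the counts quoted in the key results into percentages.
# All names here are illustrative; the counts come from this review.

USABLE_ITEMS = 23  # FCI items that were usable in the modified text version

correct_counts = {
    "ChatGPT3.5, BASIC, multiple choice": 15,
    "ChatGPT4, BASIC, multiple choice": 22,
    "ChatGPT3.5, fully correct explanations": 10,
}


def percent_correct(correct: int, total: int = USABLE_ITEMS) -> float:
    """Return the share of correct items as a percentage, rounded to 0.1."""
    return round(100 * correct / total, 1)


for label, count in correct_counts.items():
    print(f"{label}: {count}/{USABLE_ITEMS} = {percent_correct(count)}%")
```

Under these counts, ChatGPT3.5 answers roughly two thirds of the usable items correctly, while ChatGPT4 misses a single item.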


This review was generated by AI and checked by a human editor.