QUICK REVIEW

[Paper Review] AI and the FCI: Can ChatGPT Project an Understanding of Introductory Physics?

Colin G. West|ArXiv.org|Mar 2, 2023

Explainable Artificial Intelligence (XAI) | Cited by 37

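One-Sentence Summary

This paper evaluates two versions of ChatGPT (3.5 and 4) on a modified Force Concept Inventory to test their conceptual understanding of introductory physics. ChatGPT3.5 performs roughly on par with a typical student who has completed one semester of college physics, while ChatGPT4 approaches expert-level performance on mechanics questions.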

ABSTRACT

ChatGPT is a groundbreaking "chatbot": an AI interface built on a large language model that was trained on an enormous corpus of human text to emulate human conversation. Beyond its ability to converse in a plausible way, it has attracted attention for its ability to competently answer questions from the bar exam and from MBA coursework, and to provide useful assistance in writing computer code. These apparent abilities have prompted discussion of ChatGPT as both a threat to the integrity of higher education and conversely as a powerful teaching tool. In this work we present a preliminary analysis of how two versions of ChatGPT (ChatGPT3.5 and ChatGPT4) fare in the field of first-semester university physics, using a modified version of the Force Concept Inventory (FCI) to assess whether it can give correct responses to conceptual physics questions about kinematics and Newtonian dynamics. We demonstrate that, by some measures, ChatGPT3.5 can match or exceed the median performance of a university student who has completed one semester of college physics, though its performance is notably uneven and the results are nuanced. By these same measures, we find that ChatGPT4's performance is approaching the point of being indistinguishable from that of an expert physicist when it comes to introductory mechanics topics. After the completion of our work we became aware of Ref [1], which preceded us to publication and which completes an extensive analysis of the abilities of ChatGPT3.5 in a physics class, including a different modified version of the FCI. We view this work as confirming that portion of their results, and extending the analysis to ChatGPT4, which shows rapid and notable improvement in most, but not all respects.

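Motivation and Objectives

Assess whether ChatGPT can demonstrate conceptual understanding of introductory physics on the FCI.
Compare the performance of ChatGPT3.5 and ChatGPT4 with that of human students and expert physicists.
Investigate how prompt design and question modifications affect the model's answers.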

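Methods

Test ChatGPT on a modified, text-only version of the 30-item Force Concept Inventory (FCI).
Convert figure-dependent items into prompts that describe the figures in text, so that ChatGPT3.5 and ChatGPT4 can process them.
Pose the questions in BASIC and NOVICE prompting styles to assess the models' reasoning and the consistency of their answers.
Analyze multiple-choice accuracy alongside the qualitative explanations to gauge the gap between apparent understanding and correct answers.
Compare the model results with the historical distribution of end-of-term post-test scores from students in a large introductory physics course.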
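The scoring-and-comparison step can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: the function names `score_responses` and `percentile_rank`, the dictionary answer-key format, and the convention of counting ties as half are all hypothetical choices made for illustration. It also assumes the model score and the student scores are on the same scale (for example, both restricted to the same usable items); how the paper aligned its usable items with full-length student scores is not described in this summary.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Sequence


def score_responses(responses: Dict[int, str], answer_key: Dict[int, str]) -> int:
    """Hypothetical helper: count items answered correctly.

    Only items present in `answer_key` are scored, so unusable items can be
    excluded simply by leaving them out of the key.
    """
    return sum(
        1 for item, correct in answer_key.items() if responses.get(item) == correct
    )


def percentile_rank(score: float, cohort: Sequence[float]) -> float:
    """Hypothetical helper: percentile rank of `score` within a student cohort.

    Assumes `score` and `cohort` use the same scale. Students scoring below
    count fully; students with an equal score count as half.
    """
    if not cohort:
        raise ValueError("cohort must be non-empty")
    ordered = sorted(cohort)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)
```

With a toy three-item key, `score_responses({1: "C", 2: "B", 3: "E"}, {1: "C", 2: "A", 3: "E"})` returns 2, and `percentile_rank(2, [0, 1, 2, 2, 3])` returns 60.0. Counting ties as half is one common percentile-rank convention; a strict "share of students scoring below" definition would give 40.0 for the same toy cohort.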

Experimental Results

Research Questions

RQ1: Can ChatGPT produce correct responses to conceptual kinematics and Newtonian dynamics questions as measured by the FCI?
RQ2: How do ChatGPT3.5 and ChatGPT4 compare in accuracy and depth of reasoning on introductory physics concepts?
RQ3: To what extent do prompt framing (BASIC vs NOVICE) and question modification (textual descriptions of figures) affect performance?

Key Findings

ChatGPT3.5 answered 15 of 23 usable FCI items correctly with BASIC prompting.
ChatGPT4 answered 22 of 23 usable FCI items correctly with BASIC prompting, missing item 26 under certain assumptions (air resistance neglected).
ChatGPT4’s performance is near the level of an expert physicist for introductory mechanics topics under BASIC prompting.
Free-response explanations from ChatGPT3.5 were entirely correct in 10 of 23 cases; in others they were broadly correct but contained errors.
ChatGPT3.5 showed substantial weaknesses on spatial-reasoning items involving figures, whereas ChatGPT4 eliminated most of these issues.
Results align with prior work showing that ChatGPT can display the appearance of understanding, and show rapid improvement from 3.5 to 4.
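The counts reported in the key findings translate into the raw percentages below. These percentages are computed here from the stated counts rather than quoted from the paper, and they are fractions of the 23 usable items, not of the full 30-item FCI. The names `REPORTED_COUNTS` and `percent_correct` are illustrative only.

```python
# Counts as stated in the key findings: (items correct, usable items).
REPORTED_COUNTS = {
    "ChatGPT3.5, multiple choice (BASIC)": (15, 23),
    "ChatGPT4, multiple choice (BASIC)": (22, 23),
    "ChatGPT3.5, fully correct explanations": (10, 23),
}


def percent_correct(correct: int, total: int) -> float:
    """Return the share of items answered correctly, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


if __name__ == "__main__":
    for label, (correct, total) in REPORTED_COUNTS.items():
        print(f"{label}: {correct}/{total} = {percent_correct(correct, total):.1f}%")
```

Running the script prints 65.2% for ChatGPT3.5 and 95.7% for ChatGPT4 on the multiple-choice items, and 43.5% for ChatGPT3.5's fully correct explanations.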

This review was generated by AI and checked by human editors.