QUICK REVIEW

[Paper Review] Unreflected Acceptance -- Investigating the Negative Consequences of ChatGPT-Assisted Problem Solving in Physics Education

Lars Krupp, Steffen Steinert|arXiv (Cornell University)|Aug 21, 2023

Artificial Intelligence in Healthcare and Education|Cited by 19

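One-sentence summary

This study shows that physics students who use ChatGPT over-trust its answers, copy and paste their queries, and score lower on physics problems than students who use a search engine, highlighting the need for moderated, reflective use of LLMs in education.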

ABSTRACT

Large language models (LLMs) have recently gained popularity. However, the impact of their general availability through ChatGPT on sensitive areas of everyday life, such as education, remains unclear. Nevertheless, the societal impact on established educational methods is already being experienced by both students and educators. Our work focuses on higher physics education and examines problem solving strategies. In a study, students with a background in physics were assigned to solve physics exercises, with one group having access to an internet search engine (N=12) and the other group being allowed to use ChatGPT (N=27). We evaluated their performance, strategies, and interaction with the provided tools. Our results showed that nearly half of the solutions provided with the support of ChatGPT were mistakenly assumed to be correct by the students, indicating that they overly trusted ChatGPT even in their field of expertise. Likewise, in 42% of cases, students used copy & paste to query ChatGPT -- an approach only used in 4% of search engine queries -- highlighting the stark differences in interaction behavior between the groups and indicating limited reflection when using ChatGPT. In our work, we demonstrated a need to (1) guide students on how to interact with LLMs and (2) create awareness of potential shortcomings for users.

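Motivation and objectives

Assess how access to ChatGPT affects physics problem-solving performance among STEM students.
Compare interaction strategies and reflection between ChatGPT use and conventional search engine use.
Identify the risks of over-trust and a lack of critical evaluation when LLMs are used for physics problems.
Propose directions for the moderated design of LLM-based educational support tools and for raising user awareness.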

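Proposed method

A between-subjects design with two conditions: ChatGPT access (N=27) and internet search engine access (N=12).
A pre-test assessing physics knowledge before the main task; a main test consisting of four physics problems solvable with school-level knowledge.
Analysis of performance, interaction protocols, and perceived correctness for ChatGPT and search results.
Coding of interaction types in prompts and responses (copy & paste, preprocessing, postprocessing, transformation).
Exit interviews and questionnaires capturing strategies, reflection, and perceived usability.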
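
The interaction coding step can be pictured as a per-tool tally of labelled prompts. The following is a minimal sketch, assuming each prompt has already been hand-labelled with one interaction type; the records, the label spellings, and the function name `interaction_shares` are hypothetical and are not data or code from the study.

```python
from collections import Counter

# Hypothetical coded prompts as (tool, interaction_type) pairs.
# These records are illustrative only, not data from the study.
coded_prompts = [
    ("chatgpt", "copy_paste"),
    ("chatgpt", "preprocessing"),
    ("chatgpt", "copy_paste"),
    ("search", "transformation"),
    ("search", "transformation"),
    ("search", "copy_paste"),
]


def interaction_shares(records):
    """Return {tool: {interaction_type: share of that tool's prompts}}."""
    counts = {}
    for tool, kind in records:
        counts.setdefault(tool, Counter())[kind] += 1
    return {
        tool: {kind: n / sum(c.values()) for kind, n in c.items()}
        for tool, c in counts.items()
    }


shares = interaction_shares(coded_prompts)
print(shares["chatgpt"]["copy_paste"])
```

Shares are normalised within each tool, so groups of different sizes (here N=27 versus N=12) can be compared on the same scale.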

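Experimental results

Research questions

RQ1: How does access to ChatGPT affect performance on physics problems compared with a search engine?
RQ2: Which solution strategies and interaction patterns emerge when students use ChatGPT versus a search engine?
RQ3: How do students perceive the correctness of ChatGPT-generated answers compared with expert judgment?
RQ4: What do these interaction patterns imply for the design of appropriately moderated LLM-powered educational tools?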

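Key findings

ChatGPT users scored an average of 1.04 out of 12 points and search engine users 1.83, a significantly lower result for the ChatGPT group (F(1,37)=5.5, p=.02, η²=.13).
Students wrongly rated roughly 57% of ChatGPT's answers as correct (false positive rate), while 91% of correct answers were rated as correct (true positive rate).
Copy & paste was the dominant interaction with ChatGPT, used in 84 prompts, and contributed to limited reflection. By contrast, 96% of search prompts were systematic, keyword-based queries.
Students believed nearly half of the ChatGPT-provided solutions to be correct even though the expert assessment differed, indicating over-trust and insufficient reflection on LLM output.
The interviews revealed varied strategies and indicated a need for well-informed, moderated use that supports critical thinking and avoids uncritical reliance.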
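
The reported effect size can be cross-checked from the F statistic alone. For a one-way ANOVA, eta squared equals F·df1 / (F·df1 + df2). The sketch below applies that identity to the quoted values and spells out the standard definition of a false positive rate. The function names are illustrative assumptions, and reading the 57% figure as "incorrect answers judged correct, out of all incorrect answers" is likewise an assumption about how the review uses the term.

```python
def eta_squared(f_stat: float, df_between: int, df_within: int) -> float:
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


def false_positive_rate(wrong_judged_correct: int, wrong_total: int) -> float:
    """Standard definition: incorrect answers judged correct / all incorrect answers."""
    if wrong_total <= 0:
        raise ValueError("wrong_total must be positive")
    return wrong_judged_correct / wrong_total


# Values quoted in the findings: F(1,37)=5.5
effect = eta_squared(5.5, 1, 37)
print(round(effect, 2))  # 0.13
```

The recomputed value agrees with the reported η² of .13, so the F statistic, degrees of freedom, and effect size are mutually consistent.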

This review was generated by AI and checked by a human editor.