QUICK REVIEW

[Paper Review] Comparing Code Explanations Created by Students and Large Language Models

Juho Leinonen, Paul Denny | arXiv (Cornell University) | Apr 8, 2023

Software Engineering Research | References: 40 | Citations: 10

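One-line summary

This study compares code explanations written by students in a large CS1 course with explanations generated by GPT-3. The LLM explanations were rated as more accurate and easier to understand, and the two groups were of roughly the same length.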

ABSTRACT

Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills. However, developing the expertise to comprehend and explain code accurately and succinctly is a challenge for many students. Existing pedagogical approaches that scaffold the ability to explain code, such as producing exemplar code explanations on demand, do not currently scale well to large classrooms. The recent emergence of powerful large language models (LLMs) may offer a solution. In this paper, we explore the potential of LLMs in generating explanations that can serve as examples to scaffold students' ability to understand and explain code. To evaluate LLM-created explanations, we compare them with explanations created by students in a large course ($n \approx 1000$) with respect to accuracy, understandability and length. We find that LLM-created explanations, which can be produced automatically on demand, are rated as being significantly easier to understand and more accurate summaries of code than student-created explanations. We discuss the significance of this finding, and suggest how such models can be incorporated into introductory programming education.

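Motivation and objectives

Motivate the need for code-explanation support that scales to large CS classrooms.
Investigate whether LLM-generated code explanations can match or exceed student-written explanations in accuracy and understandability.
Examine which aspects of code explanations students find most useful and how explanations are rated.
Assess whether LLM explanations can serve as scalable exemplars for novices learning how to explain code.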

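Proposed method

Collect explanations of three functions in a large first-year course (about 1,000 students).
Students write explanations of the three functions (Lab A), then rate a random sample of 54 explanations drawn from both the student-written and GPT-3-generated explanations (Lab B).
Rate each explanation on three 5-point Likert questions: ease of understanding, accuracy of the summary, and ideal length.
Compare lengths by character count to establish a baseline difference.
Apply Mann–Whitney U tests with Bonferroni correction to assess differences between the two sources.
Conduct a thematic analysis of students' open-ended responses to identify the characteristics they value in an explanation.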
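
The statistical comparison in the steps above can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' analysis code: the function names `mann_whitney_u` and `bonferroni`, the ratings in the demo, and the choice of three comparisons (one per Likert question) are all assumptions made for illustration. The test uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those reported by a statistics package.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        mid_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t**3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation is tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, min(1.0, p)


def bonferroni(p_values):
    """Bonferroni-adjust p-values by multiplying by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical 5-point Likert ratings, invented for this example only.
    ratings = {
        "understandability": ([3, 4, 2, 3, 4, 3, 2, 4, 3, 3], [4, 5, 4, 4, 5, 3, 4, 5, 4, 4]),
        "accuracy": ([3, 3, 4, 2, 3, 4, 3, 2, 3, 4], [4, 4, 5, 4, 5, 4, 3, 5, 4, 4]),
        "ideal length": ([3, 4, 3, 3, 2, 4, 3, 3, 4, 3], [3, 3, 4, 3, 3, 4, 3, 2, 4, 3]),
    }
    raw = [mann_whitney_u(student, llm)[1] for student, llm in ratings.values()]
    for name, p_raw, p_adj in zip(ratings, raw, bonferroni(raw)):
        print(f"{name}: raw p = {p_raw:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```

Multiplying each p-value by the number of comparisons and testing against the usual threshold is equivalent to testing the raw p-values against the threshold divided by the number of comparisons. A rank-based test is a natural fit here because Likert ratings are ordinal and heavily tied.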

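Results

Research questions

RQ1: To what extent do code explanations created by students and by LLMs differ in accuracy, length, and understandability?
RQ2: Which aspects of code explanations do students value?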

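Key findings

LLM-generated explanations are rated as more accurate than student-written ones.
LLM-generated explanations are rated as easier to understand than student-written ones.
The differences in length between student-written and LLM-generated explanations, both perceived and measured, are not statistically significant.
After correction, ratings of ideal length do not differ significantly between the two sources.
In the open-ended responses, students prefer a line-by-line format and value explanations that state the inputs and outputs and explain the purpose of the code.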

This review was generated by AI and checked by a human editor.