QUICK REVIEW

[Paper Review] Comparing Code Explanations Created by Students and Large Language Models

Juho Leinonen, Paul Denny | arXiv (Cornell University) | 2023. 04. 08.

Software Engineering Research | References: 40 | Citations: 10

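One-Line Summary

This study compares code explanations written by students in a large CS1 course with explanations generated by GPT-3, and finds that the LLM-generated explanations are rated as more accurate and easier to understand while being similar in length.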

ABSTRACT

Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills. However, developing the expertise to comprehend and explain code accurately and succinctly is a challenge for many students. Existing pedagogical approaches that scaffold the ability to explain code, such as producing exemplar code explanations on demand, do not currently scale well to large classrooms. The recent emergence of powerful large language models (LLMs) may offer a solution. In this paper, we explore the potential of LLMs in generating explanations that can serve as examples to scaffold students' ability to understand and explain code. To evaluate LLM-created explanations, we compare them with explanations created by students in a large course ($n \approx 1000$) with respect to accuracy, understandability and length. We find that LLM-created explanations, which can be produced automatically on demand, are rated as being significantly easier to understand and more accurate summaries of code than student-created explanations. We discuss the significance of this finding, and suggest how such models can be incorporated into introductory programming education.

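Motivation and Objectives

Raise the need to scale code-explanation scaffolding in large CS courses.
Investigate whether LLM-generated code explanations match or exceed student-written explanations in accuracy and understandability.
Examine which aspects of code explanations students find most useful and how the explanations are rated.
Assess whether LLM explanations can serve as scalable exemplars for novices learning to explain code.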

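Proposed Method

Collect explanations of three functions in a large first-year course of about 1,000 students.
Have students write explanations of the three functions (Lab A), then rate a random sample of 54 explanations drawn from both the student-written and the GPT-3-generated explanations (Lab B).
Rate each explanation on three 5-point Likert-scale questions: ease of understanding, accuracy of the summary, and whether the length is ideal.
Compare lengths by character count to establish baseline differences.
Apply Mann–Whitney U tests with Bonferroni correction to assess differences between the two sources.
Conduct a thematic analysis of students' free-text responses to identify the characteristics they value in explanations.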
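
The statistical step in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names `mann_whitney_u` and `bonferroni` are hypothetical, the ratings are made-up placeholder values rather than data from the paper, and the p-value uses a normal approximation with tie correction and no continuity correction, so it may differ slightly from what a statistics package reports.

```python
import math
from itertools import groupby


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, tagging each value with its source (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])

    rank_sum_x = 0.0
    tie_term = 0.0
    position = 0  # number of pooled values already ranked
    for _, group in groupby(pooled, key=lambda t: t[0]):
        group = list(group)
        t = len(group)
        # Tied values share the average of the 1-based ranks they occupy.
        avg_rank = position + (t + 1) / 2
        rank_sum_x += avg_rank * sum(1 for _, src in group if src == 0)
        tie_term += t**3 - t
        position += t

    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # Every value is identical, so there is no evidence of a difference.
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Placeholder 5-point Likert ratings (NOT data from the paper).
student = {"understand": [3, 4, 2, 3, 4, 3], "accurate": [3, 3, 4, 2, 3, 4], "length": [3, 3, 2, 4, 3, 3]}
llm = {"understand": [4, 5, 4, 4, 5, 3], "accurate": [4, 5, 4, 4, 3, 5], "length": [3, 4, 3, 3, 3, 2]}

raw = [mann_whitney_u(student[q], llm[q])[1] for q in student]
for question, p_raw, p_adj in zip(student, raw, bonferroni(raw)):
    print(f"{question}: raw p = {p_raw:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

The Bonferroni step multiplies each p-value by the number of comparisons (three here, one per Likert question), which is why a difference that looks significant on its own can stop being significant after correction.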

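Experimental Results

Research Questions

RQ1: To what extent do student-written and LLM-generated code explanations differ in accuracy, length, and understandability?
RQ2: Which aspects of code explanations do students value?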

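Key Findings

LLM-generated explanations are rated as more accurate than student-written explanations.
LLM-generated explanations are rated as easier to understand than student-written explanations.
There is no statistically significant difference in perceived or actual length between student-written and LLM-generated explanations.
After correction, ideal-length ratings do not differ significantly between the two sources.
In open-ended responses, students prefer line-by-line explanations and value explanations that specify inputs and outputs and describe the code's purpose.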


This review was generated by AI and reviewed by a human editor.