QUICK REVIEW

[Paper Review] How Novices Use LLM-Based Code Generators to Solve CS1 Coding Tasks in a Self-Paced Learning Environment

Majeed Kazemitabaar, Xinying Hou | arXiv (Cornell University) | Sep 25, 2023
Software Engineering Research | Cited by 9
One-Sentence Summary

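This study analyzes how 33 novices used an AI code generator based on OpenAI Codex while completing 45 CS1 tasks in a self-paced learning environment, identifying usage patterns, prompt styles, properties of the AI-generated code, and four coding approaches.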

ABSTRACT

As Large Language Models (LLMs) gain in popularity, it is important to understand how novice programmers use them. We present a thematic analysis of 33 learners, aged 10-17, independently learning Python through 45 code-authoring tasks using Codex, an LLM-based code generator. We explore several questions related to how learners used these code generators and provide an analysis of the properties of the written prompts and the generated code. Specifically, we explore (A) the context in which learners use Codex, (B) what learners are asking from Codex, (C) properties of their prompts in terms of relation to task description, language, and clarity, and prompt crafting patterns, (D) the correctness, complexity, and accuracy of the AI-generated code, and (E) how learners utilize AI-generated code in terms of placement, verification, and manual modifications. Furthermore, our analysis reveals four distinct coding approaches when writing code with an AI code generator: AI Single Prompt, where learners prompted Codex once to generate the entire solution to a task; AI Step-by-Step, where learners divided the problem into parts and used Codex to generate each part; Hybrid, where learners wrote some of the code themselves and used Codex to generate others; and Manual coding, where learners wrote the code themselves. The AI Single Prompt approach resulted in the highest correctness scores on code-authoring tasks, but the lowest correctness scores on subsequent code-modification tasks during training. Our results provide initial insight into how novice learners use AI code generators and the challenges and opportunities associated with integrating them into self-paced learning environments. We conclude with various signs of over-reliance and self-regulation, as well as opportunities for curriculum and tool development.

Research Motivation and Objectives

  • Understand when and why novice learners use an AI code generator during CS1 tasks in a self-paced environment.
  • Characterize the prompts novices craft to interact with Codex and how these prompts relate to task descriptions.
  • Analyze properties of AI-generated code (correctness, complexity, alignment with curriculum) and how learners integrate it.
  • Identify common coding approaches used with AI generation and their impact on learning outcomes.

Proposed Method

  • The authors perform a thematic analysis on log data from 33 novice learners (ages 10-17) using Codex during 45 Python coding tasks in Coding Steps.
  • Data sources include time-stamped logs: code edits, console runs, AI generation prompts and outputs, and task submissions.
  • A custom log-analysis interface supports visualization and replication of student behavior in a vertical time sequence.
  • Researchers applied deductive and inductive thematic analysis to code Codex usages into contexts, prompt attributes, AI-generated code properties, and usage patterns.
  • Inter-rater reliability on codebook application achieved 0.87 (alpha) across initial coding rounds.
Figure 1. An example of using AI-generated code as an example to fix syntax error with writing loops.
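
The time-stamped log described above can be pictured as a stream of typed events that is filtered per learner and task and then ordered by time. The following is a minimal sketch of that idea, not the authors' log-analysis interface: the `LogEvent` fields, the event-kind labels, and the `timeline` and `render` helpers are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Event kinds mirror the data sources listed in the summary.
# The exact label strings are assumptions made for this sketch.
EVENT_KINDS = {"code_edit", "console_run", "ai_prompt", "ai_output", "submission"}


@dataclass(frozen=True)
class LogEvent:
    """One time-stamped record (hypothetical schema)."""
    timestamp: datetime
    learner_id: str
    task_id: int
    kind: str
    payload: str


def timeline(events, learner_id, task_id):
    """Return one learner's events for one task, oldest first."""
    selected = [
        e for e in events
        if e.learner_id == learner_id and e.task_id == task_id
    ]
    for e in selected:
        if e.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {e.kind}")
    return sorted(selected, key=lambda e: e.timestamp)


def render(events):
    """Format events as a vertical, top-to-bottom time sequence."""
    return "\n".join(
        f"{e.timestamp:%H:%M:%S}  {e.kind:<12} {e.payload}" for e in events
    )
```

Calling `render(timeline(events, "P12", 7))` on a list of such records would print one line per event in chronological order, which is roughly the view a reviewer needs when coding a session by hand.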

Experimental Results

Research Questions

  • RQ1: How do novices use and interact with LLM-based code generators when learning CS1 coding tasks in a self-paced environment? (in terms of when Codex is used, what is asked of Codex, prompt properties, AI-generated code properties, and how code is used/verified)
  • RQ2: What coding approaches do novices employ when using AI code generators, and how do these approaches affect learning outcomes?

Key Findings

  • Four coding approaches emerged: AI Single Prompt, AI Step-by-Step, Hybrid, and Manual coding.
  • AI Single Prompt yielded the highest correctness on code-authoring tasks but the lowest on subsequent code-modification tasks.
  • 81% of AI-generated code had no identifiable problems; 19% had issues such as not following task requirements or regenerating existing code.
  • Learners often prompted Codex to generate entire solutions, subgoals, or to fix existing code, with prompts frequently copying or rewording task descriptions.
  • Prompt patterns included sentence-by-sentence task decomposition and repeated rephrasings to guide generation.
  • Evidence of over-reliance and self-regulation suggests a need for curriculum and tool design to promote effective AI-assisted learning.
Figure 2. An example of keeping the original code instead of replacing it with AI-generated code ( $P_{12}$ ).
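
The four coding approaches can be told apart, to a first approximation, by two observations per task: how many times the learner prompted Codex, and whether the learner also wrote code by hand. The sketch below encodes that reading of the definitions in the abstract. It is a simplification assumed for illustration only: the paper derives the approaches through thematic analysis by human coders, not through a rule like this, and the function name and parameters are hypothetical. In particular, several prompts on one task may be rephrasings of the same request rather than a true step-by-step decomposition, which this rule cannot distinguish.

```python
from enum import Enum


class Approach(Enum):
    AI_SINGLE_PROMPT = "AI Single Prompt"
    AI_STEP_BY_STEP = "AI Step-by-Step"
    HYBRID = "Hybrid"
    MANUAL = "Manual"


def classify_approach(num_ai_prompts: int, wrote_code_manually: bool) -> Approach:
    """Rough rule-of-thumb labeling of one task attempt (illustrative only)."""
    if num_ai_prompts < 0:
        raise ValueError("num_ai_prompts must be non-negative")
    if num_ai_prompts == 0:
        # No AI generation at all: the learner wrote the code themselves.
        return Approach.MANUAL
    if wrote_code_manually:
        # Some code came from the learner and some from the generator.
        return Approach.HYBRID
    if num_ai_prompts == 1:
        # A single prompt produced the whole solution.
        return Approach.AI_SINGLE_PROMPT
    # Several prompts and no manual code: treated here as step-by-step.
    return Approach.AI_STEP_BY_STEP
```

For example, `classify_approach(3, False)` returns `Approach.AI_STEP_BY_STEP`, while `classify_approach(1, True)` returns `Approach.HYBRID`.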


This review was generated by AI and reviewed by human editors.