QUICK REVIEW

[Paper Review] Computing Education in the Era of Generative AI

Paul Denny, James Prather | arXiv (Cornell University) | Jun 5, 2023

Software Engineering Research | Cited by 10

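One-Sentence Summary

This paper surveys the challenges and opportunities that generative AI and code generation models (such as Codex, Copilot, and GPT-4) pose for introductory programming education. It examines how these models can generate learning resources and improve feedback, and how they also raise concerns about academic integrity and licensing.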

ABSTRACT

The computing education community has a rich history of pedagogical innovation designed to support students in introductory courses, and to support teachers in facilitating student learning. Very recent advances in artificial intelligence have resulted in code generation models that can produce source code from natural language problem descriptions -- with impressive accuracy in many cases. The wide availability of these models and their ease of use has raised concerns about potential impacts on many aspects of society, including the future of computing education. In this paper, we discuss the challenges and opportunities such models present to computing educators, with a focus on introductory programming classrooms. We summarize the results of two recent articles, the first evaluating the performance of code generation models on typical introductory-level programming problems, and the second exploring the quality and novelty of learning resources generated by these models. We consider likely impacts of such models upon pedagogical practice in the context of the most recent advances at the time of writing.

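Research Motivation and Objectives

Evaluate how large language models perform on typical CS1 programming problems and exams.
Investigate the potential impact of AI-generated code on academic integrity, plagiarism detection, and licensing, along with the risk of student over-reliance.
Explore opportunities to use AI to generate learning resources, explanations, and improved programming error messages.
Discuss pedagogical adjustments and future directions for introductory computing education as AI advances.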

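Proposed Methods

Review two recent studies that evaluate code generation models on CS1 problems and learning resources.
Replicated experiment: evaluate Codex on Python CS1 exam questions and compare its results with student performance.
Analyze multiple variants of the Rainfall problem, examining solution diversity and performance against test cases.
Evaluate the generation of learning resources (programming exercises and code explanations) and the improvement of error messages.
Discuss academic integrity, licensing, and bias considerations related to AI-generated code.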
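The resource-evaluation step can be pictured as a small checking harness. The sketch below is only an illustration, not the authors' tooling: the function name `check_exercise`, the dictionary keys, and the use of plain `assert` statements as test cases are all assumptions made for this example. It classifies one generated exercise by whether it has a sample solution, whether that solution runs, whether tests exist, and whether all tests pass.

```python
def check_exercise(solution_src: str, tests_src: str) -> dict:
    """Classify one generated exercise (hypothetical helper, for illustration).

    solution_src: Python source of the generated sample solution ("" if absent).
    tests_src:    Python source of the generated tests, written as assert
                  statements ("" if absent).
    """
    result = {
        "has_solution": bool(solution_src.strip()),
        "solution_runs": False,
        "has_tests": bool(tests_src.strip()),
        "all_tests_pass": False,
    }
    namespace: dict = {}

    if result["has_solution"]:
        try:
            # NOTE: exec() on untrusted code is unsafe; a real study would
            # run generated code in a sandbox. This is a sketch only.
            exec(solution_src, namespace)
            result["solution_runs"] = True
        except Exception:
            # Syntax errors and runtime errors both count as "not executable".
            return result

    if result["solution_runs"] and result["has_tests"]:
        try:
            # Tests run in the same namespace so they can call the solution.
            exec(tests_src, namespace)
            result["all_tests_pass"] = True
        except Exception:
            # A failed assert (or any other error) means not all tests passed.
            pass

    return result


if __name__ == "__main__":
    solution = "def add(a, b):\n    return a + b\n"
    tests = "assert add(1, 2) == 3\nassert add(0, 0) == 0\n"
    print(check_exercise(solution, tests))
```

Running the solution and the tests in one shared namespace keeps the sketch short; it also means a test can only pass if the solution itself executed without error.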

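Experimental Results

Research Questions

RQ1: How well do code generation models perform on typical CS1 programming problems pitched at the level of novice programmers?
RQ2: How does AI-generated code affect academic integrity, plagiarism detection, and licensing in introductory computing courses?
RQ3: How can systems based on large language models generate effective learning resources (exercises, explanations) for CS1, and how reliable are they?
RQ4: Which pedagogical adjustments can reduce the risks of learner over-reliance and insecure code while still making use of AI?
RQ5: What bias and security problems might AI-generated code introduce in education?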

Key Findings

Metric	Result (representative)
Has a sample solution?	84.6% (203 / 240)
Sample solution executable?	89.7% (182 / 203)
Includes test cases?	70.8% (170 / 240)
All tests pass?	30.9% (51 / 165)
Full (100%) statement coverage?	94.1% (48 / 51)
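The percentages in the table follow directly from the counts in parentheses. As a quick arithmetic check, the sketch below recomputes each one; the dictionary labels and the helper name `pct` are chosen here for illustration, while the counts are copied from the table. Note that each row has its own denominator, so not every percentage is a fraction of the 240 generated exercises.

```python
# Counts copied from the table: (numerator, denominator) per metric.
counts = {
    "has sample solution": (203, 240),
    "sample solution executable": (182, 203),
    "includes test cases": (170, 240),
    "all tests pass": (51, 165),
    "full statement coverage": (48, 51),
}


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * numerator / denominator, 1)


percentages = {name: pct(n, d) for name, (n, d) in counts.items()}

if __name__ == "__main__":
    for name, value in percentages.items():
        n, d = counts[name]
        print(f"{name}: {value}% ({n} / {d})")
```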

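Codex scored 78.5% (15.7/20) on Exam 1 and 78.0% (19.5/25) on Exam 2, ranking 17th among the 71 students in the CS1 course analyzed.
Codex's performance varied with prompt constraints; it sometimes failed when language features were restricted or when the output required a specific format (such as ASCII art).
Across 50 Rainfall problem variants (350 evaluations), Codex averaged about 50% and produced diverse solutions; 84.6% of the generated exercises included a sample solution, and 89.7% of those sample solutions were executable.
In the generated explanations, 90% of line-by-line explanations covered all parts of the code, and line-by-line correctness was about 70%; newer models (such as ChatGPT) produced higher-quality explanations than earlier Codex variants.
AI-assisted resource generation shows potential for creating novel, thematically consistent exercises and tests that cover a wide range of concepts, enabling broader pedagogical experimentation.
The risks identified include academic misconduct, licensing and attribution problems with generated code, beginners deploying insecure code, and bias in AI output; the paper calls for carefully considered policy and regulation.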
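For readers unfamiliar with the Rainfall problem mentioned above, it is a classic CS1 exercise. This review does not give the wording of the variants that were tested, so the sketch below assumes one common formulation: average the non-negative readings that appear before a sentinel value. The function name `rainfall`, the sentinel 99999, and the choice to return 0.0 when there is no valid data are all assumptions for illustration.

```python
def rainfall(readings, sentinel=99999):
    """Average the non-negative readings that appear before the sentinel.

    Assumed variant: negative values are invalid and skipped, input after
    the sentinel is ignored, and an empty data set yields 0.0.
    """
    total = 0
    count = 0
    for value in readings:
        if value == sentinel:
            break  # stop at the sentinel; later values are ignored
        if value < 0:
            continue  # skip invalid (negative) readings
        total += value
        count += 1
    # Guard against division by zero when no valid reading was seen.
    return total / count if count else 0.0


if __name__ == "__main__":
    print(rainfall([10, -5, 20, 99999, 40]))
```

The problem is small, but it combines a loop, a sentinel, input filtering, and a division-by-zero guard, which is the kind of detail where differently worded variants can lead to different solutions.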


This review was generated by AI and reviewed by a human editor.