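[Paper Summary] Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code
This paper evaluates Codex-based few-shot tools on three code generation tasks and compares them with manually built tools. It finds that the model-based tools can complement, match, or outperform traditional tools while requiring significantly less development effort.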
Few-shot learning with large-scale, pre-trained language models is a powerful way to answer questions about code, e.g., how to complete a given code example, or even generate code snippets from scratch. The success of these models raises the question of whether they could serve as a basis for building a wide range of code generation tools. Traditionally, such tools are built manually and separately for each task. Instead, few-shot learning may make it possible to obtain different tools from a single pre-trained language model by simply providing a few examples or a natural language description of the expected tool behavior. This paper studies to what extent a state-of-the-art, pre-trained language model of code, Codex, may serve this purpose. We consider three code manipulation and code generation tasks targeted by a range of traditional tools: (i) code mutation; (ii) test oracle generation from natural language documentation; and (iii) test case generation. For each task, we compare few-shot learning to a manually built tool. Our results show that the model-based tools complement (code mutation), are on par with (test oracle generation), or even outperform their respective traditionally built tools (test case generation), while requiring far less effort to develop. By comparing the effectiveness of different variants of the model-based tools, we provide insights on how to design an appropriate input ("prompt") to the model and what influence the size of the model has. For example, we find that providing a small natural language description of the code generation task is an easy way to improve predictions. Overall, we conclude that few-shot language models are surprisingly effective, yet there is still more work to be done, such as exploring more diverse ways of prompting and tackling even more involved tasks.
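Research Motivation and Goals
- Assess whether few-shot, pre-trained language models can serve as general-purpose tools across different categories of code generation tasks.
- Compare model-based tools with manually built tools for code mutation, test oracle generation from natural language, and test case generation.
- Analyze how prompt design and model size affect the effectiveness of these tools.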
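Proposed Approach
- Apply Codex, a state-of-the-art language model of code, to three tasks: code mutation, test oracle generation from natural language documentation, and test case generation.
- Prototype each tool's behavior using few-shot prompts and lightweight natural language descriptions.
- Compare the model-based tools with traditional, manually built tools in terms of both effectiveness and development effort.
- Study how variations in prompt design and model size affect performance.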
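To make the few-shot setup more concrete, here is a minimal Python sketch of one way to assemble a prompt. It combines an optional natural language task description, a few input/output demonstrations, and an open-ended query that the model would complete. The `Example` class, the `build_few_shot_prompt` function, the "Original:"/"Mutant:" labels, and the Java-like demonstration pairs are all hypothetical. The summary does not give the paper's actual prompt template, and the sketch makes no model call.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Example:
    """One demonstration pair shown to the model (hypothetical format)."""
    source: str
    target: str


def build_few_shot_prompt(
    examples: Sequence[Example],
    query: str,
    description: Optional[str] = None,
    input_label: str = "Original",
    output_label: str = "Mutant",
) -> str:
    """Assemble a plain-text few-shot prompt.

    The layout (labels, blank-line separators, trailing output label) is an
    illustrative assumption, not the paper's exact prompt template.
    """
    parts = []
    if description:
        # Optional short natural language description of the task.
        parts.append(description.strip())
    for ex in examples:
        # Each demonstration shows an input followed by the expected output.
        parts.append(f"{input_label}:\n{ex.source}\n{output_label}:\n{ex.target}")
    # End with the query and an empty output slot for the model to complete.
    parts.append(f"{input_label}:\n{query}\n{output_label}:\n")
    return "\n\n".join(parts)


examples = [
    Example("if (a < b) return a;", "if (a <= b) return a;"),
    Example("x = y + 1;", "x = y - 1;"),
]
prompt = build_few_shot_prompt(
    examples,
    query="while (i > 0) i--;",
    description="Generate a mutant of the given Java statement.",
)
print(prompt)
```

The same builder could serve another task by swapping the demonstrations and labels, for example documentation comments as inputs and oracle expressions as outputs. This is the sense in which a single model can yield several tools.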
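Experimental Results
Research Questions
- RQ1: Can Codex-based few-shot tools match or outperform traditional tools for code mutation, test oracle generation, and test case generation?
- RQ2: How does prompt design, including a natural language (NL) description, affect the quality of the code the model generates across tasks?
- RQ3: How does model size affect the effectiveness of few-shot code generation tools?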
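Key Findings
- Model-based tools complement traditional code mutation tools.
- Model-based tools are on par with traditional tools in generating test oracles from natural language documentation.
- Model-based tools can outperform traditional tools in test case generation while requiring substantially less development effort.
- Providing a short natural language description of the task improves predictions.
- Prompt variants and model size both affect effectiveness. This leaves room for prompt optimization and for exploring larger or smaller models.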
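The findings on prompt variants and model size suggest a simple ablation grid for comparing tool configurations. The following is a minimal sketch that uses hypothetical axes: with or without an NL description, 1 or 3 demonstrations, and the placeholder model-size labels "small" and "large". It enumerates every combination and scores each one with a caller-supplied metric. The `score` callable stands in for querying a model and measuring effectiveness. It, and every other name here, is illustrative and is not the paper's evaluation code.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical ablation axes (illustrative values, not the paper's settings).
DESCRIPTION_OPTIONS = (False, True)  # include a short NL task description?
SHOT_COUNTS = (1, 3)                 # number of demonstrations in the prompt
MODEL_SIZES = ("small", "large")     # placeholder labels, not real model names

Variant = Tuple[bool, int, str]


def variant_grid() -> List[Variant]:
    """Enumerate every combination of prompt design and model size."""
    return list(product(DESCRIPTION_OPTIONS, SHOT_COUNTS, MODEL_SIZES))


def evaluate_variants(score: Callable[[bool, int, str], float]) -> Dict[Variant, float]:
    """Apply a caller-supplied effectiveness metric to each variant.

    `score` stands in for building the prompt, querying the model, and
    measuring output quality (e.g., the share of correct generated oracles).
    """
    return {variant: score(*variant) for variant in variant_grid()}


def best_variant(results: Dict[Variant, float]) -> Variant:
    """Return the highest-scoring variant (the first one wins on ties)."""
    return max(results, key=results.get)
```

In practice, `score` would wrap the real model calls and task-specific checks. The grid only organizes the comparison, so that adding a new prompt variant or model size means extending one tuple.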
This summary was generated by AI and reviewed by human editors.