QUICK REVIEW

[论文解读] Improving ChatGPT Prompt for Code Generation

Chao Liu, Xuanlin Bao|arXiv (Cornell University)|May 15, 2023

Software Engineering Research被引用 38

一句话总结

本论文显示，精心设计的 prompts 结合 chain-of-thought 与多步优化，显著提升 ChatGPT 在 CodeXGlue 上的文本到代码和代码到代码生成，并分析简洁性、会话上下文和随机性等因素。

ABSTRACT

Automated code generation can be a powerful technique for software development, significantly reducing developers' efforts and time required to create new code by generating it automatically based on requirements. Recently, OpenAI's language model ChatGPT has emerged as a powerful tool for generating human-like responses to a wide range of textual inputs (i.e., prompts), including those related to code generation. However, the effectiveness of ChatGPT for code generation is not well understood, and the generation performance could be heavily influenced by the choice of prompt. To answer these questions, we conducted experiments using the CodeXGlue dataset to evaluate ChatGPT's capabilities for two code generation tasks, including text-to-code and code-to-code generation. We designed prompts by leveraging the chain-of-thought strategy with multi-step optimizations. Our results showed that by carefully designing prompts to guide ChatGPT, the generation performance can be improved substantially. We also analyzed the factors that influenced the prompt design and provided insights that could guide future research.

研究动机与目标

使用 CodeXGlue 评估 ChatGPT 在两个任务上的代码生成性能（文本到代码和代码到代码）。
研究引导 ChatGPT 的提示工程技巧，通过 chain-of-thought 和多步优化。
确定影响提示设计的因素并为未来研究提供可操作的见解。

提出的方法

使用 CodeXGlue 评估 ChatGPT 在 T2C 与 C2C 任务上的表现。
设计具有手动 chain-of-thought 基于多步优化的提示（P1–P5）。
使用 BLEU 和 CodeBLEU 指标评估提示并分析改进。
基线对比：ChatGPT-task、ChatGPT-detail、ChatGPT-behaviour。
探讨简洁性、会话与生成随机性因素的影响。
将 ChatGPT 与在 CodeXGlue 上微调的大语言模型进行比较。

实验结果

研究问题

RQ1RQ1: 设计的 prompts 对 T2C 与 C2C 任务的效果有多大？
RQ2RQ2: 要求简洁性如何影响 ChatGPT 生成的代码？
RQ3RQ3: 会话（连续 vs. 单独）设置如何影响 ChatGPT 输出？
RQ4RQ4: 生成随机性如何影响 ChatGPT 的代码质量与一致性？

主要发现

Task	Model	BLEU	CodeBLEU
T2C	ChatGPT-task	5.63	28.05
T2C	ChatGPT-detail	14.09	39.90
T2C	ChatGPT-behaviour	21.59	48.69
C2C	ChatGPT-task	10.61	46.12
C2C	ChatGPT-detail	15.79	47.71
C2C	ChatGPT-behaviour	9.47	47.38

提示设计显著提升 T2C 和 C2C 的性能（例如相对于基线的 BLEU 和 CodeBLEU 增益）。
对简洁性要求有助于提升 T2C 结果（BLEU 和 CodeBLEU），但对 C2C 的影响则参差不齐。
连续会话有助于 C2C 但对 T2C 不利，而在此设置中，单独会话对 T2C 更有利。
在所设计的提示下，生成随机性影响较小，跨运行结果稳定。

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。