QUICK REVIEW

[Paper Digest] Self-planning Code Generation with Large Language Models

Xue Jiang, Yihong Dong | arXiv (Cornell University) | Mar 12, 2023

Software Engineering Research | Cited by 11

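One-Sentence Summary

The paper proposes a two-phase self-planning approach that decomposes the intent into a plan before generating code, achieving a relative Pass@1 improvement of up to 25.4% over direct generation and up to 11.9% over Code CoT on code benchmarks, and demonstrating higher code quality and multilingual capability in multi-language evaluation.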

ABSTRACT

Although large language models (LLMs) have demonstrated impressive ability in code generation, they are still struggling to address the complicated intent provided by humans. It is widely acknowledged that humans typically employ planning to decompose complex problems and schedule solution steps prior to implementation. To this end, we introduce planning into code generation to help the model understand complex intent and reduce the difficulty of problem-solving. This paper proposes a self-planning code generation approach with large language models, which consists of two phases, namely planning phase and implementation phase. Specifically, in the planning phase, LLM outlines concise and formatted planning steps from the intent. Subsequently, in the implementation phase, the model generates code step by step, guided by the preceding planning steps. We conduct extensive experiments on various code-generation benchmarks across multiple programming languages. Experimental results show that self-planning code generation achieves a relative improvement of up to 25.4% in Pass@1 compared to direct code generation, and up to 11.9% compared to Chain-of-Thought code generation. Moreover, our self-planning approach also enhances the quality of the generated code with respect to correctness, readability, and robustness, as assessed by humans.

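Motivation and Objectives

Motivate the use of planning for handling complex programming intents in code generation.
Propose a two-phase framework: the LLM first plans, then implements the code under the plan's guidance.
Show that planning enables better problem decomposition and better-structured instructions for code synthesis.
Evaluate on benchmarks across multiple programming languages and assess code quality through human judgment.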

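Proposed Method

Two-phase inference: the planning phase uses few-shot prompting to produce a concise plan from the intent.
The implementation phase appends the plan to the intent and generates code step by step under the plan's guidance.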
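The formalization separates the two phases: the planning phase models P(z|x,C) and the implementation phase models P(y|z,x), combined as P(y|x,C) ∝ P(y|z,x) · P(z|x,C), where x is the intent, z the plan, y the generated code, and C the few-shot prompt.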
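Carefully designed few-shot planning prompts keep the steps actionable and high-level, optionally including conditional and loop structures.
Comparison against baselines (Direct, Few-shot, Code CoT) and against a ground-truth-planning upper bound.
Evaluation on Python, Java, Go, and JavaScript benchmarks using Pass@k, AvgPassRatio, and CodeBLEU.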
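
The two phases described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt layout (`Intent:` / `Plan:` labels), the single few-shot example, and the function names `build_planning_prompt`, `build_implementation_prompt`, `self_planning_generate`, and `canned_llm` are all hypothetical, and the model is abstracted as any callable that maps a prompt string to a completion string.

```python
from typing import Callable, List, Tuple

# Hypothetical few-shot examples (intent, plan); the paper's actual prompt is not reproduced here.
PLANNING_EXAMPLES: List[Tuple[str, str]] = [
    (
        "def is_palindrome(s):  # return True if s reads the same in both directions",
        "1. Reverse the string.\n2. Compare it with the original.\n3. Return the comparison result.",
    ),
]


def build_planning_prompt(intent: str) -> str:
    """Phase 1 prompt: few-shot examples followed by the new intent, ending at 'Plan:'."""
    blocks = [
        f"Intent:\n{ex_intent}\nPlan:\n{ex_plan}\n"
        for ex_intent, ex_plan in PLANNING_EXAMPLES
    ]
    blocks.append(f"Intent:\n{intent}\nPlan:\n")
    return "\n".join(blocks)


def build_implementation_prompt(intent: str, plan: str) -> str:
    """Phase 2 prompt: the plan is appended to the intent as comment lines."""
    plan_comments = "\n".join(f"# {step}" for step in plan.splitlines() if step.strip())
    return f"{intent}\n# Plan:\n{plan_comments}\n"


def self_planning_generate(intent: str, llm: Callable[[str], str]) -> str:
    """Run the planning phase, then the implementation phase, with the same model."""
    plan = llm(build_planning_prompt(intent)).strip()
    return llm(build_implementation_prompt(intent, plan))


def canned_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (assumption, not a real API)."""
    if prompt.rstrip().endswith("Plan:"):
        return "1. Sum the numbers.\n2. Divide the sum by the count.\n3. Return the quotient."
    return "    return sum(xs) / len(xs)\n"


if __name__ == "__main__":
    print(self_planning_generate("def mean(xs):", canned_llm))
```

Keeping the two prompts in separate functions mirrors the separation between P(z|x,C) and P(y|z,x): the few-shot examples only appear in the planning prompt, while the implementation prompt sees just the intent and the plan.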

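Experimental Results

Research Questions

RQ1: How does self-planning perform compared with baseline code-generation approaches?
RQ2: How does self-planning perform on different large language models?
RQ3: What is the optimal design (phases, examples, steps) for the self-planning approach?
RQ4: How does self-planning perform in multilingual code generation?
RQ5: How does problem complexity affect the benefit of self-planning?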

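Key Findings

Self-planning outperforms direct generation and Code CoT on the evaluated benchmarks, with a relative Pass@1 improvement of up to 25.4% over direct generation and up to 11.9% over Code CoT.
Ground-truth planning, used as an upper bound, yields substantial gains (for example, more than 50% on HumanEval and more than 30% on MBPP-ET); the practical planning approach comes close to these gains.
Self-planning shows emergent behavior at large model scale: planning gains appear across a range of base LLMs and are most pronounced at the 175B scale.
One-phase vs. two-phase design: with a well-designed planning prompt, the two-phase design performs robustly; multi-turn planning is often hampered by LLM truncation issues; concise plans can also be highly effective.
Planning improves code quality (readability and robustness); human evaluation shows that the quality gains go beyond correctness alone.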
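
The figures above are relative improvements on sampling-based metrics, so a short sketch of the arithmetic may help. It assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), and reads AvgPassRatio as the mean fraction of test cases passed per problem; whether the paper computes its metrics in exactly this way is an assumption. The function names and all numbers in the usage section are illustrative and are not results from the paper.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that at least one of k samples drawn without replacement is correct.
    return 1.0 - comb(n - c, k) / comb(n, k)


def avg_pass_ratio(passed: Sequence[int], total: Sequence[int]) -> float:
    """Mean over problems of the fraction of test cases the generated code passes."""
    if not passed or len(passed) != len(total):
        raise ValueError("passed and total must be non-empty and of equal length")
    ratios = [p / t for p, t in zip(passed, total)]
    return sum(ratios) / len(ratios)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Illustrative numbers only.
    print(pass_at_k(n=10, c=3, k=1))
    print(avg_pass_ratio(passed=[2, 4], total=[4, 4]))
    print(relative_improvement(baseline=40.0, new=50.0))
```

Note that a relative improvement is measured against the baseline score, so a change from 40.0 to 50.0 is a 25% relative gain but only a 10-point absolute gain.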


This digest was generated by AI and reviewed by human editors.