[Paper Review] Self-planning Code Generation with Large Language Models
The paper introduces a two-phase, self-planning approach that decomposes intent into a plan before code generation. It achieves relative Pass@1 improvements of up to 25.4% over direct generation and up to 11.9% over Code CoT across code benchmarks. It also demonstrates improved code quality and multilingual capability.
Although large language models (LLMs) have demonstrated impressive ability in code generation, they still struggle to handle complex intents expressed by humans. It is widely acknowledged that humans typically employ planning to decompose complex problems and schedule solution steps prior to implementation. To this end, we introduce planning into code generation to help the model understand complex intent and reduce the difficulty of problem-solving. This paper proposes a self-planning code generation approach with large language models, which consists of two phases, namely the planning phase and the implementation phase. Specifically, in the planning phase, the LLM outlines concise and formatted planning steps from the intent. Subsequently, in the implementation phase, the model generates code step by step, guided by the preceding planning steps. We conduct extensive experiments on various code-generation benchmarks across multiple programming languages. Experimental results show that self-planning code generation achieves a relative improvement of up to 25.4% in Pass@1 compared to direct code generation, and up to 11.9% compared to Chain-of-Thought code generation. Moreover, our self-planning approach also enhances the quality of the generated code with respect to correctness, readability, and robustness, as assessed by humans.
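The two phases described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the model is any hypothetical `prompt -> completion` callable, the prompt layout (`Intent:`, `Plan:`, `Code:` markers) and every function name are invented for illustration, and a canned `stub_llm` stands in for a real LLM so the sketch runs offline. For simplicity, the implementation prompt here carries no few-shot examples.

```python
from typing import Callable, Sequence

# Hypothetical model interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]

# Hypothetical few-shot example: (intent, plan) pairs used only in the planning phase.
EXAMPLES = [
    (
        "Return the largest element of a list.",
        "1. Track the current maximum.\n2. Scan every element, updating the maximum.\n3. Return the maximum.",
    ),
]


def build_planning_prompt(examples: Sequence[tuple[str, str]], intent: str) -> str:
    """Few-shot planning prompt: example intents with their plans, then the new intent."""
    shots = "\n\n".join(f"Intent: {i}\nPlan:\n{p}" for i, p in examples)
    return f"{shots}\n\nIntent: {intent}\nPlan:\n"


def build_implementation_prompt(intent: str, plan: str) -> str:
    """Append the generated plan to the intent so the model implements it step by step."""
    return f"Intent: {intent}\nPlan:\n{plan}\nCode:\n"


def self_planning_generate(
    llm: LLM, examples: Sequence[tuple[str, str]], intent: str
) -> tuple[str, str]:
    """Run the planning phase, then the implementation phase conditioned on the plan."""
    plan = llm(build_planning_prompt(examples, intent)).strip()  # planning phase
    code = llm(build_implementation_prompt(intent, plan))  # implementation phase
    return plan, code


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a real model: returns a canned plan or canned code."""
    if prompt.endswith("Plan:\n"):
        return "1. Loop over the numbers.\n2. Keep the even ones.\n3. Return their sum."
    return "def sum_even(nums):\n    return sum(n for n in nums if n % 2 == 0)\n"
```

For example, `self_planning_generate(stub_llm, EXAMPLES, "Sum the even numbers in a list.")` returns the canned plan and code. Keeping the model behind a plain string-in, string-out callable means a real client could be swapped in without touching the two-phase logic.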
Motivation & Objective
- Motivate planning to handle complex programming intents in code generation.
- Propose a two-phase framework where an LLM first plans and then implements code guided by the plan.
- Show that planning enables better problem decomposition and provides guidance for code synthesis.
- Evaluate the approach on benchmarks spanning multiple programming languages, with human judgments of code quality.
Proposed method
- Two-phase inference: planning phase uses few-shot prompts to produce a concise plan from the intent.
- Implementation phase appends the plan to the intent and generates code step-by-step guided by the plan.
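- Formalization: with intent x, plan z, code y, and few-shot prompt C, the code distribution marginalizes over plans, P(y|x,C) = Σ_z P(y|z,x,C)·P(z|x,C); the two phases approximate this by first sampling a plan ẑ ~ P(z|x,C) (planning) and then generating code from P(y|ẑ,x,C) (implementation).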
- Crafted few-shot planning prompts ensure steps are actionable and high-level, with optional conditional/loop constructs.
- Comparison against baselines (Direct, Few-shot, Code CoT) and a ground-truth planning upper bound.
- Evaluation across Python, Java, Go, and JavaScript benchmarks using Pass@k, AvgPassRatio, and CodeBLEU.
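Two of the metrics listed above can be sketched as follows. `pass_at_k` uses the unbiased estimator popularized by the Codex/HumanEval evaluation, 1 − C(n−c, k)/C(n, k) per problem, averaged over problems; the review does not say which estimator the paper uses, so treat that as an assumption. `avg_pass_ratio` assumes the common definition of AvgPassRatio as the mean fraction of test cases passed per problem. CodeBLEU is omitted because it needs syntax-tree and data-flow matching beyond a short sketch. All function names are hypothetical.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass all tests."""
    if not (0 < k <= n and 0 <= c <= n):
        raise ValueError("require 0 < k <= n and 0 <= c <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # Probability that a random size-k subset contains no correct sample, complemented.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n samples, c correct)."""
    return mean(pass_at_k(n, c, k) for n, c in results)


def avg_pass_ratio(test_outcomes: list[list[bool]]) -> float:
    """Mean over problems of the fraction of test cases the generated code passes."""
    return mean(sum(tests) / len(tests) for tests in test_outcomes)
```

Unlike Pass@k, which only credits a sample that passes every test, AvgPassRatio gives partial credit, so it can separate "almost correct" code from code that fails outright.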
Experimental results
Research questions
- RQ1: How does self-planning compare to baseline code generation approaches?
- RQ2: How does self-planning perform across different LLMs?
- RQ3: What is the optimal design of the self-planning approach (phases, shots, steps)?
- RQ4: How does self-planning perform in multilingual code generation?
- RQ5: How does problem complexity affect the benefits of self-planning?
Key findings
- Self-planning outperforms direct generation and Code CoT, with relative Pass@1 improvements of up to 25.4% over Direct and up to 11.9% over Code CoT on the evaluated benchmarks.
- Ground-truth planning yields substantial upper-bound gains (e.g., >50% on HumanEval, >30% on MBPP-ET); self-generated plans move performance toward these upper bounds.
- Self-planning shows emergent behavior at large model scales, and planning benefits extend across multiple base LLMs, particularly at 175B scale.
- One-phase vs two-phase designs: two-phase with careful prompting performs robustly; multi-turn planning often struggles due to truncation issues in LLMs; concise plans can be highly effective.
- Planning improves code quality in terms of readability and robustness, with human judgments indicating quality gains beyond raw correctness.
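The 25.4% and 11.9% figures in the key findings are relative improvements (the abstract says so explicitly), not absolute percentage-point gains. A minimal sketch of the difference, with hypothetical function names and made-up scores that are not taken from the paper:

```python
def relative_improvement(new: float, base: float) -> float:
    """Relative gain of `new` over `base`, expressed as a percentage of the baseline."""
    if base <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (new - base) / base


def absolute_gain(new: float, base: float) -> float:
    """Absolute gain in percentage points, with both scores given as percentages."""
    return new - base
```

With a hypothetical baseline Pass@1 of 40.0 and a hypothetical self-planning Pass@1 of 48.0, `relative_improvement(48.0, 40.0)` gives 20.0 (percent) while `absolute_gain(48.0, 40.0)` gives 8.0 (points), so a relative figure always looks larger than the raw score difference when the baseline is below 100.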
This review was created by AI and reviewed by human editors.