QUICK REVIEW

[Paper Review] A Syntactic Neural Model for General-Purpose Code Generation

Pengcheng Yin, Graham Neubig|arXiv (Cornell University)|2017. 04. 06.
Natural Language Processing Techniques|References: 44|Citations: 114
One-Line Summary

The paper presents a grammar-informed neural model that generates an abstract syntax tree (AST) rather than a flat token sequence, so its output is well-formed code, and it achieves state-of-the-art results in generating general-purpose Python code from natural language descriptions.

ABSTRACT

We consider the problem of parsing natural language descriptions into source code written in a general-purpose programming language like Python. Existing data-driven methods treat this problem as a language generation task without considering the underlying syntax of the target programming language. Informed by previous work in semantic parsing, in this paper we propose a novel neural architecture powered by a grammar model to explicitly capture the target syntax as prior knowledge. Experiments find this an effective way to scale up to generation of complex programs from natural language descriptions, achieving state-of-the-art results that well outperform previous code generation and semantic parsing approaches.

Research Motivation and Goals

  • Parse natural language descriptions into general-purpose source code, treating the output as structured code rather than as an unstructured text sequence.
  • Leverage target-language syntax by modeling AST derivations with a probabilistic grammar.
  • Improve code generation quality by incorporating structural information and parent/sibling context in neural decoding.

Proposed Method

  • Define p(y|x) as the probability of generating an AST y given NL input x under a fixed Python grammar.
  • Use a grammar model that alternates ApplyRule (production rule application) and GenToken (terminal token emission) actions to construct the AST in a depth-first, left-to-right manner.
  • Extend the decoder with structural neural connections (parent feeding and frontier node embeddings) to reflect AST topology.
  • Compute action probabilities with an attentional encoder-decoder that uses (i) a bidirectional LSTM encoder for the NL input, (ii) an LSTM-based decoder with grammar-informed state, and (iii) a copy mechanism, implemented as a pointer network, for terminal tokens drawn from the input.
  • Train by maximizing the likelihood of oracle action sequences derived from ASTs parsed from code; inference uses beam search over action sequences to yield the best AST.
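
The action-based derivation described in the list above can be sketched with Python's built-in `ast` module. This is a simplified illustration under stated assumptions, not the authors' implementation: the helper name `oracle_actions` is hypothetical, each "rule" is approximated as a node type plus its field names, and every terminal value is emitted by a single GenToken action, whereas a full model may need several GenToken actions for a multi-word terminal.

```python
import ast


def oracle_actions(node):
    """Yield (action, payload) pairs for an AST, depth-first and left-to-right.

    Assumption: a "production rule" is approximated by the node type and
    the names of its fields; this is not the grammar used in the paper.
    """
    fields = ", ".join(node._fields) or "<empty>"
    yield ("ApplyRule", f"{type(node).__name__} -> {fields}")

    for _name, value in ast.iter_fields(node):
        # Normalize single children and child lists to one code path.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, ast.AST):
                # Non-terminal: expand it before moving to the next sibling.
                yield from oracle_actions(child)
            elif child is not None:
                # Terminal (identifier, literal, ...): emit it as a token.
                yield ("GenToken", repr(child))


tree = ast.parse("x = sorted(items)")
actions = list(oracle_actions(tree))

for action, payload in actions:
    print(f"{action:10s} {payload}")
```

In this sketch the GenToken payloads come out in source order ('x', 'sorted', 'items'), interleaved with the ApplyRule actions that build the surrounding tree. A sequence of this kind is what training would score and what beam search would explore at inference time.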

Experimental Results

Research Questions

  • RQ1: Can a syntax-informed neural model constrain code generation to well-formed programs by generating via an AST with explicit grammar rules?
  • RQ2: Does incorporating AST structure (parent/sibling information) improve accuracy and robustness in general-purpose code generation from NL descriptions?
  • RQ3: How does the grammar-based approach compare to prior Seq2Tree and LPN methods on real Python code generation benchmarks?
  • RQ4: Is copying from input descriptions essential for handling variable names and literals in code generation?

Key Results

  • The proposed syntax-driven model achieves absolute accuracy improvements of 11.7% and 9.3% over the Latent Predictor Network (LPN) baseline on the HearthStone (HS) and Django datasets, respectively.
  • Modeling grammar and AST structure yields grammatically correct outputs and reduces invalid ASTs compared to Seq2Tree baselines.
  • In HS, the model generates about 170 actions per example (vs. 300+ for Seq2Tree), highlighting efficiency gains from applying full grammar rules.
  • Parent feeding significantly boosts HS performance, indicating larger ASTs benefit from hierarchical information flow.
  • Copying terminals from the input (pointer network) is crucial for handling variable names and literals, with removal causing notable drops in accuracy.
  • For IFTTT (domain-specific language), the model is competitive with neural baselines and closer to classic methods on full parse-tree accuracy, demonstrating broader applicability.
  • Ablation shows frontier embeddings contribute less on smaller grammars but help on larger ones, while unary closure reduces action counts and can be beneficial depending on dataset characteristics.
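
The role of copying in the results above can be made concrete with a small numeric sketch of how a generate-or-copy decoder might score a terminal token. Everything here is an assumption made for illustration: the function name `gen_token_prob`, the toy probabilities, and the choice to sum pointer mass over repeated input words are not taken from the paper's code.

```python
def gen_token_prob(token, p_gen, vocab_probs, copy_probs, source_tokens):
    """Marginal probability of emitting `token` by generating or copying.

    p_gen         : probability of choosing the "generate from vocabulary" path
    vocab_probs   : dict mapping vocabulary words to generation probabilities
    copy_probs    : pointer distribution over input positions
    source_tokens : words of the natural language description
    """
    p_copy = 1.0 - p_gen
    from_vocab = vocab_probs.get(token, 0.0)
    # Assumption: if a word occurs several times in the input,
    # its pointer probabilities are summed.
    from_copy = sum(
        p for word, p in zip(source_tokens, copy_probs) if word == token
    )
    return p_gen * from_vocab + p_copy * from_copy


# Toy inputs (made up for illustration).
source_tokens = ["sort", "my_list", "in", "place"]
copy_probs = [0.1, 0.7, 0.1, 0.1]
vocab_probs = {"sorted": 0.6, "list": 0.4}

p_name = gen_token_prob("my_list", 0.25, vocab_probs, copy_probs, source_tokens)
p_name_no_copy = gen_token_prob("my_list", 1.0, vocab_probs, copy_probs, source_tokens)
```

With these toy numbers, the out-of-vocabulary identifier `my_list` receives probability 0.75 × 0.7 = 0.525 when copying is available and 0 when the copy path is disabled (`p_gen = 1.0`). This is one way to read why removing the copy mechanism hurts on variable names and literals.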

This review was generated by AI and reviewed by a human editor.