QUICK REVIEW

[Paper Review] Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models

Zhihong Shao, Yeyun Gong|arXiv (Cornell University)|Feb 1, 2023
Topic Modeling|10 citations
TL;DR

Synthetic prompting uses a few seed demonstrations to generate many self-synthesized chain-of-thought examples via backward-forward planning, then selects diverse, complex demonstrations to improve LLM reasoning, achieving up to 15.6% absolute gains over state-of-the-art methods.

ABSTRACT

Large language models can perform various reasoning tasks by using chain-of-thought prompting, which guides them to find answers through step-by-step demonstrations. However, the quality of the prompts depends on the demonstrations given to the models, and creating many of them by hand is costly. We introduce Synthetic prompting, a method that leverages a few handcrafted examples to prompt the model to generate more examples by itself, and selects effective demonstrations to elicit better reasoning. Our method alternates between a backward and forward process to generate new examples. The backward process generates a question that matches a sampled reasoning chain, so that the question is solvable and clear. The forward process produces a more detailed reasoning chain for the question, improving the quality of the example. We evaluate our method on numerical, symbolic, and algorithmic reasoning tasks, and show that it outperforms existing prompting techniques.

Motivation & Objective

  • Reduce the cost of curating demonstrations by generating additional examples automatically.
  • Develop a backward-forward synthesis loop where the model creates questions and detailed reasoning to enrich demonstrations.
  • Propose an in-cluster complexity based selection to pick diverse, informative demonstrations for inference.
  • Demonstrate effectiveness across numerical, symbolic, and algorithmic reasoning benchmarks.

Proposed method

  • Use seed demonstrations to prompt an LLM to perform backward synthesis (generate a question conditioned on a topic word, target complexity, and a reasoning chain) and forward synthesis (generate a refined reasoning chain for the synthesized question).
  • Implement a stopping criterion and quality filters (e.g., deduplication, topic coverage, solvability).
  • Cluster synthesized demonstrations in a semantic space (Sentence-BERT) and select the most complex example from each cluster for inference.
  • Adopt PaL-style reasoning chains (structured code) for synthesis, with answers obtained by executing the code rather than extracting from model outputs.
  • Estimate answer confidence by sampling multiple reasoning chains and taking a majority vote, and use it to filter synthetic questions (applied only during synthesis, not at inference).
  • Evaluate on numerical, symbolic, and algorithmic tasks and compare against direct prompting, CoT prompting, and PaL prompting.
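
The execution and majority-voting steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sampled reasoning chain is Python source that defines a `solution()` function (the convention used in PaL-style prompts), that answers are hashable values such as numbers or strings, and that a question is kept when the agreement ratio reaches a hypothetical `threshold`. The names `run_chain`, `answer_confidence`, and `keep_question` are invented for illustration.

```python
from collections import Counter


def run_chain(code: str):
    """Execute a PaL-style chain and return solution()'s result, or None on failure.

    Assumption: the chain defines a zero-argument function named `solution`.
    Note: exec() on model output is unsafe outside a sandbox.
    """
    namespace = {}
    try:
        exec(code, namespace)
        return namespace["solution"]()
    except Exception:
        return None


def answer_confidence(chains):
    """Return (majority answer, fraction of all sampled chains that agree with it)."""
    answers = [a for a in (run_chain(c) for c in chains) if a is not None]
    if not answers:
        return None, 0.0
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(chains)


def keep_question(chains, threshold=0.5):
    """Keep a synthetic question only if enough sampled chains agree (threshold is hypothetical)."""
    _, confidence = answer_confidence(chains)
    return confidence >= threshold


sampled = [
    "def solution():\n    apples = 5\n    bought = 3\n    return apples + bought\n",
    "def solution():\n    return 5 + 3\n",
    "def solution():\n    return 5 * 3\n",
]
majority, confidence = answer_confidence(sampled)
```

Chains that fail to execute count against the confidence score, so a question whose sampled chains mostly crash or disagree is filtered out during synthesis.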

Experimental results

Research questions

  • RQ1: Can self-synthesized demonstrations, derived from a few seed examples, improve LLM reasoning compared to using seed examples alone?
  • RQ2: Does an in-cluster complexity based selection yield more diverse and informative demonstrations for inference?
  • RQ3: How does synthetic prompting perform across numerical, symbolic, and algorithmic reasoning tasks relative to state-of-the-art prompting methods?

Key findings

  • Synthetic prompting yields up to 15.6% absolute gains over state-of-the-art PaL prompting on several datasets.
  • Vanilla synthetic prompting often underperforms PaL prompting due to lack of complexity control and diversity; conditioning in synthesis improves results.
  • The in-cluster complexity based selection consistently outperforms other schemes, highlighting the value of diversity plus high reasoning complexity.
  • Synthesized demonstrations are generally more complex and on-topic than those produced by vanilla synthetic prompting, and selected demonstrations are mostly correct and informative.
  • Compared to using carefully selected gold demonstrations from training data, synthetic demonstrations can approach or exceed performance when limited seed examples are available.
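
The in-cluster complexity based selection mentioned in the findings can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: cluster labels are taken as given (for example, from k-means over Sentence-BERT embeddings computed elsewhere), complexity is approximated by counting non-empty, non-comment lines of a PaL-style chain, and the demonstration schema (`question`, `chain`) and function names are hypothetical.

```python
def chain_complexity(reasoning_chain: str) -> int:
    """Assumed complexity proxy: number of non-empty, non-comment lines in the chain."""
    return sum(
        1
        for line in reasoning_chain.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def select_in_cluster(demos, cluster_ids):
    """Pick the most complex demonstration from each cluster.

    demos: list of dicts with "question" and "chain" keys (hypothetical schema).
    cluster_ids: one cluster label per demonstration, produced by a separate
    clustering step that is not shown here.
    Ties are resolved in favour of the earlier demonstration.
    """
    if len(demos) != len(cluster_ids):
        raise ValueError("demos and cluster_ids must have the same length")
    best = {}
    for demo, cid in zip(demos, cluster_ids):
        score = chain_complexity(demo["chain"])
        if cid not in best or score > best[cid][0]:
            best[cid] = (score, demo)
    # Return selections in a stable order, sorted by cluster label.
    return [best[cid][1] for cid in sorted(best)]


demos = [
    {"question": "q1", "chain": "def solution():\n    return 1\n"},
    {"question": "q2", "chain": "def solution():\n    a = 2\n    b = 3\n    return a * b\n"},
    {"question": "q3", "chain": "def solution():\n    # one step\n    return 7\n"},
]
selected = select_in_cluster(demos, cluster_ids=[0, 0, 1])
```

Selecting one demonstration per cluster is what provides diversity, while the within-cluster maximum favours longer reasoning chains; the number of clusters then fixes how many demonstrations appear in the inference prompt.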

This review was created by AI and reviewed by human editors.