QUICK REVIEW

[Paper Digest] Large Language Models as Analogical Reasoners

Michihiro Yasunaga, Xinyun Chen|arXiv (Cornell University)|Oct 3, 2023
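Natural Language Processing Techniques | Cited by 14
One-Sentence Summary

Analogical prompting has LLMs self-generate tailored exemplars and knowledge to guide their own reasoning, improving performance on math, code, and BIG-Bench tasks without labeled data.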

ABSTRACT

Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.

Motivation and Goals

  • Reduce the reliance of chain-of-thought prompting on manually labeled reasoning exemplars.
  • Propose analogical prompting where the model recalls and generates relevant exemplars and knowledge in-context.
  • Demonstrate that self-generated exemplars and knowledge improve performance across math, code, and BIG-Bench tasks.

Proposed Method

  • Introduce self-generated exemplars: prompt the LLM to recall and generate multiple relevant problem–solution exemplars in one pass before solving the target problem.
  • Extend with self-generated knowledge: optionally generate high-level tutorials to accompany exemplars, improving generalization for complex tasks.
  • Explore single-pass prompting to produce knowledge, exemplars, and solution end-to-end.
  • Experiment with multiple base LLMs (GPT-3.5-turbo, GPT-4, PaLM 2) across GSM8K, MATH, Codeforces, and BIG-Bench.
  • Compare against 0-shot CoT, 5-shot CoT, and retrieval-based CoT to assess effectiveness of self-generation.
  • Analyze the impact of the number of exemplars (K) and the ordering of knowledge before exemplars on performance.
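
The single-pass structure described above can be sketched as a small prompt builder. This is a minimal illustration, not the paper's template: the function name `build_analogical_prompt`, its parameters, and the instruction wording are all assumptions made for this example. It only assembles the prompt text; sending it to a model is left to whatever client is in use.

```python
def build_analogical_prompt(problem: str, k: int = 3, with_knowledge: bool = False) -> str:
    """Assemble one prompt that asks for exemplars (and optionally knowledge) before the answer.

    Hypothetical helper: the section headings and wording are illustrative
    paraphrases, not the exact prompt used by the authors.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    sections = [f"# Problem:\n{problem.strip()}", "# Instructions:"]

    if with_knowledge:
        # Knowledge is requested before the exemplars, matching the
        # ordering the digest says was analyzed.
        sections.append(
            "## Tutorial:\n"
            "Identify the core concepts in the problem and write a short tutorial on them."
        )

    # Ask the model to recall k related problems and solve each one itself,
    # so no labeled or retrieved exemplars are needed.
    sections.append(
        "## Relevant problems:\n"
        f"Recall {k} relevant and distinct problems. "
        "For each, describe the problem and explain its solution."
    )

    # The target problem is solved last, in the same pass.
    sections.append(
        "## Solve the initial problem:\n"
        "Now solve the problem stated above, step by step."
    )
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_analogical_prompt(
        "What is the area of a square with vertices (-2, 2), (2, -2), (-2, -6), (-6, -2)?",
        k=3,
        with_knowledge=True,
    ))
```

Setting `with_knowledge=False` gives the exemplars-only variant, and `k` corresponds to the number of self-generated exemplars (K) whose effect the authors analyze.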

Experimental Results

Research Questions

  • RQ1: Can self-generated exemplars replace manually labeled exemplars in CoT prompting across diverse reasoning tasks?
  • RQ2: Does adding self-generated high-level knowledge alongside exemplars improve problem solving, especially for complex tasks like code generation?
  • RQ3: How does the approach scale with model size and across different base LLMs?
  • RQ4: What are the trade-offs between self-generation and retrieval of exemplars in terms of reliability and performance?

Key Findings

  • Self-generated exemplars improve GSM8K and MATH accuracy beyond 0-shot and standard few-shot CoT.
  • Self-generated knowledge plus exemplars yields additional gains on Codeforces tasks, highlighting the benefit of high-level takeaways.
  • Across BIG-Bench tasks, self-generated exemplars outperform 0-shot CoT and are competitive with manual 3-shot CoT.
  • The approach scales with larger LLMs and tends to outperform retrieval-based CoT for bigger models.
  • Generating 3–5 exemplars generally gives the most stable and strongest performance.


This digest was generated by AI and reviewed by human editors.