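[Paper Summary] Large Language Models as Analogical Reasoners
Analogical prompting has LLMs self-generate tailored exemplars and knowledge to guide their own reasoning, improving performance on math, code, and BIG-Bench tasks without labeled data.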
Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.
Motivation and Objectives
- Reduce the reliance of chain-of-thought prompting on manually labeled reasoning exemplars.
- Propose analogical prompting where the model recalls and generates relevant exemplars and knowledge in-context.
- Demonstrate that self-generated exemplars and knowledge improve performance across math, code, and BIG-Bench tasks.
Proposed Method
- Introduce self-generated exemplars: prompt the LLM to recall and generate multiple relevant problem–solution exemplars in one pass before solving the target problem.
- Extend with self-generated knowledge: optionally generate high-level tutorials to accompany exemplars, improving generalization for complex tasks.
- Explore single-pass prompting to produce knowledge, exemplars, and solution end-to-end.
- Experiment with multiple base LLMs (GPT-3.5-turbo, GPT-4, PaLM 2) across GSM8K, MATH, Codeforces, and BIG-Bench.
- Compare against 0-shot CoT, 5-shot CoT, and retrieval-based CoT to assess effectiveness of self-generation.
- Analyze the impact of the number of exemplars (K) and the ordering of knowledge before exemplars on performance.
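The prompt structure described in the list above can be sketched as a small helper. This is a minimal illustration, not the paper's exact template: the function names `build_analogical_prompt` and `extract_final_answer`, the section headings, the instruction wording, and the `Final answer:` convention are all assumptions made for this example. It places the optional tutorial before the exemplars and asks for K exemplars and the solution in a single pass.

```python
from typing import Optional


def build_analogical_prompt(problem: str,
                            num_exemplars: int = 3,
                            with_knowledge: bool = False) -> str:
    """Assemble a single-pass analogical prompt.

    The wording below is illustrative (an assumption), not the paper's template.
    """
    if num_exemplars < 1:
        raise ValueError("num_exemplars must be at least 1")

    sections = ["# Problem", problem.strip(), "", "# Instructions"]

    # Optional self-generated knowledge, requested before the exemplars.
    if with_knowledge:
        sections += [
            "## Tutorial",
            "Identify the core concepts in the problem and write a short, "
            "high-level tutorial on them.",
            "",
        ]

    # Self-generated exemplars: the model recalls K related problems itself.
    sections += [
        "## Relevant problems",
        f"Recall {num_exemplars} relevant and distinct problems. "
        "For each one, describe the problem and explain its solution.",
        "",
        "## Solve the initial problem",
        "Solve the problem above step by step and end with a line of the "
        "form 'Final answer: <answer>'.",
    ]
    return "\n".join(sections)


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' line, or None."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.lower().startswith("final answer:"):
            return stripped.split(":", 1)[1].strip()
    return None
```

The returned string would be sent to whichever LLM is under test as a single request; the model call itself is left out here because it depends on the provider. Varying `num_exemplars` gives a simple way to reproduce the kind of K sweep mentioned in the last bullet.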
Experimental Results
Research Questions
- RQ1: Can self-generated exemplars replace manually labeled exemplars in CoT prompting across diverse reasoning tasks?
- RQ2: Does adding self-generated high-level knowledge alongside exemplars improve problem solving, especially for complex tasks like code generation?
- RQ3: How does the approach scale with model size and across different base LLMs?
- RQ4: What are the trade-offs between self-generation and retrieval of exemplars in terms of reliability and performance?
Key Findings
- Self-generated exemplars improve GSM8K and MATH accuracy beyond 0-shot and standard few-shot CoT.
- Self-generated knowledge plus exemplars yields additional gains on Codeforces tasks, highlighting the benefit of high-level takeaways.
- Across BIG-Bench tasks, self-generated exemplars outperform 0-shot CoT and are competitive with manual 3-shot CoT.
- The approach scales with larger LLMs and tends to outperform retrieval-based CoT for bigger models.
- Increasing the number of exemplars to 3–5 generally stabilizes and improves performance.
This summary was generated by AI and reviewed by a human editor.