QUICK REVIEW

[Paper Review] Large Language Models as Analogical Reasoners

Michihiro Yasunaga, Xinyun Chen | arXiv (Cornell University) | Oct 3, 2023

Natural Language Processing Techniques | Cited by 14

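One-Sentence Summary

Analogical prompting has LLMs self-generate tailored exemplars and knowledge to guide their reasoning, improving performance on math, code, and BIG-Bench tasks without labeled data.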

ABSTRACT

Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.

Motivation and Objectives

Reduce the reliance of chain-of-thought prompting on manually labeled reasoning exemplars.
Propose analogical prompting where the model recalls and generates relevant exemplars and knowledge in-context.
Demonstrate that self-generated exemplars and knowledge improve performance across math, code, and BIG-Bench tasks.

Proposed Method

Introduce self-generated exemplars: prompt the LLM to recall and generate multiple relevant problem–solution exemplars in one pass before solving the target problem.
Extend with self-generated knowledge: optionally generate high-level tutorials to accompany exemplars, improving generalization for complex tasks.
Explore single-pass prompting to produce knowledge, exemplars, and solution end-to-end.
Experiment with multiple base LLMs (GPT-3.5-turbo, GPT-4, PaLM 2) across GSM8K, MATH, Codeforces, and BIG-Bench.
Compare against 0-shot CoT, 5-shot CoT, and retrieval-based CoT to assess effectiveness of self-generation.
Analyze the impact of the number of exemplars (K) and the ordering of knowledge before exemplars on performance.
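
The single-pass prompt described above can be sketched as a small template builder. This is a minimal illustration, not the authors' exact prompt: the function name `build_analogical_prompt`, its parameters, and all instruction wording are assumptions made for this example. It places the optional tutorial (self-generated knowledge) before the exemplar request, matching the knowledge-before-exemplars ordering mentioned above, and leaves the call to an actual LLM out entirely.

```python
def build_analogical_prompt(problem: str, k: int = 3, with_knowledge: bool = False) -> str:
    """Assemble one prompt asking the model to recall exemplars, then solve.

    The section headers and instruction wording are hypothetical; they only
    illustrate the structure (optional knowledge -> K exemplars -> solution).
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    lines = ["# Problem:", problem.strip(), "", "# Instructions:"]

    # Optional self-generated knowledge, requested before the exemplars.
    if with_knowledge:
        lines += [
            "## Tutorial:",
            "Identify the core concepts in the problem and write a short tutorial on them.",
        ]

    # Self-generated exemplars, followed by the target problem, all in one pass.
    lines += [
        "## Relevant problems:",
        f"Recall {k} relevant and distinct problems. "
        "For each, describe the problem and explain its solution.",
        "## Solve the initial problem:",
        "Solve the problem above step by step and state the final answer.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # The resulting string would be sent to an LLM as a single request.
    print(build_analogical_prompt("What is 15% of 240?", k=3, with_knowledge=True))
```

Because the exemplars are requested inside the same prompt as the target problem, no labeled data or retrieval index is needed; varying `k` corresponds to the number of exemplars (K) analyzed in the paper.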

Experimental Results

Research Questions

RQ1: Can self-generated exemplars replace manually labeled exemplars in CoT prompting across diverse reasoning tasks?
RQ2: Does adding self-generated high-level knowledge alongside exemplars improve problem solving, especially for complex tasks like code generation?
RQ3: How does the approach scale with model size and across different base LLMs?
RQ4: What are the trade-offs between self-generation and retrieval of exemplars in terms of reliability and performance?

Key Findings

Self-generated exemplars improve GSM8K and MATH accuracy beyond 0-shot and standard few-shot CoT.
Self-generated knowledge plus exemplars yields additional gains on Codeforces tasks, highlighting the benefit of high-level takeaways.
Across BIG-Bench tasks, self-generated exemplars outperform 0-shot CoT and are competitive with manual 3-shot CoT.
The approach scales with larger LLMs and tends to outperform retrieval-based CoT for bigger models.
Increasing the number of exemplars to 3–5 generally stabilizes and improves performance.


This review was generated by AI and reviewed by human editors.