Skip to main content
QUICK REVIEW

[论文解读] Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

Huaixiu Zheng, Swaroop Mishra|arXiv (Cornell University)|Oct 9, 2023
Topic Modeling被引用 22
一句话总结

Step-Back Prompting uses abstraction to derive high-level concepts and first principles, grounding reasoning to improve multi-task performance in STEM, Knowledge QA, and multi-hop reasoning, achieving notable gains over baselines and CoT prompting.

ABSTRACT

We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L, GPT-4 and Llama2-70B models, and observe substantial performance gains on various challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU (Physics and Chemistry) by 7% and 11% respectively, TimeQA by 27%, and MuSiQue by 7%.

研究动机与目标

  • 激励大型语言模型在复杂、细节丰富的推理方面的挑战。
  • 提出 Step-Back Prompting,在解决问题之前推导高层抽象概念。
  • 证明基于抽象导向的推理能够提升在 STEM、知识问答和多跳任务中的准确性。

提出的方法

  • Two-step Abstraction-and-Reasoning:先推导高层概念或原理,然后在这些抽象之上进行求解。
  • 使用少样本演示来教导大模型抽象步骤。
  • 在适当时利用检索增强(RAG)将高层概念与支持事实联系起来。
  • 在多样数据集(STEM、知识问答、多跳)上使用贪婪解码和基于分数的预测评估进行评估。
  • 与基线如 PaLM-2L 和 GPT-4 比较,结合 CoT、TDB 及 RAG 视情况适用。
Figure 1: Strong Performance of Step-Back Prompting : our proposed Abstraction-and-Reasoning scheme leads to a substantial improvement in a wide range of challenging tasks in STEM, Knowledge QA and Multi-Hop Reasoning requiring complex (often multi-hop) reasoning.
Figure 1: Strong Performance of Step-Back Prompting : our proposed Abstraction-and-Reasoning scheme leads to a substantial improvement in a wide range of challenging tasks in STEM, Knowledge QA and Multi-Hop Reasoning requiring complex (often multi-hop) reasoning.

实验结果

研究问题

  • RQ1与标准提示和链式推理提示相比,Step-Back Prompting 是否在 STEM、知识问答和多跳任务上提高了准确性?
  • RQ2抽象步骤对演示次数的鲁棒性如何?
  • RQ3哪些类型的错误最受影响于基于抽象的推理(例如推理、数学或上下文丢失)?

主要发现

  • Step-Back Prompting 将 PaLM-2L 在 MMLU Physics 上的表现提升了 7 个百分点,在 MMLU Chemistry 上提升了 11 个百分点,在这两个子任务上都优于 GPT-4。
  • Step-Back Prompting 相对于 PaLM-2L 基线在 TimeQA 上提升 27%,在 MuSiQue 上提升 7%。
  • 在多项任务中,Step-Back 在某些分析中比 CoT 和 TDB 高出多达 36%,并修正了高达约 40% 的基线模型错误,同时产生的新错误比例较小,约 12%。
  • 消融分析显示 Step-Back 对少样本演示次数具有鲁棒性,通常只需一个示例即可教授抽象。
  • 错误分析表明,剩余错误大多发生在 Reasoning 步骤,而非 Abstraction 步骤,凸显推理是主要瓶颈。
Figure 2: Illustration of Step-Back Prompting with two steps of Abstraction and Reasoning guided by concepts and principles. Top : an example of MMLU high-school physics (Hendrycks et al., 2020 ) where the first principle of Ideal Gas Law is retrieved via abstraction. Bottom : an example from TimeQA
Figure 2: Illustration of Step-Back Prompting with two steps of Abstraction and Reasoning guided by concepts and principles. Top : an example of MMLU high-school physics (Hendrycks et al., 2020 ) where the first principle of Ideal Gas Law is retrieved via abstraction. Bottom : an example from TimeQA

更好的研究,从现在开始

从论文设计到论文写作,大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成,并经人工编辑审核。