QUICK REVIEW

[Paper Review] Complexity-Based Prompting for Multi-Step Reasoning

Yao Fu, Hao Peng|arXiv (Cornell University)|Oct 3, 2022

Topic Modeling|Cited by 73

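One-Sentence Summary

This paper proposes complexity-based prompting and complexity-based consistency, which select more complex reasoning chains for prompting and for decoding, and reports new state-of-the-art results on multi-step reasoning benchmarks with GPT-3 and Codex.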

ABSTRACT

We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make the most effective prompts. In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. We show that prompts with higher reasoning complexity, i.e., chains with more reasoning steps, achieve substantially better performance on multi-step reasoning tasks over strong baselines. We further extend our complexity-based criteria from prompting (selecting inputs) to decoding (selecting outputs), where we sample multiple reasoning chains from the model, then choose the majority of generated answers from complex reasoning chains (over simple chains). When used to prompt GPT-3 and Codex, our approach substantially improves multi-step reasoning accuracy and achieves new state-of-the-art (SOTA) performance on three math benchmarks (GSM8K, MultiArith, and MathQA) and two BigBenchHard tasks (Date Understanding and Penguins), with an average +5.3 and up to +18 accuracy improvements. Compared with existing example selection schemes like manual tuning or retrieval-based selection, selection based on reasoning complexity is intuitive, easy to implement, and annotation-efficient. Further results demonstrate the robustness of performance gains from complex prompts under format perturbation and distribution shift.

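Research Motivation and Objectives

Improve multi-step reasoning by prompting with more complex reasoning chains.
Propose a simple, annotation-efficient method for selecting complex prompt examples.
Extend the idea from prompting to decoding by voting among complex reasoning chains (complexity-based consistency).
Demonstrate robust performance gains across multiple datasets and model types.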

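Proposed Method

Define complex examples as those whose chain of thought (CoT) contains more reasoning steps.
Compare complexity-based prompts against handcrafted and random CoT prompts on GPT-3 and Codex.
Extend decoding to vote among the top-K most complex sampled chains (complexity-based consistency) rather than among all chains.
Evaluate on GSM8K, MultiArith, MathQA, Date Understanding, Penguins, and StrategyQA.
Demonstrate robustness to prompt distribution and format perturbation, and analyze confounding factors.
Compare against other example selection schemes (random, centroid, retrieval).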
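The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`count_steps`, `select_complex_examples`, `build_prompt`) are hypothetical, the prompt layout is invented for the example, and counting non-empty lines as reasoning steps is an assumption of this sketch.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (question, annotated reasoning chain)


def count_steps(chain: str) -> int:
    """Count reasoning steps as non-empty lines (assumption of this sketch)."""
    return sum(1 for line in chain.splitlines() if line.strip())


def select_complex_examples(pool: List[Example], k: int) -> List[Example]:
    """Return the k examples whose chains contain the most reasoning steps."""
    # sorted() is stable, so ties keep their original pool order.
    return sorted(pool, key=lambda ex: count_steps(ex[1]), reverse=True)[:k]


def build_prompt(examples: List[Example], question: str) -> str:
    """Concatenate the chosen examples and append the new question (hypothetical layout)."""
    parts = [f"Question: {q}\n{chain}" for q, chain in examples]
    parts.append(f"Question: {question}\n")
    return "\n\n".join(parts)
```

Because the only signal used is the step count of each annotated chain, the selection needs no retrieval index and no manual tuning, which matches the annotation-efficiency argument in the abstract.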

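Experimental Results

Research Questions

RQ1: Do prompts built from more complex reasoning chains improve multi-step reasoning accuracy over simple prompts?
RQ2: At decoding time, does selecting outputs from the most complex chains give better results than voting over all chains?
RQ3: Are the gains from complexity-based prompting robust to distribution shift, prompt perturbation, and different complexity proxies?
RQ4: How does complexity-based prompting compare with existing example selection methods (random, centroid, retrieval) across datasets?
RQ5: Does the improvement appear only as an emergent ability of very large models?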

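Key Findings

Complex prompts yield substantially higher accuracy than handcrafted or random CoT prompts on both GPT-3 and Codex.
Voting among the top-K most complex chains (complexity-based consistency) outperforms voting over all chains and voting over simple chains.
The approach reaches new state-of-the-art performance on GSM8K, MultiArith, and MathQA, with strong results on Date Understanding and Penguins as well; average gains are +5.3 (GPT-3) and +6.2 (Codex).
The gains persist across prompt distributions (in-distribution, noisy, and distribution shift) and under prompt format perturbation.
Compared with retrieval-based or full-training-set methods, complex prompts are robust and annotation-efficient, and the gains hold under alternative complexity proxies such as question length and formula length.
The benefit of complexity-based prompting appears as an emergent ability of very large models, where it brings substantial improvements over baseline CoT prompting.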
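The voting finding above can be illustrated with a small Python sketch of complexity-based consistency next to plain majority voting. It assumes each sampled output has already been parsed into a (reasoning chain, final answer) pair and, as an assumption of this sketch, counts non-empty lines as reasoning steps; the function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Tuple

Sample = Tuple[str, str]  # (sampled reasoning chain, parsed final answer)


def count_steps(chain: str) -> int:
    """Count reasoning steps as non-empty lines (assumption of this sketch)."""
    return sum(1 for line in chain.splitlines() if line.strip())


def majority_vote(samples: List[Sample]) -> str:
    """Baseline: majority answer over all sampled chains."""
    return Counter(answer for _, answer in samples).most_common(1)[0][0]


def complexity_based_vote(samples: List[Sample], k: int) -> str:
    """Majority answer among only the k sampled chains with the most steps."""
    top_k = sorted(samples, key=lambda s: count_steps(s[0]), reverse=True)[:k]
    return majority_vote(top_k)


# Toy data: short chains agree on "5", longer chains mostly agree on "7".
samples = [
    ("s1", "5"), ("s1", "5"), ("s1", "5"),
    ("s1\ns2", "5"),
    ("s1\ns2\ns3", "7"),
    ("s1\ns2\ns3\ns4", "7"),
]
print(majority_vote(samples))             # 5
print(complexity_based_vote(samples, 3))  # 7
```

In this toy case the two schemes disagree, which shows the mechanism only: restricting the vote to longer chains changes which answer wins. Whether that helps in practice is an empirical question that the paper's experiments address.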


This review was generated by AI and checked by a human editor.