QUICK REVIEW

[Paper Digest] Self-Consistency Improves Chain of Thought Reasoning in Language Models

Xuezhi Wang, Jason Wei | arXiv (Cornell University) | Mar 21, 2022

Topic Modeling | Cited by 674

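ONE-SENTENCE SUMMARY

This paper introduces self-consistency, a decoding strategy that samples diverse chain-of-thought reasoning paths and aggregates them into the most consistent final answer, substantially improving reasoning accuracy on arithmetic and commonsense tasks without any additional training.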

ABSTRACT

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).

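MOTIVATION AND OBJECTIVES

Improve the reasoning ability of large language models beyond standard chain-of-thought prompting.
Propose a decoding method that generates diverse reasoning paths in order to identify the most reliable answer.
Demonstrate robustness and performance gains across multiple models and reasoning benchmarks.
Show that the method requires no additional supervision or fine-tuning.
Explore the potential for uncertainty estimation and rationale collection from model outputs.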

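PROPOSED METHOD

Prompt the model with chain-of-thought exemplars, as in prior CoT prompting.
Sample diverse reasoning paths from the model's decoder using strategies such as temperature sampling, top-k sampling, and nucleus sampling.
Aggregate the final answers by marginalizing out the sampled reasoning paths, then select the most consistent answer (by majority vote or weighted aggregation).
Treat each sampled reasoning path as a latent variable that connects the reasoning steps to a final answer, without training any auxiliary model.
Compare aggregation strategies (majority vote versus weighted sum) and show that choosing the most consistent answer yields better performance.
Show that self-consistency is unsupervised, model-agnostic, and requires no fine-tuning or additional annotation.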
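
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_path` is a hypothetical callable standing in for one stochastic model call (for example, temperature sampling), the "The answer is X." ending is an assumed output format, and the default `num_paths` value is arbitrary.

```python
import itertools
import re
from collections import Counter
from typing import Callable, List, Optional


def extract_answer(reasoning: str) -> Optional[str]:
    """Return the text after 'The answer is', or None if the path has no such ending."""
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", reasoning.strip())
    return match.group(1) if match else None


def self_consistency(
    sample_path: Callable[[str], str], prompt: str, num_paths: int = 10
) -> Optional[str]:
    """Sample several reasoning paths and return the most frequent final answer."""
    answers = []
    for _ in range(num_paths):
        answer = extract_answer(sample_path(prompt))
        if answer is not None:  # skip paths that never state a final answer
            answers.append(answer)
    if not answers:
        return None
    # Majority vote: the reasoning text is discarded, only final answers are counted.
    return Counter(answers).most_common(1)[0][0]


def make_cycling_sampler(paths: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays canned reasoning paths in order."""
    cycle = itertools.cycle(paths)
    return lambda prompt: next(cycle)


toy_paths = [
    "She has 16 - 3 - 4 = 9 eggs left, and 9 * 2 = 18. The answer is 18.",
    "3 + 4 = 7 eggs are used, so 16 - 7 = 9 are sold for 9 * 2 = 18. The answer is 18.",
    "She sells 16 - 3 = 13 eggs for 13 * 2 = 26. The answer is 26.",
]
result = self_consistency(make_cycling_sampler(toy_paths), "toy prompt", num_paths=6)
print(result)  # 18 (four of the six replayed paths end in 18)
```

In practice the stub sampler would be replaced by a real model call with sampling enabled; the voting logic does not depend on which sampling strategy produced the paths.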

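EXPERIMENTAL RESULTS

Research Questions

RQ1: Does introducing diversity in reasoning paths through sampling improve final-answer accuracy over greedy chain-of-thought decoding?
RQ2: How should final answers be aggregated across multiple sampled paths to maximize correctness?
RQ3: Is self-consistency robust to model scale, prompting strategy, and sampling parameters?
RQ4: Can self-consistency provide uncertainty estimates, or help when prompts are imperfect?
RQ5: How does self-consistency compare with sample-and-rank, beam search, and traditional ensembles?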

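Key Findings

Self-consistency significantly outperforms standard chain-of-thought prompting in accuracy on arithmetic and commonsense tasks.
Gains are larger for larger models (e.g., LaMDA-137B, PaLM-540B, GPT-3), and the method reaches new state-of-the-art results on several benchmarks.
On GSM8K, SVAMP, AQuA, StrategyQA, and ARC-challenge, the reported improvements are up to +17.9%, +11.0%, +12.2%, +6.4%, and +3.9%, respectively.
Aggregating over many sampled paths with a majority vote or a normalized weighted sum outperforms unnormalized weighting and single-path greedy decoding.
Self-consistency is robust to sampling strategies and prompt variations, and it improves performance even on tasks where chain-of-thought prompting hurts performance relative to standard prompting.
Compared with sample-and-rank, beam search, and ensemble methods, self-consistency delivers larger gains using a single model and no additional training.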
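
To make the two aggregation strategies named above concrete, here is a rough sketch of a majority vote next to a normalized weighted sum. The `Sample` type, the toy log-probabilities, and the `agreement` helper are illustrative assumptions; the weight used here is the exponential of the mean token log-probability, which is one common way to length-normalize a sequence score.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed representation: (final answer, per-token log-probabilities of the path).
Sample = Tuple[str, List[float]]


def majority_vote(samples: List[Sample]) -> str:
    """Unweighted aggregation: every sampled path contributes one vote."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, _ in samples:
        totals[answer] += 1.0
    return max(totals, key=totals.get)


def normalized_weighted_sum(samples: List[Sample]) -> str:
    """Weighted aggregation: each path votes with its length-normalized probability.

    Assumes every path has at least one token log-probability.
    """
    totals: Dict[str, float] = defaultdict(float)
    for answer, logprobs in samples:
        totals[answer] += math.exp(sum(logprobs) / len(logprobs))
    return max(totals, key=totals.get)


def agreement(samples: List[Sample], answer: str) -> float:
    """Fraction of paths that agree with `answer`, a rough confidence signal."""
    return sum(1 for a, _ in samples if a == answer) / len(samples)


toy_samples: List[Sample] = [
    ("18", [-0.2, -0.1, -0.3]),
    ("18", [-0.4, -0.2]),
    ("26", [-0.1, -0.1]),
    ("18", [-0.5, -0.5, -0.5, -0.5]),
]
voted = majority_vote(toy_samples)
weighted = normalized_weighted_sum(toy_samples)
print(voted, weighted, agreement(toy_samples, voted))  # 18 18 0.75
```

On this toy input both strategies pick the same answer. The agreement fraction is only a heuristic proxy for the uncertainty estimates mentioned in the research questions.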


This digest was generated by AI and reviewed by human editors.