QUICK REVIEW

[Paper Review] Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

Yifan Hou, Jiaoda Li | arXiv (Cornell University) | Oct 23, 2023

Topic Modeling | Cited by 1

One-Sentence Summary

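This paper proposes MechanisticProbe, a new attention-based probing method that detects the reasoning tree implicitly present in a language model by analyzing its attention patterns. Applied to GPT-2 and LLaMA, it recovers the reasoning structure on both synthetic and natural-language reasoning tasks, suggesting that in many cases the models perform multi-step reasoning through a mechanistic process rather than relying on memorization.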

ABSTRACT

Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from pretraining corpus, or, via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. Concretely, we hypothesize that the LM implicitly embeds a reasoning tree resembling the correct reasoning process within it. We test this hypothesis by introducing a new probing approach (called MechanisticProbe) that recovers the reasoning tree from the model's attention patterns. We use our probe to analyze two LMs: GPT-2 on a synthetic task (k-th smallest element), and LLaMA on two simple language-based reasoning tasks (ProofWriter & AI2 Reasoning Challenge). We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples, suggesting that the LM indeed is going through a process of multi-step reasoning within its architecture in many cases.
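The synthetic task named in the abstract (finding the k-th smallest element of a list) is simple to state in code. The generator below is a minimal sketch, assuming distinct integers and a 1-indexed k; the function names and the prompt format in `to_prompt` are hypothetical, since this review does not give the paper's exact input encoding.

```python
import random
from typing import List, Optional, Tuple


def make_kth_smallest_example(n: int = 8, k: int = 2, max_value: int = 100,
                              seed: Optional[int] = None) -> Tuple[List[int], int]:
    """Return (numbers, answer): n distinct integers and their k-th smallest."""
    if not 1 <= k <= n <= max_value:
        raise ValueError("need 1 <= k <= n <= max_value")
    rng = random.Random(seed)
    numbers = rng.sample(range(max_value), n)  # distinct values, random order
    answer = sorted(numbers)[k - 1]            # k is 1-indexed: k=1 is the minimum
    return numbers, answer


def to_prompt(numbers: List[int], k: int) -> str:
    """Hypothetical text encoding of one instance (not the paper's format)."""
    return f"Numbers: {' '.join(map(str, numbers))}. k = {k}. Answer:"


numbers, answer = make_kth_smallest_example(n=8, k=2, seed=0)
print(to_prompt(numbers, 2), answer)
```

For example, `to_prompt([7, 3, 9], 2)` yields `"Numbers: 7 3 9. k = 2. Answer:"`, and the expected answer for that list is 7.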

Motivation and Goals

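Investigate whether language models (LMs) perform multi-step reasoning through an internal mechanistic process or simply by recalling memorized answers.
Resolve an ambiguity in LM reasoning: do models follow procedural logic, or do they rely on shortcuts learned during pretraining?
Develop a method that mechanistically explains how LMs encode and carry out reasoning steps in their attention mechanism.
Test whether attention patterns reflect a structured reasoning tree rather than merely arbitrary attention flow.
Show that accurate recovery of the reasoning tree is closely correlated with model robustness and performance.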

Proposed Method

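Propose MechanisticProbe, a two-stage non-parametric probing framework for recovering reasoning trees from attention patterns.
Stage 1: use attention patterns to identify the useful input statements, i.e., the nodes of the reasoning tree.
Stage 2: infer the hierarchical structure of the reasoning process (the heights of the nodes in the tree) from the attention flow.
Use simple classifiers to detect the necessary reasoning nodes and their relative positions in the reasoning chain.
Apply the probe to GPT-2 on a synthetic k-th smallest element task, and to LLaMA on ProofWriter and ARC (AI2 Reasoning Challenge).
Validate the results through ablations (pruning attention heads) and through correlation analyses between probing scores and model robustness.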
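The two-stage idea above can be sketched as follows, under stated assumptions: each input statement is represented by a hypothetical feature vector (here, the attention mass it receives at each layer), and k-nearest neighbours stands in for the "simple classifier". Stage 1 predicts whether a statement is a node of the reasoning tree; stage 2 predicts the height of the statements judged useful. The function names, features, and toy numbers are illustrative only and are not the authors' implementation.

```python
from __future__ import annotations

from collections import Counter
from math import dist


def knn_predict(train_x: list[list[float]], train_y: list[int],
                x: list[float], k: int = 3) -> int:
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def two_stage_probe(train_x, train_useful, train_height, test_x, k=3):
    """Return (useful, height) per test statement; height is None if not useful."""
    # Stage 2 is fit only on statements that are genuine tree nodes.
    node_x = [x for x, u in zip(train_x, train_useful) if u == 1]
    node_h = [h for h, u in zip(train_height, train_useful) if u == 1]
    results = []
    for x in test_x:
        useful = knn_predict(train_x, train_useful, x, k)               # stage 1
        height = knn_predict(node_x, node_h, x, k) if useful else None  # stage 2
        results.append((useful, height))
    return results


# Hypothetical features: [attention mass at layer 1, at layer 2] (made-up values).
train_x = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.1],    # tree nodes at height 1
           [0.2, 0.9], [0.1, 0.8], [0.15, 0.85],   # tree nodes at height 2
           [0.05, 0.05], [0.1, 0.0], [0.0, 0.1]]   # distractor statements
train_useful = [1, 1, 1, 1, 1, 1, 0, 0, 0]
train_height = [1, 1, 1, 2, 2, 2, 0, 0, 0]         # 0 = placeholder, never used

print(two_stage_probe(train_x, train_useful, train_height,
                      [[0.88, 0.12], [0.12, 0.88], [0.02, 0.03]]))
# → [(1, 1), (1, 2), (0, None)]
```

A deliberately weak classifier such as kNN is a common probing choice, since it makes it less likely that the probe itself, rather than the model's attention, is doing the reasoning.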

Experimental Results

Research Questions

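RQ1: Do LMs perform multi-step reasoning through an internal mechanistic process, or do they rely on memorized answers?
RQ2: Can the attention patterns of an LM encode a structured reasoning tree that matches the correct logical derivation?
RQ3: To what extent is the model's reasoning process related to its prediction accuracy and robustness?
RQ4: Are the attention heads that contribute to reasoning-tree recovery critical for correct predictions?
RQ5: Can probing scores predict the model's robustness under input perturbations?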

Key Findings

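For most examples, MechanisticProbe successfully recovers reasoning trees from the attention patterns of GPT-2 and LLaMA across multiple tasks.
The models identify useful input statements early, in the lower layers of the network, which is consistent with a step-by-step reasoning process.
Pruning the attention heads identified by MechanisticProbe causes a significant drop in accuracy, confirming their functional importance.
Models with higher probing scores (indicating better reasoning-tree recovery) are more robust to input noise; when the probing score SP2 is high, test accuracy is about 4% higher.
Examples with higher probing scores show higher prediction confidence and greater tolerance to input corruption, suggesting that mechanistic reasoning improves model reliability.
The method suggests that attention in LMs is organized in a way that supports procedural reasoning, rather than merely associative recall.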
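The head-pruning ablation described above needs some way to switch heads off. One common way to simulate pruning, assumed here rather than taken from the paper, is to multiply each head's output by a 0/1 mask before the heads are concatenated. The pure-Python sketch below (toy dimensions, no learned projections, hypothetical function names) shows only that masking step; the accuracy drop would then be measured by rerunning evaluation with the mask applied.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query position."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def multi_head_output(queries, keys, values, head_mask) -> Vector:
    """Concatenate per-head outputs; a 0 in head_mask zeroes (prunes) that head."""
    out: Vector = []
    for h, keep in enumerate(head_mask):
        out.extend(keep * x for x in attend(queries[h], keys[h], values[h]))
    return out


# Two toy heads; all-zero keys give uniform weights, so each head averages its values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[[0.0, 0.0], [0.0, 0.0]]] * 2
values = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
print(multi_head_output(queries, keys, values, [1, 1]))  # → [2.0, 3.0, 6.0, 7.0]
print(multi_head_output(queries, keys, values, [1, 0]))  # → [2.0, 3.0, 0.0, 0.0]
```

Masking leaves the rest of the network untouched, so any change in accuracy can be attributed to the heads that were switched off.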

This review was generated by AI and checked by human editors.