QUICK REVIEW

[Paper Digest] A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Alessandro Stolfo, Yonatan Belinkov|arXiv (Cornell University)|May 24, 2023

Topic Modeling | Cited by 2

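Summary

This paper uses causal mediation analysis to trace how Transformer-based large language models process arithmetic reasoning. It shows that the attention mechanism carries operand and operator information from mid-sequence early layers to the final token, while later MLP modules produce result-related representations. The study identifies a distinct, task-specific circuit for arithmetic processing that differs from the circuits used for number retrieval or factual knowledge tasks.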

ABSTRACT

Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture. In order to improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic questions using a causal mediation analysis framework. By intervening on the activations of specific model components and measuring the resulting changes in predicted probabilities, we identify the subset of parameters responsible for specific predictions. This provides insights into how information related to arithmetic is processed by LMs. Our experimental results indicate that LMs process the input by transmitting the information relevant to the query from mid-sequence early layers to the final token using the attention mechanism. Then, this information is processed by a set of MLP modules, which generate result-related information that is incorporated into the residual stream. To assess the specificity of the observed activation dynamics, we compare the effects of different model components on arithmetic queries with other tasks, including number retrieval from prompts and factual knowledge questions.

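Motivation and Objectives

Understand the internal mechanisms by which large language models perform arithmetic reasoning.
Identify, through causal interventions, the specific model components responsible for arithmetic predictions.
Assess whether the activation dynamics observed in arithmetic reasoning are distinct from those in other numerical or factual tasks.
Provide mechanistic insight into how information flows through the model architecture during arithmetic reasoning.
Support future work on model interpretability, pruning, and inference-time correction by identifying the key computational circuits.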

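Proposed Method

Apply causal mediation analysis to model components (neurons, layers, attention heads, MLPs) by intervening on their activations.
Perform controlled interventions on specific subsets of model parameters and measure the change in the output probability distribution.
Trace the flow of information from the input tokens, through the attention mechanism, to the final-token representation.
Identify the mediators (model components) that significantly affect the prediction, based on the causal effect of each intervention.
Compare activation dynamics across four tasks: arithmetic with Arabic numerals, arithmetic with numeral words, number retrieval, and factual knowledge.
Compute neuron overlap between tasks using the top 400 neurons ranked by intervention effect, and validate it statistically against a random baseline.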
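The intervention step described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyModel` is a hypothetical two-block residual network rather than a real language model, the component names `mlp1` and `mlp2` are invented, and the two reported quantities (drop in the probability of the original answer, gain in the probability of the alternative answer) are a simple stand-in for whatever effect metric the paper actually uses.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ToyModel:
    """Hypothetical model: two residual MLP blocks and an unembedding matrix."""

    def __init__(self, d=8, vocab=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(d, d))
        self.W2 = rng.normal(size=(d, d))
        self.U = rng.normal(size=(d, vocab))

    def forward(self, x, patch=None):
        """Return (probs, cache). `patch` maps a component name to an
        activation that overrides the one computed in this run."""
        patch = patch or {}
        cache = {}
        h = x
        for name, W in (("mlp1", self.W1), ("mlp2", self.W2)):
            out = np.tanh(h @ W)
            out = patch.get(name, out)  # the intervention happens here
            cache[name] = out
            h = h + out                 # write into the residual stream
        return softmax(h @ self.U), cache

def intervention_effects(model, x_clean, x_alt):
    """Patch each component's activation from the alternative run into the
    clean run and record how the output probabilities move."""
    p_clean, _ = model.forward(x_clean)
    p_alt, cache_alt = model.forward(x_alt)
    r = int(p_clean.argmax())      # answer predicted on the clean input
    r_alt = int(p_alt.argmax())    # answer predicted on the alternative input
    effects = {}
    for name, act in cache_alt.items():
        p_patched, _ = model.forward(x_clean, patch={name: act})
        effects[name] = {
            "drop_in_p_r": float(p_clean[r] - p_patched[r]),
            "gain_in_p_r_alt": float(p_patched[r_alt] - p_clean[r_alt]),
        }
    return effects

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = ToyModel()
    x_clean, x_alt = rng.normal(size=8), rng.normal(size=8)
    for name, eff in intervention_effects(model, x_clean, x_alt).items():
        print(name, eff)
```

Components whose patched activation moves probability away from the original answer and toward the alternative one are the candidates for mediators. Patching a run with its own activations leaves the output unchanged, which is a useful sanity check on any implementation of this kind.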

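Experimental Results

Research Questions

RQ1: Which model components are causally responsible for correct arithmetic predictions in large language models?
RQ2: How does information about operands and operators flow through the model's layers and attention mechanism during arithmetic reasoning?
RQ3: Is the circuit used for arithmetic reasoning different from the circuits used for number retrieval or factual knowledge tasks?
RQ4: Does the model rely on a specific set of late MLP modules to generate result-related representations?
RQ5: How do the activation dynamics in arithmetic reasoning compare with those in other numerical or factual prediction tasks?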

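Key Findings

Operand and operator information is carried by self-attention from mid-sequence early layers to the final token.
MLP modules in later layers generate the result-related representations that are incorporated into the residual stream.
The top 400 neurons for arithmetic queries written with Arabic numerals and those for queries written with numeral words overlap by 50%, indicating a shared circuit.
The overlap between neurons active in arithmetic and in number retrieval is only 22–23%, indicating clearly different circuits even though both tasks involve numerical prediction.
The neuron overlap between arithmetic and factual knowledge tasks is 9–10%, statistically indistinguishable from the random baseline, which confirms the specificity of the circuit.
The observed activation dynamics are specific to arithmetic reasoning and do not carry over to other numerical or factual tasks.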
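The overlap figures above compare sets of top-ranked neurons across tasks. A minimal sketch of such a comparison follows. It assumes each task yields one effect score per neuron, and it uses a hypothetical baseline that draws two random neuron sets of the same size. The digest does not describe the paper's exact baseline or statistical test, so the function names, the neuron count, and the baseline procedure here are all illustrative.

```python
import numpy as np

def top_k_overlap(effects_a, effects_b, k=400):
    """Fraction of neurons shared by the top-k sets of two tasks,
    where each array holds one effect score per neuron."""
    top_a = set(np.argsort(effects_a)[-k:].tolist())
    top_b = set(np.argsort(effects_b)[-k:].tolist())
    return len(top_a & top_b) / k

def random_baseline(n_neurons, k=400, trials=1000, seed=0):
    """Mean and standard deviation of the overlap between two random
    k-subsets of n_neurons (assumed baseline, not the paper's)."""
    rng = np.random.default_rng(seed)
    overlaps = []
    for _ in range(trials):
        a = rng.choice(n_neurons, size=k, replace=False)
        b = rng.choice(n_neurons, size=k, replace=False)
        overlaps.append(len(np.intersect1d(a, b)) / k)
    return float(np.mean(overlaps)), float(np.std(overlaps))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 2000  # hypothetical neuron count, chosen only for the demo
    task_a = rng.random(n)
    task_b = rng.random(n)
    print("overlap:", top_k_overlap(task_a, task_b, k=400))
    print("baseline (mean, std):", random_baseline(n, k=400, trials=200))
```

Under this baseline the expected overlap of two random k-subsets drawn from n neurons is k/n, so an observed overlap near that value gives no evidence of shared circuitry, while a much larger value does.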


This digest was generated by AI and reviewed by a human editor.