QUICK REVIEW

[Paper Digest] Locating and Editing Factual Associations in GPT

Kevin Meng, David Bau|arXiv (Cornell University)|Feb 10, 2022

Topic Modeling|Cited by 175

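One-Sentence Summary

This paper shows that factual associations in GPT are stored in mid-layer MLP modules and demonstrates Rank-One Model Editing (ROME), which inserts a new fact by applying a rank-one update to the weights of a single mid-layer MLP, achieving competitive editing performance with good generalization and specificity.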

ABSTRACT

We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/

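Motivation and Objectives

Identify where factual associations are stored in GPT-style autoregressive transformers.
Develop causal tracing to identify the activations that are decisive for factual recall.
Propose Rank-One Model Editing (ROME), which inserts or modifies a factual association by updating MLP weights.
Evaluate ROME on a standard benchmark and on a counterfactual editing benchmark to measure generalization and specificity.
Compare ROME with existing fine-tuning and hypernetwork-based editing methods and analyze its robustness.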

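Proposed Method

Build a causal mediation framework that quantifies the indirect effect of hidden states on factual predictions.
Identify the decisive mid-layer MLP activations that mediate recall at the last subject token.
Model the MLP as a linear associative memory and propose Rank-One Model Editing (ROME), which inserts a new key–value pair through a rank-one update to the MLP projection matrix.
Compute the key $k_*$ as the average MLP activation at the last subject token, taken over a sample of texts that end with the subject.
Compute $v_*$ by optimizing a vector to maximize the probability of the target object, with a KL term that limits essence drift.
Apply the rank-one update to $W_{proj}^{(l)}$: $\hat{W}_{proj}^{(l)} = W_{proj}^{(l)} + \Lambda (C^{-1} k_*)^T$, where $C = KK^T$ is the uncentered covariance of the stored keys $K$ and $\Lambda = (v_* - W_{proj}^{(l)} k_*) / ((C^{-1} k_*)^T k_*)$.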
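
The rank-one update in the last step can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function name `rome_rank_one_update`, the toy dimensions, and the random keys and values are assumptions made for illustration, and Λ is taken to be the scaling that makes the edited matrix map k* exactly to v*.

```python
import numpy as np

def rome_rank_one_update(W, K, k_star, v_star):
    """Return W_hat = W + Lambda (C^{-1} k*)^T with C = K K^T.

    W:      (d_v, d_k) projection matrix, viewed as a linear associative memory
    K:      (d_k, n)   previously stored keys, one per column
    k_star: (d_k,)     key of the fact being inserted
    v_star: (d_v,)     value the edited memory should return for k_star
    """
    C = K @ K.T                       # uncentered covariance of the keys
    u = np.linalg.solve(C, k_star)    # C^{-1} k*, without forming the inverse
    residual = v_star - W @ k_star    # gap between the desired and current value
    Lam = residual / (u @ k_star)     # Lambda, scaled so that W_hat k* = v*
    return W + np.outer(Lam, u)       # rank-one correction

# Toy data (hypothetical sizes, random values).
rng = np.random.default_rng(0)
d_k, d_v, n = 8, 4, 50
W = rng.normal(size=(d_v, d_k))
K = rng.normal(size=(d_k, n))
k_star = rng.normal(size=d_k)
v_star = rng.normal(size=d_v)

W_hat = rome_rank_one_update(W, K, k_star, v_star)
print(np.allclose(W_hat @ k_star, v_star))   # does the new key retrieve the new value?
print(np.linalg.matrix_rank(W_hat - W))      # rank of the weight change
```

Under these assumptions the first check should confirm that the edited matrix returns v* for k*, and the second that the change to the weights has rank one. Weighting the update direction by C^{-1} is what keeps the edit from disturbing the values returned for the keys already stored in K more than necessary.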

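Experimental Results

Research Questions

RQ1: Can the mid-layer feed-forward modules of GPT be confirmed as the causal site of factual recall?
RQ2: How can a fact stored in the model be modified directly by editing the weights that implement this internal computation?
RQ3: Compared with existing methods, does Rank-One Model Editing (ROME) provide effective, generalizable, and specific edits of factual associations?
RQ4: Can a counterfactual dataset reveal the trade-off between generalization and specificity after editing?
RQ5: Are the causal tracing results consistent with where ROME edits succeed across layers and tokens?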

Key Findings

Editor	Efficacy	Paraphrase	Specificity
GPT-2 XL	22.2 ± 0.5	21.3 ± 0.5	24.2 ± 0.5
FT	99.6 ± 0.1	82.1 ± 0.6	23.2 ± 0.5
FT+L	92.3 ± 0.4	47.2 ± 0.7	23.4 ± 0.5
KE	65.5 ± 0.6	61.4 ± 0.6	24.9 ± 0.5
KE-zsRE	92.4 ± 0.3	90.0 ± 0.3	23.8 ± 0.5
MEND	75.9 ± 0.5	65.3 ± 0.6	24.1 ± 0.5
MEND-zsRE	99.4 ± 0.1	99.3 ± 0.1	24.1 ± 0.5
ROME	99.8 ± 0.0	88.1 ± 0.5	24.2 ± 0.5

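Causal tracing reveals strong indirect effects in the middle layers, especially in the mid-layer MLPs at the last subject token.
MLP contributions dominate at the early site, whereas attention dominates at the last token of the prompt.
ROME inserts a new factual association with a single rank-one update and is competitive with fine-tuning and hypernetwork baselines on zsRE.
On CounterFact, ROME achieves both strong generalization and strong specificity, balancing the two better than the FT, FT+L, KE, and MEND baselines.
Edits are most effective when they target the mid-layer MLP at the last subject token; generalization peaks at layer 18 of GPT-2 XL.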
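
The causal tracing behind the first two findings compares three runs: a clean run, a run with corrupted subject embeddings, and a corrupted run in which one hidden state is restored from the clean run. The sketch below illustrates that procedure on a toy network rather than on GPT. The model (a residual tanh layer with a causal running mean standing in for attention), all sizes, the noise scale, and the subject positions are assumptions chosen only to make the example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
T, L, d, vocab = 5, 4, 16, 10        # tokens, layers, hidden size, vocabulary (toy values)
Ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(L)]
U = rng.normal(size=(vocab, d))      # toy output head

def run(emb, patch=None):
    """Forward pass. `patch` = (layer, token, vector) overwrites one hidden state."""
    h = emb.copy()
    states = []
    for l in range(L):
        # Causal running mean: each token only sees itself and earlier tokens.
        mixed = np.cumsum(h, axis=0) / np.arange(1, T + 1)[:, None]
        h = h + np.tanh((h + mixed) @ Ws[l].T)
        if patch is not None and patch[0] == l:
            h[patch[1]] = patch[2]   # restore the clean state at (layer, token)
        states.append(h.copy())
    logits = U @ h[-1]               # predict from the last token
    p = np.exp(logits - logits.max())
    return p / p.sum(), states

emb = rng.normal(size=(T, d))
subject = [1, 2]                     # hypothetical subject token positions

# 1) Clean run: record every hidden state and the predicted output.
p_clean, clean_states = run(emb)
target = int(p_clean.argmax())

# 2) Corrupted run: add noise to the subject token embeddings.
corrupted = emb.copy()
corrupted[subject] += rng.normal(scale=3.0, size=(len(subject), d))
p_corrupt, _ = run(corrupted)

# 3) Corrupted-with-restoration runs: restore one clean state at a time.
ie = np.zeros((L, T))                # indirect effect per (layer, token)
for l in range(L):
    for t in range(T):
        p_restored, _ = run(corrupted, patch=(l, t, clean_states[l][t]))
        ie[l, t] = p_restored[target] - p_corrupt[target]

print(np.round(ie, 3))
```

Each entry of `ie` is the change in the probability of the originally predicted output when a single clean state is restored into the corrupted run. In this toy setup, restoring states at token 0 has no effect because that position comes before the corrupted subject tokens, and restoring the final layer at the last token recovers the clean prediction entirely.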


This digest was generated by AI and reviewed by human editors.