QUICK REVIEW

[Paper Review] In-context Learning and Induction Heads

Catherine Olsson, Nelson Elhage|arXiv (Cornell University)|Sep 24, 2022

Domain Adaptation and Few-Shot Learning | Cited by 84

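One-Sentence Summary

This paper hypothesizes that induction heads are the mechanistic source of most in-context learning in transformers, supported by six complementary lines of evidence: causal for small attention-only models and correlational for larger models with MLPs.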

ABSTRACT

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.

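Research Motivation and Objectives

Examine whether induction heads implement a simple algorithm that completes token sequences of the form [A][B] ... [A] -> [B].
Investigate whether induction heads are the primary mechanism behind in-context learning in transformer models of all sizes.
Present multiple lines of evidence establishing causal or correlational links between induction heads and in-context learning performance.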
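
The [A][B] ... [A] -> [B] pattern named above can be sketched in plain Python. This only illustrates the input-output behavior of the completion rule, not how an attention head computes it inside a transformer, and the function name `induction_predict` is hypothetical.

```python
def induction_predict(tokens):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the current (last) token.

    Returns None when the current token has not appeared before.
    Illustrative only: a real induction head works on attention
    patterns and embeddings, not on exact string matches.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest, skipping the
    # final position itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # tokens[i] plays the role of the first [A];
            # tokens[i + 1] is the [B] that followed it.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
```

In the example call, the last token "A" appeared earlier at position 0, so the rule copies the token that followed it there.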

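Proposed Method

Identify induction heads as a candidate mechanism for in-context learning.
Present six complementary lines of evidence linking induction heads to in-context learning.
For small attention-only models, provide causal evidence that induction heads drive in-context learning.
For larger models with MLPs, provide correlational evidence supporting the same link.
Show that induction heads form at the same point in training as a sharp increase in in-context learning ability, a phase visible as a bump in the training loss.
Synthesize these findings to argue that induction heads may be the mechanistic source of general in-context learning.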
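
One way to look for candidate induction heads, sketched here under stated assumptions, is to score how strongly a head attends to the token that followed an earlier occurrence of the current token. The sketch assumes the input is a block of random tokens repeated twice and that `attn[q][k]` holds one head's attention weight from query position q to key position k. The name `prefix_matching_score` and its parameters are hypothetical and are not taken from the paper's code.

```python
def prefix_matching_score(attn, block_len):
    """Average attention from each second-half position to the token
    that followed the earlier copy of the current token.

    Assumes the sequence is a block of `block_len` tokens repeated
    twice, so the token at position q (q >= block_len) also occurs at
    q - block_len, and the token after that copy sits at
    q - block_len + 1.
    """
    total = 0.0
    count = 0
    for q in range(block_len, 2 * block_len):
        target = q - block_len + 1  # position of the [B] after the earlier [A]
        total += attn[q][target]
        count += 1
    return total / count


if __name__ == "__main__":
    block_len = 4
    n = 2 * block_len
    # Hypothetical "ideal" head: all attention goes to the target position.
    ideal = [[0.0] * n for _ in range(n)]
    for q in range(block_len, n):
        ideal[q][q - block_len + 1] = 1.0
    print(prefix_matching_score(ideal, block_len))
```

A score near 1 would suggest induction-like attention on this synthetic input, while a head that spreads attention evenly over earlier tokens would score much lower. Any threshold for calling a head an induction head is a separate choice not specified here.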

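Experimental Results

Research Questions

RQ1: Do induction heads implement the core algorithm behind in-context learning in transformers?
RQ2: Are induction heads causally responsible for the observed in-context learning in small models, and correlated with it in larger models?
RQ3: Do induction heads emerge at the same stage of training as the sudden improvement in in-context learning ability?
RQ4: Do the six lines of evidence coherently support a mechanistic role for induction heads across model scales?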

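Key Findings

The formation of induction heads coincides with an abrupt improvement in in-context learning, visible as a bump in the training loss.
In small attention-only models, the evidence that induction heads drive in-context learning is strong and causal.
In larger models with MLPs, the evidence is correlational but consistently in line with the induction-head mechanism.
The timing of induction head development matches the emergence of stronger in-context learning ability.
Together, the six complementary lines of evidence support induction heads as a general in-context learning mechanism across transformer sizes.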
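
The findings above depend on a measurable notion of in-context learning, which the abstract describes as decreasing loss at increasing token indices. A minimal sketch of one such measure follows. It assumes per-token losses are already available for a batch of sequences and compares the average loss at a late position with the average at an early one. The positions 50 and 500 and the name `in_context_learning_score` are illustrative choices, not values stated in this summary.

```python
from statistics import mean


def in_context_learning_score(per_token_losses, early_index=50, late_index=500):
    """Average loss at a late token position minus average loss at an
    early one, over a batch of sequences.

    per_token_losses: list of sequences, each a list of per-token
    losses at least late_index + 1 long (0-based indices, an assumption).
    A more negative score means later tokens are predicted better,
    i.e. more in-context learning under this definition.
    """
    early = mean(seq[early_index] for seq in per_token_losses)
    late = mean(seq[late_index] for seq in per_token_losses)
    return late - early


if __name__ == "__main__":
    # Synthetic losses that fall linearly with token index (made-up data).
    losses = [[5.0 - 0.001 * i for i in range(512)] for _ in range(2)]
    print(in_context_learning_score(losses))
```

Tracking a score like this across training checkpoints is one way the abrupt improvement described above could be made visible. The synthetic losses in the example are fabricated purely to exercise the function.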

This review was generated by AI and reviewed by human editors.