QUICK REVIEW

[Paper Review] Schema-learning and rebinding as mechanisms of in-context learning and emergence

Sivaramakrishnan Swaminathan, Antoine Dedieu|arXiv (Cornell University)|Jun 16, 2023

Topic Modeling | Cited by 8

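One-sentence summary

The paper shows that clone-structured causal graphs (CSCGs) can achieve in-context learning (ICL), explains the ICL mechanism in terms of schema learning, template circuits, retrieval, and rebinding, and argues that similar mechanisms operate in transformers.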

ABSTRACT

In-context learning (ICL) is one of the most powerful and most unexpected capabilities to emerge in recent transformer-based large language models (LLMs). Yet the mechanisms that underlie it are poorly understood. In this paper, we demonstrate that comparable ICL capabilities can be acquired by an alternative sequence prediction learning method using clone-structured causal graphs (CSCGs). Moreover, a key property of CSCGs is that, unlike transformer-based LLMs, they are interpretable, which considerably simplifies the task of explaining how ICL works. Specifically, we show that it uses a combination of (a) learning template (schema) circuits for pattern completion, (b) retrieving relevant templates in a context-sensitive manner, and (c) rebinding of novel tokens to appropriate slots in the templates. We go on to marshall evidence for the hypothesis that similar mechanisms underlie ICL in LLMs. For example, we find that, with CSCGs as with LLMs, different capabilities emerge at different levels of overparameterization, suggesting that overparameterization helps in learning more complex template (schema) circuits. By showing how ICL can be achieved with small models and datasets, we open up a path to novel architectures, and take a vital step towards a more general understanding of the mechanics behind this important capability.

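Research motivation and objectives

Explain how ICL arises in a non-transformer sequence model (CSCG) and how its mechanisms relate to those of transformers.
Show that template (schema) learning and rebinding are the core ICL processes.
Show that overparameterization, schema formation, and contextual retrieval drive ICL and its emergence across datasets.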

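Proposed method

Introduce clone-structured causal graphs (CSCGs) and their emission and transition structures.
Define and implement rebinding, which maps existing schemas onto new observations.
Propose a fast rebinding algorithm that updates the emission matrix based only on prediction surprise.
Demonstrate MAP inference and EM-based updates that retrieve and bind schemas to complete tasks.
Relate CSCG mechanisms to ICL in transformers and discuss the implications for architecture design.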
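To make the emission and transition structure concrete, here is a minimal sketch of a cloned HMM in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the two-clones-per-symbol layout, and the hand-written transition matrix are all hypothetical. The emission structure is deterministic (each clone emits exactly one symbol), so inference only has to track which clone of the observed symbol is active.

```python
# Toy cloned HMM. Everything here (names, sizes, transition values) is an
# illustrative assumption, not taken from the paper's code.
N_CLONES = 2
N_SYMBOLS = 3
A, B, C = 0, 1, 2  # observed symbols


def clone_states(symbol, n_clones):
    """Hidden states (clones) allotted to one observed symbol."""
    start = symbol * n_clones
    return list(range(start, start + n_clones))


def make_transition():
    """Hand-written transitions: B has one clone per context (after A, after C)."""
    T = [[0.0] * 6 for _ in range(6)]
    T[0][2] = 1.0  # A0 -> B0
    T[2][0] = 1.0  # B0 -> A0  (B seen after A is followed by A)
    T[4][3] = 1.0  # C0 -> B1
    T[3][4] = 1.0  # B1 -> C0  (B seen after C is followed by C)
    T[1][1] = 1.0  # unused clone A1, self-loop keeps the row normalized
    T[5][5] = 1.0  # unused clone C1
    return T


def forward_filter(transition, observations, n_clones):
    """Forward pass; at each step only clones of the observed symbol survive."""
    n_states = len(transition)
    first = clone_states(observations[0], n_clones)
    belief = [1.0 / len(first) if s in first else 0.0 for s in range(n_states)]
    for obs in observations[1:]:
        new_belief = [0.0] * n_states
        for j in clone_states(obs, n_clones):
            new_belief[j] = sum(belief[i] * transition[i][j] for i in range(n_states))
        total = sum(new_belief)
        if total == 0.0:
            raise ValueError("sequence has zero probability under this model")
        belief = [b / total for b in new_belief]
    return belief


def predict_next_symbol(transition, belief, n_symbols, n_clones):
    """Distribution over the next symbol: sum next-state mass over each symbol's clones."""
    n_states = len(transition)
    next_state = [sum(belief[i] * transition[i][j] for i in range(n_states))
                  for j in range(n_states)]
    return [sum(next_state[s] for s in clone_states(sym, n_clones))
            for sym in range(n_symbols)]


T = make_transition()
after_ab = predict_next_symbol(T, forward_filter(T, [A, B], N_CLONES), N_SYMBOLS, N_CLONES)
after_cb = predict_next_symbol(T, forward_filter(T, [C, B], N_CLONES), N_SYMBOLS, N_CLONES)
print(after_ab, after_cb)
```

The same observation B leads to different predictions depending on which clone the preceding context activated, which is the context separation that cloning is meant to provide.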

Figure 1: A. Inducing the structure of the room (cognitive maps) from sequential sensory observations is challenging because of perceptual aliasing – local observations do not identify locations uniquely. B. Cloned hidden Markov models (HMMs) [7]. Each observation is mapped to multiple clone

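Experimental results

Research questions

RQ1: Can CSCGs achieve ICL comparable to that of LLMs on standard ICL benchmarks?
RQ2: How do template circuits (schemas) and rebinding support context-dependent generalization?
RQ3: What role does overparameterization play in learning and in the emergence of ICL capabilities?
RQ4: How does rebinding enable fast transfer of learned algorithms to new tokens and prompts?
RQ5: Do CSCG mechanisms generalize to tasks usually associated with transformer ICL, such as zero-shot learning and instruction-based retrieval?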

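Key findings

CSCGs reproduce ICL-like behavior through context-dependent latent representations and transitive generalization.
Learned templates (schemas) and context-sensitive retrieval make prompt completion and task execution more effective.
Rebinding novel tokens to learned slots lets the same template be applied to entirely new inputs.
Overparameterization improves the separation of latent concepts and raises ICL performance across tasks.
Experiments on GINC, LIALT, and a dax-style test support the proposed mechanisms and show emergence that depends on model capacity and data patterns.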
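The rebinding idea in these findings can be sketched as EM over the emission matrix with the transition matrix held fixed. The example below is a simplified illustration, not the paper's fast rebinding algorithm: it re-estimates every emission row from the prompt alone rather than only the rows flagged by prediction surprise, and the three-slot cyclic schema, token ids, smoothing constants, and function names are all assumptions made for the demonstration.

```python
# Simplified rebinding sketch: transitions (the schema) stay fixed, only
# emissions are re-estimated from a prompt containing a novel token.
# All names and numbers are illustrative assumptions.

def forward_backward(T, E, pi, obs):
    """Posterior state marginals, using per-step normalized messages."""
    n, L = len(T), len(obs)
    a = [pi[i] * E[i][obs[0]] for i in range(n)]
    z = sum(a)
    alpha = [[x / z for x in a]]
    for t in range(1, L):
        a = [E[j][obs[t]] * sum(alpha[-1][i] * T[i][j] for i in range(n))
             for j in range(n)]
        z = sum(a)
        alpha.append([x / z for x in a])
    beta = [[1.0] * n for _ in range(L)]
    for t in range(L - 2, -1, -1):
        b = [sum(T[i][j] * E[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
             for i in range(n)]
        z = sum(b)
        beta[t] = [x / z for x in b]
    gamma = []
    for t in range(L):
        g = [alpha[t][i] * beta[t][i] for i in range(n)]
        z = sum(g)
        gamma.append([x / z for x in g])
    return gamma


def rebind_emissions(T, E, pi, obs, n_iters=10, pseudocount=1e-3):
    """EM on the emission matrix only; T is never modified."""
    n, n_tokens = len(T), len(E[0])
    for _ in range(n_iters):
        gamma = forward_backward(T, E, pi, obs)               # E-step
        counts = [[pseudocount] * n_tokens for _ in range(n)]
        for t, token in enumerate(obs):
            for i in range(n):
                counts[i][token] += gamma[t][i]
        E = [[c / sum(row) for c in row] for row in counts]   # M-step
    return E


# Schema: three slots visited cyclically, slot0 -> slot1 -> slot2 -> slot0.
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
pi = [1.0 / 3] * 3

# Tokens 0, 1, 2 were seen in training; token 3 is novel.
EPS = 0.01
E0 = [[1.0 - 3 * EPS if tok == slot else EPS for tok in range(4)]
      for slot in range(3)]

prompt = [0, 3, 2, 0, 3, 2]  # the novel token appears where token 1 used to
rebound = rebind_emissions(T, E0, pi, prompt)
bindings = [max(range(4), key=lambda tok: row[tok]) for row in rebound]
print(bindings)
```

In this toy setup the familiar tokens 0 and 2 anchor the first and last slots, so the middle slot ends up bound to the novel token 3 while the schema's transitions are untouched.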

Figure 2: A. CSCGs allow both separation of contexts and transitive generalization. The word “bank” is wired to different clones that correspond to the different contexts it is used in. If “milk and honey”, and “bread and butter” are seen in training, transitive generalization occurs if they get wi

This review was generated by AI and reviewed by a human editor.