QUICK REVIEW

[Paper Review] The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Ziqian Zhong, Ziming Liu|arXiv (Cornell University)|Jun 30, 2023
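Neural Networks and Applications | Cited by 11
One-Sentence Summary

This paper shows that neural networks trained on modular addition can discover several distinct algorithmic strategies (Clock, Pizza, and others) depending on architecture and hyperparameters, revealing algorithmic phase transitions in the mechanistic solutions that networks learn.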

ABSTRACT

Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.

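Research Motivation and Goals

  • Clarify that neural networks can discover multiple algorithms for an algorithmic task rather than a single canonical solution.
  • Show that both the Clock and the Pizza algorithm can emerge in similar architectures under different hyperparameters.
  • Show that networks can ensemble several algorithm variants in parallel to improve robustness.
  • Introduce metrics that distinguish these algorithms and quantify phase transitions in algorithm space.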

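Proposed Method

  • Train one-layer transformers, with and without attention, on addition modulo p, with p=59.
  • Characterize the learned embeddings as circles in a PCA-projected space to identify Clock behavior.
  • Define and compute gradient symmetry and distance irrelevance as metrics that distinguish Clock from Pizza.
  • Introduce circle isolation to analyze the embedding representations in a reduced-dimensional subspace.
  • Vary the architecture and introduce a new attention rate parameter to map the algorithmic phase transition between Clock and Pizza.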
Figure 1: Illustration of the Clock and the Pizza Algorithm.
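The two metrics named in the method list can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the input layout (gradients as rows, correct logits as a p-by-p matrix indexed by the two inputs), and the exact normalization are one plausible reading of "gradient symmetry" and "distance irrelevance", and the paper's definitions may differ in detail. The toy matrices at the end are synthetic, not model outputs.

```python
import numpy as np

def gradient_symmetry(grad_a, grad_b):
    """Mean cosine similarity between paired gradients (one pair per row).

    Assumed layout: grad_a[n] and grad_b[n] are the gradients of the same
    output logit with respect to the first and second token embedding.
    """
    grad_a = np.asarray(grad_a, dtype=float)
    grad_b = np.asarray(grad_b, dtype=float)
    dots = np.sum(grad_a * grad_b, axis=1)
    norms = np.linalg.norm(grad_a, axis=1) * np.linalg.norm(grad_b, axis=1)
    return float(np.mean(dots / norms))

def distance_irrelevance(correct_logits):
    """Ratio of within-distance spread to overall spread of correct logits.

    Assumed layout: correct_logits[i, j] is the logit of the correct answer
    for input (i, j). Near 0 means the logit is determined by the distance
    j - i; near 1 means the distance explains none of the variation.
    """
    L = np.asarray(correct_logits, dtype=float)
    p = L.shape[0]
    i = np.arange(p)
    per_distance = [np.std(L[i, (i + d) % p]) for d in range(p)]
    return float(np.mean(per_distance) / np.std(L))

# Synthetic sanity checks (toy matrices, not trained-model logits).
p = 59
i, j = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
q_distance_only = distance_irrelevance(np.cos(2 * np.pi * (i - j) / p))
q_sum_only = distance_irrelevance(np.cos(2 * np.pi * (i + j) / p))
```

A matrix that depends only on the distance between the inputs scores close to 0, while one that depends only on their sum scores close to 1, which is the contrast the metric is meant to expose.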

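Experimental Results

Research Questions

  • RQ1: Can neural networks trained on modular addition rediscover familiar algorithms such as Clock, or do alternative strategies emerge under different conditions?
  • RQ2: In practice, which mechanisms (embeddings, gradients) distinguish Clock from Pizza?
  • RQ3: How do architecture (with or without attention) and hyperparameters affect the learned algorithm?
  • RQ4: Do networks ensemble multiple algorithmic strategies in parallel, and how can this be detected and analyzed?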

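Key Findings

  • Clock and Pizza are both viable solutions to modular addition in similar networks.
  • Networks without attention (which tend toward Pizza) show gradient symmetry and distance-dependent logit patterns, indicating Pizza-style behavior.
  • The Pizza algorithm relies on averaging the embeddings and on absolute-value operations, which makes its logit pattern depend on a-b.
  • The Clock algorithm uses circular embeddings and its logits do not depend on a-b; Pizza's logits depend on a-b and carry an additional |cos((a-b)/2)| factor.
  • A sharp algorithmic phase transition between Clock and Pizza is governed by model width and attention strength, and ensembling several variants in parallel makes the network robust across inputs.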
Figure 2: Gradients on first six principal components of input embeddings. $(a,b,c)$ in the title stands for taking gradients on the output logit $c$ for input $(a,b)$ . x and y axes represent the gradients for embeddings of the first and the second token. The dashed line $y=x$ signals a symmetric g
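The contrast between the two algorithms in the findings above can be illustrated with an idealized, hand-written version of each. This is a sketch under assumptions, not the trained network: it uses a single assumed frequency k (with angular frequency w = 2πk/p, so the summary's |cos((a-b)/2)| factor appears as |cos(w(a-b)/2)|), and in the Pizza branch a closed-form angle-doubling step stands in for the absolute-value nonlinearity that the network is described as using. All function names are hypothetical.

```python
import numpy as np

p, k = 59, 1                       # p from the summary; k is an assumed frequency
w = 2 * np.pi * k / p
a = np.arange(p)[:, None, None]    # first input
b = np.arange(p)[None, :, None]    # second input
c = np.arange(p)[None, None, :]    # candidate output

def embed(x):
    """Place token x on the unit circle at angle w * x."""
    return np.cos(w * x), np.sin(w * x)

def clock_logits():
    # Clock: multiply the embeddings so that the angles add.
    (ca, sa), (cb, sb) = embed(a), embed(b)
    sum_cos, sum_sin = ca * cb - sa * sb, sa * cb + ca * sb
    uc, us = embed(c)
    return sum_cos * uc + sum_sin * us       # cos(w * (a + b - c))

def pizza_logits():
    # Pizza: average the embeddings. The midpoint has length
    # |cos(w * (a - b) / 2)| and angle w * (a + b) / 2 (up to a sign flip).
    (ca, sa), (cb, sb) = embed(a), embed(b)
    mx, my = (ca + cb) / 2, (sa + sb) / 2
    r = np.maximum(np.hypot(mx, my), 1e-12)  # guard against a zero midpoint
    # Assumed stand-in for the network's nonlinearity: double the angle
    # while keeping the midpoint length, which removes the sign ambiguity.
    hx, hy = (mx**2 - my**2) / r, 2 * mx * my / r
    uc, us = embed(c)
    return hx * uc + hy * us                 # |cos(w(a-b)/2)| * cos(w(a+b-c))

clock = clock_logits()
pizza = pizza_logits()
target = (np.arange(p)[:, None] + np.arange(p)[None, :]) % p
clock_correct = np.array_equal(clock.argmax(-1), target)
pizza_correct = np.array_equal(pizza.argmax(-1), target)
```

In this idealized setting both procedures pick (a+b) mod p for every input pair, but the Pizza logits are the Clock logits scaled by a factor that depends only on a-b. That factor becomes small when the two embeddings are nearly opposite on the circle, which is one way to see why combining several frequencies would help.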


This review was generated by AI and reviewed by human editors.