[Paper Review] Large Language Models Are Human-Like Internally

Tatsuki Kuribayashi, Yohei Oseki|ArXiv.org|Feb 3, 2025
Topic Modeling | Cited by 3
One-Sentence Summary

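This paper shows that surprisal computed from the internal layers of large language models, not only from the final layer, aligns with human sentence processing data on both behavioral and neurophysiological measures: earlier layers match fast responses, later layers align with slower measures such as N400 and MAZE, and larger models contain cognitively plausible representations in their internal layers.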
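For reference, per-layer surprisal can be written as below. This is a sketch of the standard logit-lens and tuned-lens formulation, not a transcription of the paper's equations; the symbols are notation introduced here for illustration: $h^{(\ell)}_{t-1}$ is the hidden state at layer $\ell$ for the position preceding word $w_t$, $\mathrm{LN}$ is the model's final layer norm, $W_U$ is the unembedding matrix, and $A_\ell, b_\ell$ are a learned per-layer affine map.

```latex
\begin{aligned}
s^{(\ell)}(w_t) &= -\log p^{(\ell)}(w_t \mid w_{<t}) \\
\text{logit lens:}\quad p^{(\ell)}(\cdot \mid w_{<t}) &= \operatorname{softmax}\big(W_U\,\mathrm{LN}(h^{(\ell)}_{t-1})\big) \\
\text{tuned lens:}\quad p^{(\ell)}(\cdot \mid w_{<t}) &= \operatorname{softmax}\big(W_U\,\mathrm{LN}(A_\ell\, h^{(\ell)}_{t-1} + b_\ell)\big)
\end{aligned}
```

In pre-LN Transformers such as GPT-2, setting $\ell$ to the final layer makes the logit lens reproduce the model's own next-word distribution, so final-layer surprisal is the special case that earlier cognitive modeling studies evaluated.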

ABSTRACT

Recent cognitive modeling studies have reported that larger language models (LMs) exhibit a poorer fit to human reading behavior (Oh and Schuler, 2023b; Shain et al., 2024; Kuribayashi et al., 2024), leading to claims of their cognitive implausibility. In this paper, we revisit this argument through the lens of mechanistic interpretability and argue that prior conclusions were skewed by an exclusive focus on the final layers of LMs. Our analysis reveals that next-word probabilities derived from internal layers of larger LMs align with human sentence processing data as well as, or better than, those from smaller LMs. This alignment holds consistently across behavioral (self-paced reading times, gaze durations, MAZE task processing times) and neurophysiological (N400 brain potentials) measures, challenging earlier mixed results and suggesting that the cognitive plausibility of larger LMs has been underestimated. Furthermore, we first identify an intriguing relationship between LM layers and human measures: earlier layers correspond more closely with fast gaze durations, while later layers better align with relatively slower signals such as N400 potentials and MAZE processing times. Our work opens new avenues for interdisciplinary research at the intersection of mechanistic interpretability and cognitive modeling.
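The abstract's core quantity, next-word probabilities read out from an internal layer, can be illustrated with a minimal pure-Python logit lens. This is a sketch under stated assumptions: the hidden states, the final layer-norm parameters (`gamma`, `beta`), and the unembedding matrix `W_U` are tiny hypothetical stand-ins rather than weights from any evaluated LM, and the function names are illustrative. A tuned lens would additionally apply a learned affine map to `h` before the layer norm; that step is omitted here.

```python
import math


def layer_norm(h, gamma, beta, eps=1e-5):
    """Final layer norm applied before unembedding (as in pre-LN models like GPT-2)."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [g * (x - mean) / math.sqrt(var + eps) + b for x, g, b in zip(h, gamma, beta)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def logit_lens_surprisal(h, gamma, beta, W_U, target_id):
    """Surprisal (in bits) of token `target_id`, read out from an intermediate hidden state `h`.

    W_U is a list of vocabulary rows, each of length d_model.
    """
    normed = layer_norm(h, gamma, beta)
    logits = [sum(w * x for w, x in zip(row, normed)) for row in W_U]
    return -log_softmax(logits)[target_id] / math.log(2)


# Hypothetical toy setup: d_model = 3, vocabulary of 4 tokens.
gamma, beta = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
W_U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
hidden_by_layer = {  # made-up hidden states at position t-1, keyed by layer index
    1: [0.1, 0.2, 0.0],
    2: [2.0, -1.0, 0.5],
}
for layer, h in hidden_by_layer.items():
    print(layer, round(logit_lens_surprisal(h, gamma, beta, W_U, target_id=0), 3))
```

Running the same readout for every layer yields one surprisal value per word per layer, which is what the layer-wise comparisons in this review operate on.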

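Motivation and Goals

  • Offer a layer-wise perspective on the cognitive plausibility of LM-derived surprisal as an account of human sentence processing.
  • Test whether the internal layers of larger LMs align with human behavioral and neurophysiological data as well as, or better than, smaller LMs.
  • Examine how surprisal from different LM layers maps onto fast (gaze duration, first-pass reading) versus slow (N400, MAZE) human measures.
  • Determine how scaling (model size) affects cognitive plausibility once internal layers are taken into account.
  • Check whether the layer-wise findings hold across languages (cross-lingual experiments).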

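Proposed Method

  • Compute next-word surprisal at each internal LM layer by projecting its intermediate representation into the output vocabulary space (logit lens and tuned lens).
  • Relate surprisal, alongside baseline features, to human processing costs (self-paced reading times, SPR; first-pass gaze durations, FPGD; MAZE task times; N400 amplitudes) using linear regression, and report the improvement in log-likelihood (Delta Log-Likelihood, ΔLL).
  • Evaluate 21 open-source LMs (6–64 layers) on 15 human reading datasets covering SPR, FPGD, MAZE, and N400.
  • Determine whether the layer that best fits human data is the final layer or an internal one, comparing across datasets and measures.
  • Analyze layer-depth effects and their interactions with a regression whose predictors include stimulus, model, lens type, layer_depth, and measure.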
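The ΔLL step can be sketched as follows. This is a minimal illustration, assuming ordinary least squares with Gaussian errors and ΔLL defined as the per-token log-likelihood gain of a regression that adds surprisal over a baseline regression without it. The baseline predictors (word length, log frequency) and all data values are hypothetical, and the paper's exact baseline features and fitting protocol are not reproduced here.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gaussian_loglik(X, y):
    """Fit OLS with an intercept; return the total Gaussian log-likelihood at the MLE."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(XtX, Xty)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    n = len(y)
    sigma2 = sum(e * e for e in resid) / n  # MLE of the error variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def delta_ll(baseline, surprisal, y):
    """Per-token log-likelihood gain from adding surprisal to the baseline predictors."""
    full = [list(b) + [s] for b, s in zip(baseline, surprisal)]
    return (gaussian_loglik(full, y) - gaussian_loglik(baseline, y)) / len(y)


# Hypothetical toy data: (word length, log frequency) per token,
# surprisal from two layers, and reading times in ms.
baseline = [(3, -2.0), (5, -4.1), (2, -1.5), (7, -5.0), (4, -3.2), (6, -4.4), (3, -2.5), (8, -6.0)]
surprisal_by_layer = {
    "layer_6": [2.1, 7.5, 1.8, 9.0, 4.2, 6.9, 3.0, 10.5],
    "layer_12": [5.0, 3.1, 6.2, 4.4, 2.9, 5.5, 4.8, 3.6],
}
rt = [310, 402, 295, 455, 350, 410, 322, 480]
for name, s in surprisal_by_layer.items():
    print(name, round(delta_ll(baseline, s, rt), 4))
```

Because the regressions are nested and fit in-sample here, ΔLL can never be negative; a held-out or cross-validated variant would be a natural refinement when comparing many layers.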
Figure 1 : Different measures of human sentence processing align with surprisal from different layers of language models (LMs), and the best layer is typically not the final one. Larger LMs can better simulate human reading data with their internal layer than smaller LMs.
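The comparison summarized in Figure 1 (which layer fits best, and is it the final one?) reduces to an argmax over per-layer ΔLL scores. The sketch below assumes ΔLL has already been computed for every layer; the measure names echo this review, but every number is invented for illustration and is not a result from the paper. Relative depth is taken as layer index divided by the number of layers, an assumed normalization that makes models of different depths comparable.

```python
def best_layer(delta_ll_by_layer):
    """Return (best layer index, its relative depth, whether it is the final layer).

    Layers are indexed 1..L; the embedding layer is excluded.
    """
    n_layers = len(delta_ll_by_layer)
    best = max(range(n_layers), key=lambda i: delta_ll_by_layer[i]) + 1
    return best, best / n_layers, best == n_layers


# Hypothetical per-layer ΔLL curves for a 6-layer model (illustrative values only).
curves = {
    "FPGD": [0.010, 0.014, 0.012, 0.009, 0.007, 0.006],
    "N400": [0.002, 0.004, 0.007, 0.010, 0.011, 0.008],
}
for measure, curve in curves.items():
    layer, depth, is_final = best_layer(curve)
    print(f"{measure}: best layer {layer} (relative depth {depth:.2f}), final layer: {is_final}")
```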

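Experimental Results

Research Questions

  • RQ1: Does surprisal from internal LM layers align with human sentence processing data as well as, or better than, surprisal from the final layer?
  • RQ2: How does the choice of layer (early vs. late) affect alignment with fast vs. slow human measures (SPR/FPGD vs. N400/MAZE)?
  • RQ3: How does LM size (scaling) affect cognitive plausibility when internal layers are used?
  • RQ4: Does the layer-measure alignment remain stable in cross-lingual validation?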

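Key Findings

  • Internal LM layers often predict human reading data better than the final layer, as measured by ΔLL.
  • Earlier LM layers better model fast measures (FPGD and SPR), while later layers better model slow measures (N400 and MAZE).
  • When each model is scored by the ΔLL of its best internal layer, larger models generally show higher cognitive plausibility than smaller ones.
  • Layer depth interacts systematically with measure type, supporting the view that different human measures reflect different stages of processing.
  • Within the same model family, about 80% of internal layers outperformed the previous best final-layer result in the tested settings, indicating that layer-wise plausibility is robust across models.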
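The robustness statistic in the last finding is a simple proportion. A minimal sketch, assuming per-layer ΔLL values for one model and a single reference score (the best final-layer ΔLL); the function name and values are hypothetical, and this summary does not specify exactly how the paper aggregates the proportion.

```python
def share_exceeding(delta_lls, reference):
    """Fraction of layers whose ΔLL strictly exceeds a reference ΔLL."""
    if not delta_lls:
        raise ValueError("need at least one layer")
    return sum(d > reference for d in delta_lls) / len(delta_lls)


# Hypothetical internal-layer ΔLL values and a hypothetical final-layer reference.
internal = [0.011, 0.015, 0.009, 0.013]
reference = 0.010
print(f"{share_exceeding(internal, reference):.0%}")
```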
Figure 2 : Relationships between layer depth (x-axis) and ΔLL (y-axis) for each LM in two datasets: FPGD in DC and SPR in NS. The graphs are separated by model families and data. The best/last layer is indicated with a red/black edge line. The graph starts at the first layer, not at the embedding layer.

This review was generated by AI and reviewed by human editors.