QUICK REVIEW

[Paper Review] Testing the Predictions of Surprisal Theory in 11 Languages

Ethan Wilcox, Tiago Pimentel|arXiv (Cornell University)|Jan 1, 2023

Neurobiology of Language and Bilingualism | Cited by 5

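ONE-SENTENCE SUMMARY

This study tests Surprisal Theory using eye-tracking data and language models for 11 languages from five language families. It evaluates whether word surprisal and contextual entropy predict reading times, and whether the linking function between surprisal and reading times is linear. All three predictions are supported crosslinguistically, which the authors argue is the most robust link to date between information theory and incremental language processing.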

ABSTRACT

A fundamental result in psycholinguistics is that less predictable words take a longer time to process. One theoretical explanation for this finding is Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's predictability as its surprisal, i.e. its negative log-probability given a context. While evidence supporting the predictions of Surprisal Theory have been replicated widely, most have focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times; (ii) whether expected surprisal, i.e. contextual entropy, is predictive of reading times; (iii) and whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to-date between information theory and incremental language processing across languages.

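MOTIVATION AND GOALS

Test whether Surprisal Theory generalizes to languages other than English.
Assess whether contextual entropy (expected surprisal) predicts reading times.
Determine whether the relationship between surprisal and reading times is linear.
Compare monolingual and multilingual language models as predictors of psycholinguistic reading times.
Provide the most comprehensive crosslinguistic evaluation of Surprisal Theory to date.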

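PROPOSED METHOD

Use the MECO dataset, which provides eye-tracking data for readers of parallel texts in 11 languages.
Derive estimates from autoregressive language models trained on multilingual and monolingual corpora: the large multilingual mGPT and smaller monolingual models (about 30 million tokens per language).
Compute word-level surprisal as the negative log-probability of a word given its left context, and contextual entropy as the expected surprisal over all possible continuations.
Fit linear regression models that predict word-by-word reading times from surprisal and contextual entropy, and measure the improvement in log-likelihood over a baseline.
Compare models that use a linear versus a non-linear linking function between surprisal and reading times.
Use statistical significance tests to check whether adding surprisal and entropy significantly improves the baseline model's predictive power.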
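The two quantities defined in the method list can be illustrated with a toy example. The sketch below assumes a made-up next-word distribution and measures information in bits; the function names, the example context, and the probabilities are hypothetical and are not taken from the paper, where the probabilities would come from a language model.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits: negative log2-probability of the observed word."""
    return -math.log2(prob)


def contextual_entropy(dist: dict[str, float]) -> float:
    """Expected surprisal (in bits) over all continuations in `dist`."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


# Hypothetical next-word distribution after the context "The cat sat on the".
# A real study would read these probabilities off a language model.
next_word = {"mat": 0.5, "sofa": 0.25, "roof": 0.125, "moon": 0.125}

s_mat = surprisal(next_word["mat"])    # likely word, low surprisal
s_moon = surprisal(next_word["moon"])  # unlikely word, high surprisal
h = contextual_entropy(next_word)      # uncertainty before the word is seen

print(f"surprisal(mat)  = {s_mat:.2f} bits")
print(f"surprisal(moon) = {s_moon:.2f} bits")
print(f"contextual entropy = {h:.2f} bits")
```

Note that surprisal depends on the word that actually appears, whereas contextual entropy depends only on the context, so it is the same whichever continuation the reader then encounters.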

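EXPERIMENTAL RESULTS

Research Questions

RQ1: Does surprisal predict reading times across multiple languages?
RQ2: Does contextual entropy (expected surprisal) also predict reading times?
RQ3: Is the linking function between surprisal and reading times linear?
RQ4: How do monolingual and multilingual language models compare in predicting reading times?
RQ5: Do the results generalize to languages outside the Indo-European family?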

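Key Findings

In all 11 languages tested, surprisal significantly predicts reading times, confirming the surprisal hypothesis.
In most languages, contextual entropy also predicts reading times, supporting the contextual-entropy hypothesis.
Models with a linear linking function match the predictive performance of more complex models, supporting the linear-linking hypothesis.
The multilingual model (mGPT) predicts reading times about as well as the monolingual models across all languages.
The effects are consistent and stable across languages, indicating that Surprisal Theory generalizes robustly across language families.
The study offers what the authors describe as the most robust crosslinguistic test of Surprisal Theory to date, with strong supporting evidence across the languages tested.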
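The findings above rest on asking whether a regression predicts reading times better once surprisal is added. A minimal sketch of that comparison follows, assuming an intercept-only baseline, Gaussian errors, and in-sample evaluation on invented data. All names and numbers are hypothetical; a more careful analysis would use a richer baseline and held-out data.

```python
import math


def ols_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gaussian_loglik(residuals):
    """Log-likelihood of residuals under a Gaussian with its MLE variance."""
    n = len(residuals)
    var = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


# Invented per-word data: surprisal (bits) and reading time (ms).
surprisal = [1.0, 2.5, 4.0, 6.0, 3.0, 8.0]
reading_time = [210.0, 230.0, 260.0, 290.0, 240.0, 330.0]

# Baseline (assumption): predict the mean reading time for every word.
mean_rt = sum(reading_time) / len(reading_time)
base_resid = [rt - mean_rt for rt in reading_time]

# Target model: baseline plus a linear surprisal term.
intercept, slope = ols_fit(surprisal, reading_time)
full_resid = [rt - (intercept + slope * s)
              for s, rt in zip(surprisal, reading_time)]

# Positive delta log-likelihood means surprisal improves the fit.
delta_ll = gaussian_loglik(full_resid) - gaussian_loglik(base_resid)
print(f"slope = {slope:.2f} ms per bit")
print(f"delta log-likelihood per word = {delta_ll / len(surprisal):.3f}")
```

In-sample, the larger model can never fit worse than the baseline, which is why held-out evaluation and a significance test are needed before reading a positive delta as evidence for surprisal.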

This review was generated by AI and reviewed by a human editor.