QUICK REVIEW

[Paper Review] Tolerance Principle and Small Language Model Learning

Adam E. Friedman, Stevan Harnad | arXiv (Cornell University) | Jan 17, 2026

Language Development and Disorders | Cited by: 0

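One-Sentence Summary

The study tests whether Yang's Tolerance Principle governs learning in BabyBERTa, a small transformer model trained on artificial grammars, and finds that the model's learning dynamics are inconsistent with the principle.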

ABSTRACT

Modern language models like GPT-3, BERT, and LLaMA require massive training data, yet with sufficient training they reliably learn to distinguish grammatical from ungrammatical sentences. Children aged as young as 14 months already have the capacity to learn abstract grammar rules from very few exemplars, even in the presence of non-rule-following exceptions. Yang's (2016) Tolerance Principle defines a precise threshold for how many exceptions a rule can tolerate and still be learnable. The present study explored the minimal amount and quality of training data necessary for rules to be generalized by a transformer-based language model to test the predictions of the Tolerance Principle. We trained BabyBERTa (Huebner et al. 2021), a transformer model optimized for small datasets, on artificial grammars. The training sets varied in size, number of unique sentence types, and proportion of rule-following versus exception exemplars. We found that, unlike human infants, BabyBERTa's learning dynamics do not align with the Tolerance Principle.
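
As background for the "precise threshold" mentioned in the abstract: Yang (2016) states that a rule over N items remains productive as long as the number of exceptions does not exceed θ_N = N / ln N. The following is a minimal Python sketch of that formula. The function names are illustrative assumptions and are not taken from the paper.

```python
import math


def tolerance_threshold(n_items: int) -> float:
    """Yang's threshold theta_N = N / ln(N) for a rule over N items."""
    if n_items < 2:
        # ln(1) = 0, so the threshold is undefined below N = 2.
        raise ValueError("threshold is undefined for fewer than 2 items")
    return n_items / math.log(n_items)


def rule_is_productive(n_items: int, n_exceptions: int) -> bool:
    """True when the exception count does not exceed the threshold."""
    return n_exceptions <= tolerance_threshold(n_items)
```

For example, with N = 9 the threshold is roughly 4.1, so a rule with 4 exceptions counts as learnable under the principle, while the same rule with 5 exceptions does not.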

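Motivation and Objectives

Motivate a comparison between language learning in human infants and learning in small language models.
Investigate the minimal data conditions under which a transformer can generalize grammatical rules.
Test how training set size, diversity of sentence types, and the mix of rule-following and exception exemplars affect learnability in a small language model.
Assess whether the Tolerance Principle predicts learning outcomes for small models such as BabyBERTa.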

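Proposed Method

Train BabyBERTa, a transformer optimized for small datasets, on artificial grammars using a range of training sets.
Systematically vary the training set size, the number of unique sentence types, and the proportion of rule-following versus exception exemplars.
Evaluate whether the model generalizes the grammatical rules to new instances outside the training set.
Compare the observed learning dynamics with the thresholds predicted by the Tolerance Principle.
Analyze the results to determine whether the principle applies to small language models as it does to human learners.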
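
The data manipulation described above (varying set size, number of unique sentence types, and the rule/exception proportion) might be sketched as follows. This is an illustration under stated assumptions only: the toy grammar, the function `build_training_set`, and the swept values are hypothetical and do not reproduce the grammars or conditions used in the paper.

```python
import random


def build_training_set(n_types, exception_fraction, n_tokens, seed=0):
    """Return a shuffled list of (sentence, kind) training exemplars.

    Hypothetical toy grammar, not the one used in the paper:
      rule-following type:  "<stem> <stem>ka"  (regular suffix)
      exception type:       "<stem> irr<i>"    (idiosyncratic form)
    """
    if n_tokens < n_types:
        raise ValueError("need at least one token per sentence type")
    rng = random.Random(seed)
    n_exceptions = round(n_types * exception_fraction)
    types = []
    for i in range(n_types):
        stem = f"w{i}"
        if i < n_exceptions:
            types.append((f"{stem} irr{i}", "exception"))
        else:
            types.append((f"{stem} {stem}ka", "rule"))
    # Every type appears once; extra tokens are sampled uniformly.
    tokens = types + rng.choices(types, k=n_tokens - n_types)
    rng.shuffle(tokens)
    return tokens


# Illustrative sweep: three type counts x three exception proportions,
# with five tokens per type on average (all values are assumptions).
conditions = [
    (n_types, frac, 5 * n_types)
    for n_types in (10, 20, 40)
    for frac in (0.1, 0.3, 0.5)
]
datasets = {c: build_training_set(*c) for c in conditions}
```

Fixing the seed keeps each condition reproducible, so differences in a model's behavior across conditions can be attributed to the data composition rather than to sampling noise.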

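Experimental Results

Research Questions

RQ1: Across different data conditions, is BabyBERTa's generalization of grammatical rules consistent with Yang's Tolerance Principle?
RQ2: How do training set size, diversity of sentence types, and the rule/exception ratio affect learnability in a small transformer model?
RQ3: When exposed to abstract grammatical rules, do small language models show learning dynamics comparable to those of human infants?

Key Findings

BabyBERTa's learning dynamics are not consistent with the Tolerance Principle.
The model's performance depends on the composition of the training data, and these effects are not captured by the principle.
Under comparable conditions, the small model trained on limited-data configurations shows learnability patterns that differ from those of human infants.
The results challenge the generality of the Tolerance Principle for small language models.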


This review was generated by AI and reviewed by human editors.