QUICK REVIEW

[Paper Review] Systematicity between Forms and Meanings across Languages Supports Efficient Communication

Doreen Osmelak, Yang Xu|arXiv (Cornell University)|Jan 23, 2026

Language and cultural evolution | Cited by 0

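One-Sentence Summary

The paper introduces CETL, a learnability-based complexity measure for form-meaning systematicity, and shows that attested verb and pronoun paradigms exploit internal form structure to achieve efficiency beyond what the traditional Information Bottleneck (IB) model captures.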

ABSTRACT

Languages vary widely in how meanings map to word forms. These mappings have been found to support efficient communication; however, this theory does not account for systematic relations within word forms. We examine how a restricted set of grammatical meanings (e.g. person, number) are expressed on verbs and pronouns across typologically diverse languages. Consistent with prior work, we find that verb and pronoun forms are shaped by competing communicative pressures for simplicity (minimizing the inventory of grammatical distinctions) and accuracy (enabling recovery of intended meanings). Crucially, our proposed model uses a novel measure of complexity (inverse of simplicity) based on the learnability of meaning-to-form mappings. This innovation captures fine-grained regularities in linguistic form, allowing better discrimination between attested and unattested systems, and establishes a new connection from efficient communication theory to systematicity in natural language.

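Motivation and Goals

Explain why languages balance simplicity against accuracy in their form-meaning mappings.
Propose a unified information-theoretic framework that incorporates systematicity into efficiency models.
Develop a learnability-based complexity measure (CETL) that captures the internal structure of forms.
Evaluate CETL on verb and pronoun paradigms across typologically diverse languages.
Compare CETL with the Information Bottleneck (IB) approach and show that it discriminates more strongly.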

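Proposed Method

Model each form w as a character sequence, and use a seq2seq neural encoder (LSTM) to map a meaning m_t to its surface form w.
Define a need distribution p_cog(t) from corpus frequencies, so that meaning targets are weighted by communicative need.
Quantify complexity (CETL) via the decay of cross-entropy during learning, over T_max training epochs.
Represent meanings with categorical features, and measure similarity with a weighted Hamming distance d(u,t).
Evaluate accuracy with the Bayesian decoding of the IB framework, and compare against CETL's learnability baseline.
Generate counterfactual paradigms through structural and form-only permutations to test efficiency and naturalness.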
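
The need-weighting and distance steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the feature weights are made up, and the way per-epoch cross-entropy is aggregated into a single score is an assumption, since this summary does not give the exact CETL formula.

```python
from typing import Dict, List, Sequence, Tuple

Meaning = Tuple[str, ...]  # categorical features, e.g. ("2", "sg") for (person, number)


def need_distribution(counts: Dict[Meaning, int]) -> Dict[Meaning, float]:
    """Normalize corpus counts into a need distribution over meanings."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {m: c / total for m, c in counts.items()}


def weighted_hamming(u: Meaning, t: Meaning, weights: Sequence[float]) -> float:
    """Sum the weights of the feature positions where u and t differ."""
    if not (len(u) == len(t) == len(weights)):
        raise ValueError("meanings and weights must have the same length")
    return sum(w for a, b, w in zip(u, t, weights) if a != b)


def learning_curve_complexity(
    ce_per_epoch: List[Dict[Meaning, float]],
    need: Dict[Meaning, float],
) -> float:
    """ASSUMPTION: aggregate the learning curve as the sum over epochs of the
    need-weighted cross-entropy. The paper's exact CETL definition may differ."""
    return sum(
        sum(need[m] * ce[m] for m in need)
        for ce in ce_per_epoch
    )
```

Under this assumed aggregate, a paradigm whose cross-entropy drops quickly during training accumulates a smaller score, which matches the idea that more learnable mappings count as less complex.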

Figure 1: Turkish pronouns show systematic form-meaning mappings: person is consistently marked by prefixes (e.g., s- for second person), number by suffixes. Language evolution research demonstrates that such systematicity supports learnability. Our model connects these findings, proposing that lea
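
To make the Turkish example concrete, the sketch below stores the six personal pronouns by (person, number) and builds a form-only counterfactual by shuffling forms across meaning cells. The helper names are hypothetical, the shuffle is only one plausible reading of "form-only permutation", and the first-character check is a toy proxy for systematicity, not the CETL measure.

```python
import random
from typing import Dict, Tuple

Meaning = Tuple[str, str]  # (person, number)

# Turkish nominative personal pronouns.
TURKISH_PRONOUNS: Dict[Meaning, str] = {
    ("1", "sg"): "ben", ("2", "sg"): "sen", ("3", "sg"): "o",
    ("1", "pl"): "biz", ("2", "pl"): "siz", ("3", "pl"): "onlar",
}


def form_only_permutation(paradigm: Dict[Meaning, str], seed: int) -> Dict[Meaning, str]:
    """Reassign the same set of forms to the same meaning cells in shuffled order."""
    rng = random.Random(seed)
    meanings = list(paradigm)
    forms = [paradigm[m] for m in meanings]
    rng.shuffle(forms)
    return dict(zip(meanings, forms))


def shared_initial_by_person(paradigm: Dict[Meaning, str]) -> Dict[str, bool]:
    """Toy proxy: do the singular and plural forms of each person share a first character?"""
    persons = sorted({person for person, _ in paradigm})
    return {
        p: paradigm[(p, "sg")][0] == paradigm[(p, "pl")][0]
        for p in persons
    }


attested = shared_initial_by_person(TURKISH_PRONOUNS)
shuffled = shared_initial_by_person(form_only_permutation(TURKISH_PRONOUNS, seed=0))
```

In the attested paradigm every person keeps its initial consonant or vowel across number (ben/biz, sen/siz, o/onlar), whereas a shuffled paradigm will usually break at least one of these pairings.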

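Experimental Results

Research Questions

RQ1: Are attested paradigms more efficient (under CETL) than counterfactual alternatives in the verb and pronoun domains?
RQ2: Are more natural syncretism patterns associated with lower CETL (higher learnability)?
RQ3: Does CETL distinguish attested systems from counterfactuals better than the IB model?
RQ4: How does internal form structure (systematicity) contribute to communicative efficiency across languages?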

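Key Findings

For both verbs and pronouns, attested paradigms are more efficient (lower CETL, higher accuracy) than most counterfactual permutations.
In Afro-Asiatic verbs, CETL correlates positively with unnaturalness, supporting the naturalness hypothesis (r = 0.5745, p < 2.2e-16).
CETL outperforms the IB model at identifying attested paradigms as more efficient than structural permutations, and correlates strongly with naturalness (pronouns and verbs: correlation > 0.8).
By encoding forms as character sequences, CETL detects systematicity across forms and captures fine-grained regularities that IB cannot detect.
Across domains, attested paradigms achieve a better complexity-accuracy trade-off than counterfactuals, supporting an efficiency-driven design of natural language.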

Figure 2: Communication model, adapted from Zaslavsky et al. (2018, 2021b). Our model encodes the form $w$ as a sequence, and decodes it as an atomic unit.
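
The decoding side of the communication model treats each form as an atomic unit, and the Bayesian listener inverts the speaker's encoder with Bayes' rule: q(m|w) is proportional to q(w|m) p(m). The sketch below illustrates that step only. The function name, the dictionary layout, and the toy probabilities are assumptions for illustration, not values from the paper.

```python
from typing import Dict


def bayesian_decoder(
    prior: Dict[str, float],
    encoder: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Return q(m | w) from a prior p(m) and an encoder q(w | m).

    prior:   meaning -> probability
    encoder: meaning -> {form -> probability of producing that form}
    """
    joint: Dict[str, Dict[str, float]] = {}
    for m, p_m in prior.items():
        for w, q_w in encoder[m].items():
            # Joint probability of meaning m being expressed with form w.
            joint.setdefault(w, {})[m] = p_m * q_w

    posterior: Dict[str, Dict[str, float]] = {}
    for w, row in joint.items():
        z = sum(row.values())
        if z > 0:  # skip forms that are never produced
            posterior[w] = {m: v / z for m, v in row.items()}
    return posterior


# Hypothetical toy system in which one form covers two meanings (syncretism).
prior = {"1sg": 0.5, "2sg": 0.25, "2pl": 0.25}
encoder = {"1sg": {"A": 1.0}, "2sg": {"B": 1.0}, "2pl": {"B": 1.0}}
listener = bayesian_decoder(prior, encoder)
```

In this toy system the listener recovers "1sg" with certainty from form "A" but splits its belief evenly between "2sg" and "2pl" on hearing "B", which is the accuracy cost that a syncretic form incurs.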

This review was generated by AI and reviewed by a human editor.