QUICK REVIEW

[Paper Review] Zipf's law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation

Shuiyuan Yu, Chunshan Xu | arXiv (Cornell University) | Jul 5, 2018
Language and cultural evolution | References: 33 | Cited by: 109
One-Sentence Summary

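This study analyzes Zipf's law in 50 languages, reveals a universal three-segment pattern in which the lower segment deviates downward from the theoretical expectation, and uses a simulation to link this pattern to a dual-process cognitive mechanism.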

ABSTRACT

Zipf's law has been found in many human-related fields, including language, where the frequency of a word is persistently found to be a power-law function of its frequency rank. However, there is much dispute over whether it is a universal law or a statistical artifact, and little is known about what mechanisms may have shaped it. To answer these questions, this study conducted a large-scale cross-language investigation into Zipf's law. The statistical results show that Zipf's laws in 50 languages all share a 3-segment structural pattern, with each segment demonstrating distinctive linguistic properties and the lower segment invariably bending downwards to deviate from the theoretical expectation. This finding indicates that the deviation is a fundamental and universal feature of word frequency distributions in natural languages, not a statistical error caused by low-frequency words. A computer simulation based on dual-process theory yields Zipf's law with the same structural pattern, suggesting that the Zipf's laws of natural languages are motivated by common cognitive mechanisms. These results show that Zipf's law in languages is motivated by cognitive mechanisms, such as dual processing, that govern human verbal behavior.
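For reference, the rank-frequency relation described in the abstract can be written as below. C and α are generic symbols (α ≈ 1 in Zipf's classic formulation), not values fitted in the paper. On log-log axes the law is a straight line with slope −α, so a lower segment that "bends downwards" is one whose observed log-frequencies fall below that line, i.e. has negative residuals at high ranks.

```latex
\begin{aligned}
f(r) &= C\,r^{-\alpha}, \qquad \alpha \approx 1,\\
\log f(r) &= \log C - \alpha \log r,\\
\text{downward deviation at rank } r:\quad
  &\log f_{\mathrm{obs}}(r) - \bigl(\log C - \alpha \log r\bigr) < 0 .
\end{aligned}
```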

Research Motivation and Objectives

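  • Investigate whether Zipf's law holds universally in natural languages and characterize its structural pattern.
  • Identify the linguistic properties associated with each segment of the Zipf distribution.
  • Test whether the deviation in the lower-frequency segment is systematic and universal across languages.
  • Propose a cognitive mechanism that may generate Zipf's law in language use.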

Proposed Method

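  • Empirically analyze word frequency distributions in 50 languages to test Zipf's law.
  • Identify a three-segment structure in the distributions.
  • Statistically characterize how the lower segment deviates from the theoretical expectation.
  • Run a computer simulation based on dual-process theory to reproduce the observed pattern.
  • Interpret the results in terms of the cognitive mechanisms that govern verbal behavior.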
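The summary does not describe how the authors fit the distributions or locate the segment boundaries. A minimal sketch of the first two steps, assuming naive word tokenization, ordinary least squares (OLS) on log-log axes, and segment breakpoints supplied by hand, could look like this. `rank_frequency`, `fit_power_law`, and `fit_segments` are hypothetical helpers for illustration, not code from the paper.

```python
import math
import re
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, int]]:
    """Return (rank, frequency) pairs; rank 1 is the most frequent word."""
    # Naive tokenizer (assumption): lowercase runs of word characters.
    # Real multilingual corpora need language-specific segmentation.
    tokens = re.findall(r"\w+", text.lower())
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def fit_power_law(pairs: list[tuple[int, float]]) -> tuple[float, float]:
    """OLS fit of log f = log C - alpha * log r.

    Returns (alpha, C). Needs at least two distinct ranks.
    """
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_x)


def fit_segments(
    pairs: list[tuple[int, float]], breaks: list[int]
) -> list[tuple[float, float]]:
    """Fit each rank segment separately and return one (alpha, C) per segment.

    `breaks` holds the first rank of the 2nd, 3rd, ... segments, e.g. [b1, b2]
    for a three-segment split. The cut points are hypothetical and chosen by
    the caller; this sketch does not detect them automatically.
    """
    bounds = [1, *breaks, pairs[-1][0] + 1]
    fits = []
    for lo, hi in zip(bounds, bounds[1:]):
        segment = [(r, f) for r, f in pairs if lo <= r < hi]
        fits.append(fit_power_law(segment))
    return fits
```

On an exact 1/r sequence such as `[(r, 1000 / r) for r in range(1, 1001)]`, every segment recovers α ≈ 1 and C ≈ 1000. OLS on log-log axes is used here only for brevity; it is known to give biased exponent estimates, and maximum-likelihood estimators are usually preferred for careful power-law fitting.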

Experimental Results

Research Questions

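  • RQ1: Does Zipf's law hold universally across the 50 languages, and what is its structural pattern?
  • RQ2: Which linguistic properties characterize each segment of the word frequency distribution?
  • RQ3: Is the downward deviation in the lower-frequency segment universal and statistically significant?
  • RQ4: Can a dual-process cognitive model reproduce the Zipf structure observed across languages?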

Key Findings

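  • All 50 languages exhibit a three-segment Zipf pattern.
  • The lower segment consistently deviates downward from the theoretical Zipf expectation.
  • The downward deviation appears to be a fundamental and universal feature of word frequencies in natural languages.
  • A computer simulation based on dual processing produces Zipf distributions with the same structural pattern.
  • The results suggest that cognitive mechanisms such as dual processing underlie Zipf's law in language.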
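The summary does not say how the paper quantifies the lower segment's downward deviation. One simple way to make the idea concrete, offered only as an illustration, is to fit a power law on a middle rank range and measure how far the tail's log-frequencies fall below its extrapolation. The function names, rank ranges, and synthetic frequencies below are hypothetical, not the authors' procedure or data.

```python
import math


def fit_loglog(ranks: list[int], freqs: list[float]) -> tuple[float, float]:
    """OLS slope and intercept of log(freq) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def tail_deviation(freqs: list[float], fit_lo: int, fit_hi: int, tail_lo: int) -> float:
    """Mean log-residual of ranks >= tail_lo against a power law fitted on
    ranks fit_lo..fit_hi (inclusive).

    `freqs` must be positive and sorted in descending order, so freqs[i]
    belongs to rank i + 1. A negative result means the tail lies below the
    extrapolated straight line on log-log axes, i.e. it bends downward.
    """
    ranks = list(range(1, len(freqs) + 1))
    slope, intercept = fit_loglog(ranks[fit_lo - 1 : fit_hi], freqs[fit_lo - 1 : fit_hi])
    residuals = [
        math.log(f) - (intercept + slope * math.log(r))
        for r, f in zip(ranks, freqs)
        if r >= tail_lo
    ]
    return sum(residuals) / len(residuals)


if __name__ == "__main__":
    ranks = range(1, 2001)
    pure = [1000 / r for r in ranks]
    # Hypothetical synthetic tail that decays faster than 1/r beyond rank 300.
    bent = [1000 / r * (math.exp(-(r - 300) / 500) if r > 300 else 1.0) for r in ranks]
    print(tail_deviation(pure, 10, 200, 1000))  # close to 0
    print(tail_deviation(bent, 10, 200, 1000))  # negative
```

Averaging log-residuals rather than raw residuals keeps every rank on the same relative scale, which matches how the deviation appears on a log-log plot. A real analysis would also need a significance test across languages, which this sketch does not attempt.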


This review was generated by AI and reviewed by human editors.