QUICK REVIEW

[Paper Review] From FusHa to Folk: Exploring Cross-Lingual Transfer in Arabic Language Models

Abdulmuizz Khalak, Abderrahmane Issam | arXiv (Cornell University) | Feb 10, 2026

Natural Language Processing Techniques | Cited by 0

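One-Sentence Summary

This paper studies cross-lingual transfer between Modern Standard Arabic (MSA) and its dialects using probing and representational similarity analysis based on centered kernel alignment (CKA). The results show that transfer is feasible but uneven across dialects, and that it is influenced by geographic proximity and pretraining data size.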

ABSTRACT

Arabic Language Models (LMs) are pretrained predominately on Modern Standard Arabic (MSA) and are expected to transfer to its dialects. While MSA as the standard written variety is commonly used in formal settings, people speak and write online in various dialects that are spread across the Arab region. This poses limitations for Arabic LMs, since its dialects vary in their similarity to MSA. In this work we study cross-lingual transfer of Arabic models using probing on 3 Natural Language Processing (NLP) Tasks, and representational similarity. Our results indicate that transfer is possible but disproportionate across dialects, which we find to be partially explained by their geographic proximity. Furthermore, we find evidence for negative interference in models trained to support all Arabic dialects. This questions their degree of similarity, and raises concerns for cross-lingual transfer in Arabic models.

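Research Motivation and Objectives

Diglossia and dialect diversity motivate the study of cross-lingual transfer between MSA and the Arabic dialects.
Evaluate transfer with probing on three NLP tasks, namely sentiment analysis (SA), named entity recognition (NER), and part-of-speech (POS) tagging, and measure representational similarity with CKA.
Evaluate MSA-centric, mixed-dialect, and dialect-specific models across dialectal varieties.
Investigate the factors that drive differences in transfer, including geographic proximity and pretraining data volume.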

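Proposed Method

Probe frozen layer-wise embeddings with a linear classifier to assess which linguistic features each layer encodes.
Apply representational similarity analysis (CKA) to quantify layer-by-layer representational similarity between MSA and dialect models.
Compute CKA on parallel MADAR data under multiple pairings of MSA and dialect encoders.
Introduce a geographic-proximity proxy (with Yemen as the MSA anchor) to relate transfer to the dialect continuum.
Evaluate models on the POS, NER, and SA tasks across dialectal and MSA datasets.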

Figure 1: Architecture of the probing classifier for the example sentence “The boy is eating the apple now.” Sentence representations pass through N layers, and each layer is probed using the classifier in Eq. 1.
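The layer-wise probing setup can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not reproduce Eq. 1, so a softmax linear classifier trained by gradient descent is assumed, and the helper names (`train_linear_probe`, `probe_all_layers`, `fake_layer`) and the synthetic arrays standing in for frozen encoder outputs are hypothetical.

```python
import numpy as np

def train_linear_probe(feats, labels, n_classes, lr=0.1, epochs=300):
    """Fit a softmax linear classifier on frozen features (assumed probe form)."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # cross-entropy gradient
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, feats, labels):
    """Accuracy of a trained probe on held-out features."""
    return float(((feats @ W + b).argmax(axis=1) == labels).mean())

def probe_all_layers(train_layers, train_y, test_layers, test_y, n_classes):
    """Train one independent probe per layer; the encoder itself is never updated."""
    scores = []
    for tr, te in zip(train_layers, test_layers):
        W, b = train_linear_probe(tr, train_y, n_classes)
        scores.append(probe_accuracy(W, b, te, test_y))
    return scores

# Synthetic stand-ins for frozen embeddings (hypothetical data, not from the paper):
# layer 0 carries no label information, layer 1 encodes the label linearly.
rng = np.random.default_rng(0)
n_classes = 3
centers = 4.0 * np.eye(n_classes, 8)
y_train = rng.integers(0, n_classes, size=300)
y_test = rng.integers(0, n_classes, size=100)

def fake_layer(y, signal):
    return signal * centers[y] + rng.normal(size=(len(y), 8))

train_layers = [fake_layer(y_train, 0.0), fake_layer(y_train, 1.0)]
test_layers = [fake_layer(y_test, 0.0), fake_layer(y_test, 1.0)]
scores = probe_all_layers(train_layers, y_train, test_layers, y_test, n_classes)
print(scores)  # one accuracy per layer
```

Because only the probe's weights are trained, a higher score at a given layer suggests that the property is more linearly recoverable from that layer's frozen representation.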

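Experimental Results

Research Questions

RQ1: How well do MSA-trained representations transfer to dialectal Arabic on the POS, NER, and SA tasks?
RQ2: Do dialect-specific models outperform general MSA-based models on their own dialects, and under what data conditions?
RQ3: How does representational similarity (CKA) between MSA and dialect models relate to transfer performance?
RQ4: Does geographic proximity predict transferability and representational similarity?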

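Key Findings

MSA-centric models generally transfer well to dialects and, on some tasks, even outperform dialect-specific models.
Dialect-specific models tend to outperform general models when large amounts of dialect-specific pretraining data are available.
Transfer and representational similarity follow a dialect continuum aligned with geographic proximity, but data size moderates this effect.
Negative interference can arise in multi-dialect models, especially for high-resource dialects, which points to the limits of broad multi-dialect pretraining.
CKA similarity does not guarantee functional transfer, which highlights a gap between structural similarity and task performance.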

Figure 2: Architecture of CKA for representation similarity. MADAR parallel sentences are encoded by MSA and DA encoders through N layers, and the resulting representations are compared using linear CKA (Eq. 2).
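To make the linear CKA comparison concrete, here is a minimal sketch using the standard feature-space formulation of linear CKA. The summary does not reproduce Eq. 2, so this exact form is an assumption, and the function name `linear_cka` and the random matrices standing in for MSA and DA encoder outputs on the same parallel sentences are hypothetical.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n inputs.

    X has shape (n, d1) and Y has shape (n, d2); row i of each matrix must
    correspond to the same (parallel) sentence.
    """
    # Center every feature so the score ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Hypothetical stand-ins for one layer of two encoders over 200 parallel sentences.
rng = np.random.default_rng(0)
msa_layer = rng.normal(size=(200, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
rotated = msa_layer @ Q                         # same geometry, different basis
unrelated = rng.normal(size=(200, 16))          # independent representation

print(linear_cka(msa_layer, rotated))    # close to 1: CKA ignores rotations
print(linear_cka(msa_layer, unrelated))  # much lower for unrelated features
```

In a layer-wise analysis, this score would be computed once per layer pair, which yields a similarity profile across the N layers of the two encoders.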

This review was generated by AI and reviewed by a human editor.