QUICK REVIEW

[Paper Review] Standard Language Ideology in AI-Generated Language

Genevieve Smith, Eve Fleisig|arXiv (Cornell University)|Jun 13, 2024

Natural Language Processing Techniques | Cited by 9

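One-Sentence Summary

This paper analyzes how AI-generated language reinforces standard language ideology, particularly around Standard American English, and offers a taxonomy of open problems along with the implications for minoritized language communities.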

ABSTRACT

Standard language ideology is reflected and reinforced in language generated by large language models (LLMs). We present a faceted taxonomy of open problems that illustrate how standard language ideology manifests in AI-generated language, alongside implications for minoritized language communities and society more broadly. We introduce the concept of standard AI-generated language ideology, a process through which LLMs position "standard" languages--particularly Standard American English (SAE)--as the linguistic default, reinforcing the perception that SAE is the most "appropriate" language. We then discuss ongoing tensions around what constitutes desirable system behavior, as well as advantages and drawbacks of generative AI tools attempting, or refusing, to imitate different English language varieties. Rather than prescribing narrow technical fixes, we offer three recommendations for researchers, practitioners, and funders that focus on shifting structural conditions and supporting more emancipatory outcomes for diverse language communities.

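Research Motivation and Objectives

Define standard language ideology and show how it appears in AI-generated language.
Explain how large language models reinforce a hierarchy among language varieties.
Propose a taxonomy of open problems and societal harms affecting minoritized language communities.
Discuss tensions around what counts as desirable model behavior and what emancipatory digital futures could look like.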

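Proposed Approach

Review the literature on standard language ideology and AI-generated language.
Analyze how training data and the demographics of those who build models contribute to SAE-default output.
Synthesize studies showing performance and bias disparities across English varieties.
Propose a taxonomy of harms and ethical considerations for minoritized language communities.
Discuss implications for design processes and participatory approaches in AI development.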

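Results

Research Questions

RQ1: What mechanisms in AI-generated language position Standard American English as the default variety?
RQ2: What harms do speakers of minoritized English varieties encounter when interacting with AI language tools?
RQ3: In multilingual settings, how should desirable or emancipatory model behavior be defined?
RQ4: Which development practices can mitigate standard language ideology and promote an inclusive AI future?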

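Key Findings

AI-generated language tends to default to standard varieties, especially SAE.
Compared with standard varieties, minoritized varieties receive lower-quality output or are misinterpreted.
Having models generate minoritized varieties can lead to stereotyping, appropriation, or manipulation in AI output.
If non-standard varieties are suppressed, minoritized language varieties may be erased.
Participatory, community-centered approaches to model design can help bring about emancipatory digital futures.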

This review was generated by AI and checked by a human editor.