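[Paper Digest] Dialect prejudice predicts AI decisions about people's character, employability, and criminality
The paper develops Matched Guise Probing to reveal covert dialect prejudice against African American English (AAE) across several language models, and shows that this prejudice affects hiring and criminal-justice judgments even when race is never mentioned explicitly.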
Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the language models' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people, based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.
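Motivation and Goals
- Investigate whether language models hold covert raciolinguistic stereotypes that are triggered by dialect features rather than by explicit mentions of race.
- Develop and apply a probing method (Matched Guise Probing) that detects dialect prejudice across different models and settings.
- Assess how dialect prejudice affects AI decisions in employment and criminal-justice scenarios.
- Assess whether common mitigation strategies (scaling up model size, human feedback training) reduce covert dialect prejudice.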
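Proposed Method
- Introduce Matched Guise Probing, which compares a model's predictions for text in African American English (AAE) with its predictions for text in Standardized American English (SAE), without ever mentioning race.
- Analyze several models (GPT2, RoBERTa, T5, GPT3.5, GPT4) in both meaning-matched and non-meaning-matched settings.
- Measure covert stereotypes by comparing the adjectives the models associate most strongly with AAE against the human stereotypes recorded in the Princeton Trilogy studies.
- Assess employability and occupational prestige by having the models match occupations to AAE and SAE speakers.
- Assess criminality bias by simulating trials and computing conviction and death-sentence rates for AAE versus SAE utterances.
- Examine how scaling and human feedback affect overt and covert stereotypes.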
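The probing and decision steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the association between an attribute and a dialect is measured as the mean log ratio of the attribute's probability after an AAE prompt versus the matched SAE prompt. The prompt template, the text pairs, the placeholder attribute, the decision labels, and the `toy_log_prob` scorer (with made-up probabilities) are all hypothetical; in practice `log_prob` would wrap a real language model.

```python
import math
from typing import Callable, Sequence


def association_score(
    log_prob: Callable[[str, str], float],
    template: str,
    aae_texts: Sequence[str],
    sae_texts: Sequence[str],
    attribute: str,
) -> float:
    """Mean of log P(attribute | AAE prompt) - log P(attribute | SAE prompt).

    A positive score means the scorer ties the attribute more strongly
    to the AAE guise than to the SAE guise.
    """
    if len(aae_texts) != len(sae_texts) or not aae_texts:
        raise ValueError("need a non-empty list of matched AAE/SAE pairs")
    total = 0.0
    for aae, sae in zip(aae_texts, sae_texts):
        total += log_prob(template.format(text=aae), attribute)
        total -= log_prob(template.format(text=sae), attribute)
    return total / len(aae_texts)


def decision_rate(decisions: Sequence[str], positive: str) -> float:
    """Fraction of decisions equal to `positive` (e.g. a conviction label)."""
    if not decisions:
        raise ValueError("decisions must be non-empty")
    return sum(d == positive for d in decisions) / len(decisions)


def toy_log_prob(prompt: str, attribute: str) -> float:
    # Stand-in for a language model: made-up probabilities, NOT model outputs.
    probability = 0.02 if " be " in prompt else 0.01
    return math.log(probability)


# Hypothetical prompt template; race is never mentioned.
TEMPLATE = 'A person who says "{text}" is'

# Illustrative matched pairs (same meaning, different dialect).
aae = ["He be workin all day", "They be talkin too loud"]
sae = ["He is working all day", "They are talking too loud"]

score = association_score(toy_log_prob, TEMPLATE, aae, sae, "ATTRIBUTE_X")
print(f"association score: {score:.3f}")
```

The same comparison carries over to the decision tasks: collect the model's verdicts for AAE and SAE inputs separately, then compare `decision_rate(aae_verdicts, "guilty")` with `decision_rate(sae_verdicts, "guilty")`. Passing the scorer in as a callable keeps the sketch independent of any particular model API.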
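Experimental Results
Research Questions
- RQ1: Do language models show covert dialect prejudice that is triggered by AAE features, independently of overt racial cues?
- RQ2: How do covert and overt stereotypes differ in language models, and do they match historical human stereotypes?
- RQ3: Does dialect-based prejudice affect AI judgments in employment and criminal-justice scenarios?
- RQ4: Does scaling up models or training them with human feedback mitigate covert dialect prejudice?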
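Key Findings
- The models' covert stereotypes about AAE speakers align most closely with archaic human stereotypes from the 1930s, before the civil rights movement, and are more negative than any human stereotypes about African Americans ever experimentally recorded.
- Across multiple models, overt stereotypes about African Americans lean positive, especially in models trained with human feedback, which creates a mismatch between covert and overt prejudice.
- In the employment task, the models associate AAE speakers with less prestigious occupations and SAE speakers with more prestigious ones.
- In the criminality task, the models convict and choose the death sentence more often for AAE speakers than for SAE speakers.
- Scaling up models improves their processing of AAE and reduces overt prejudice, but it increases covert dialect prejudice.
- Human feedback training makes overt stereotypes more positive but has little effect on covert stereotypes, and in some models it widens the gap between the two.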
This digest was generated by AI and reviewed by a human editor.