QUICK REVIEW

[Paper Review] Assessing Social and Intersectional Biases in Contextualized Word Representations

Yi Chern Tan, L. Elisa Celis | arXiv (Cornell University) | Nov 4, 2019

Text Readability and Simplification | Cited by 68

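One-Sentence Summary

This paper analyzes bias at the contextual word level in models such as BERT and GPT-2, and finds significant gender, racial, and intersectional bias there that sentence-level evaluation does not capture. It also introduces new tests for measuring intersectional bias and corpus-level bias propagation.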

ABSTRACT

Social bias in machine learning has drawn significant attention, with work ranging from demonstrations of bias in a multitude of applications, curating definitions of fairness for different contexts, to developing algorithms to mitigate bias. In natural language processing, gender bias has been shown to exist in context-free word embeddings. Recently, contextual word representations have outperformed word embeddings in several downstream NLP tasks. These word representations are conditioned on their context within a sentence, and can also be used to encode the entire sentence. In this paper, we analyze the extent to which state-of-the-art models for contextual word representations, such as BERT and GPT-2, encode biases with respect to gender, race, and intersectional identities. Towards this, we propose assessing bias at the contextual word level. This novel approach captures the contextual effects of bias missing in context-free word embeddings, yet avoids confounding effects that underestimate bias at the sentence encoding level. We demonstrate evidence of bias at the corpus level, find varying evidence of bias in embedding association tests, show in particular that racial bias is strongly encoded in contextual word models, and observe that bias effects for intersectional minorities are exacerbated beyond their constituent minority identities. Further, evaluating bias effects at the contextual word level captures biases that are not captured at the sentence level, confirming the need for our novel approach.

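Motivation and Goals

Measure social and intersectional bias in state-of-the-art contextual word representations.
Extend bias evaluation to the contextual word level, rather than sentence encodings alone, to avoid confounding effects.
Assess how corpus-level bias propagates from pretraining data into contextual representations.
Compare the prevalence of bias across models and corpora, highlighting race-related bias.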

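Proposed Method

Adapt the WEAT/SEAT bias tests to contextual word representations by using the contextual word embedding of a token (c-word) instead of the sentence encoding.
Following WEAT/SEAT, compute the test statistic s(X,Y,A,B) and a permutation-based p-value (Equations 1–3), along with the effect size d (Equation 4).
Introduce new +gender/+race/+intersectional tests that probe bias using names and occupation terms (e.g., +C11, +C12, +C13, +I5).
Count pronoun occurrences in the datasets, and their co-occurrence with stereotypically gendered occupation words, to reveal corpus-level gender bias (M/F/they, Table 1).
Evaluate several models (CBoW, ELMo, BERT, GPT, GPT-2) at a significance threshold of p = 0.01, and compare c-word encodings against sentence encodings.
Discuss how bias manifests differently in contextual word representations and in sentence encodings.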

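Experimental Results

Research Questions

RQ1: Do state-of-the-art contextual word models encode gender, racial, and intersectional bias?
RQ2: How does bias in contextual word representations compare with bias detected at the sentence encoding level?
RQ3: Do corpus-level gender and racial biases propagate into contextual word embeddings?
RQ4: Is there bias against intersectional identities, and is it larger than the bias against a single minority identity (race or gender)?

Key Findings

The datasets show gender imbalance: male pronouns occur more often, and their co-occurrence with male-associated occupations skews toward stereotypes.
Racial bias is strongly encoded in contextual word models and is often more pronounced than gender bias.
Bias is detectable at the contextual word level (c-word) beyond what sentence encodings reveal; many tests are significant only with c-word encodings.
Larger models tend to show fewer significant bias associations, while race-related bias persists across models.
Intersectional bias (African American women) is larger than the bias for either constituent minority identity, and race effects generally exceed gender effects.
Contextual word tests across models reveal biases that sentence-level tests may miss, underscoring the need to evaluate both encodings.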
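
The corpus-level analysis behind the first finding (pronoun counts and their co-occurrence with occupation words) might look roughly like the sketch below. The pronoun groups, the occupation list, the sentence-level co-occurrence window, and the sample sentences are all assumptions made for illustration; the paper's actual word lists and counting procedure are not given in this summary.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
PRONOUNS = {
    "M": {"he", "him", "his"},
    "F": {"she", "her", "hers"},
    "they": {"they", "them", "their"},
}
OCCUPATIONS = {"nurse", "engineer"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def pronoun_stats(sentences):
    """Return (pronoun counts per group, occupation/group co-occurrence counts).

    Co-occurrence is counted once per sentence for each (occupation, group)
    pair that appears in it, which is an assumed definition.
    """
    counts = Counter()
    cooccur = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        groups_present = set()
        for token in tokens:
            for group, words in PRONOUNS.items():
                if token in words:
                    counts[group] += 1
                    groups_present.add(group)
        for occupation in OCCUPATIONS.intersection(tokens):
            for group in groups_present:
                cooccur[(occupation, group)] += 1
    return counts, cooccur


# Made-up sample sentences standing in for a pretraining corpus.
corpus = [
    "He is an engineer and his team respects him.",
    "She said the nurse helped her.",
    "They met the engineer.",
    "He called the nurse.",
]
counts, cooccur = pronoun_stats(corpus)
```

Comparing `counts["M"]` with `counts["F"]`, and the per-occupation entries of `cooccur`, gives the kind of M/F/they breakdown the summary attributes to Table 1.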

This review was generated by AI and reviewed by a human editor.