QUICK REVIEW

[Paper Review] Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias

Marion Bartl, Malvina Nissim|arXiv (Cornell University)|Oct 27, 2020

Hate Speech and Cyberbullying Detection | References: 27 | Cited by: 50

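ONE-SENTENCE SUMMARY

This paper presents a bilingual (English and German) framework that measures BERT's gender bias using a bias evaluation corpus (BEC-Pro) and a masked language model method, and evaluates how well bias is mitigated by applying Counterfactual Data Substitution (CDS) to the GAP data and fine-tuning on it. It also shows the limits of cross-lingual transfer, particularly for German.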

ABSTRACT

Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bias by studying associations between gender-denoting target words and names of professions in English and German, comparing the findings with real-world workforce statistics. We mitigate bias by fine-tuning BERT on the GAP corpus (Webster et al., 2018), after applying Counterfactual Data Substitution (CDS) (Maudslay et al., 2019). We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich morphology and gender-marking, such as German. Our results highlight the importance of investigating bias and mitigation techniques cross-linguistically, especially in view of the current emphasis on large-scale, multilingual language models.

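RESEARCH MOTIVATION AND OBJECTIVES

Assess how BERT encodes gender bias in profession-related contexts in English and German.
Develop a template-based bias measurement corpus (BEC-Pro) and validate an MLM-based association measure.
Evaluate bias mitigation by applying Counterfactual Data Substitution (CDS) and fine-tuning on the gender-balanced GAP corpus.
Investigate how bias measurement and mitigation methods transfer across languages, highlighting morphological effects specific to German.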

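PROPOSED METHOD

Create BEC-Pro from profession-centered templates in English and German, anchored in real-world workforce statistics.
Measure association bias with BERT's masked language model by computing the log ratio of the target's probability conditioned on the attribute, P(T|A), to its prior probability, P(T).
Apply CDS to the GAP corpus and fine-tune English BERT on it (three epochs, AdamW) to reduce bias, then re-evaluate.
Translate and adapt the measurement templates to German to test cross-lingual transfer and to analyze how grammatical gender affects bias measurement.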
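
The measurement and mitigation steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the probabilities passed to `association_score` are assumed to come from a masked language model such as BERT, and the word-pair table, function names, and the per-sentence swap probability of 0.5 are assumptions made for this example.

```python
import math
import random

# Hypothetical, deliberately tiny swap table; a real CDS word list is far larger
# and must also handle ambiguous forms such as "her".
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "brother": "sister", "sister": "brother",
}


def association_score(p_target_given_attribute: float, p_target_prior: float) -> float:
    """Return log(P(T|A) / P(T)).

    A positive score means the attribute (e.g. a profession) raises the
    probability of the gendered target word relative to its prior.
    """
    if p_target_given_attribute <= 0 or p_target_prior <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_target_given_attribute / p_target_prior)


def counterfactual_substitution(sentence: str, rng: random.Random,
                                swap_probability: float = 0.5) -> str:
    """Swap all gendered words in the sentence with the given probability.

    Otherwise the sentence is returned unchanged. Tokenization is plain
    whitespace splitting, so words with attached punctuation are not swapped.
    """
    if rng.random() >= swap_probability:
        return sentence
    swapped = []
    for token in sentence.split():
        replacement = GENDER_PAIRS.get(token.lower(), token)
        # Preserve the capitalization of the original token.
        if token[0].isupper():
            replacement = replacement.capitalize()
        swapped.append(replacement)
    return " ".join(swapped)
```

For example, `association_score(0.2, 0.1)` is positive because the attribute doubles the target's probability, and `counterfactual_substitution("He is a nurse", random.Random(0), swap_probability=1.0)` always swaps the pronoun. Substituting in place, rather than appending the swapped copy, keeps the corpus the same size.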

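EXPERIMENTAL RESULTS

Research Questions

RQ1: Can MLM-based association scores reliably measure BERT's gender bias in English and German?
RQ2: Can Counterfactual Data Substitution followed by fine-tuning reduce the bias of English BERT, and to what extent?
RQ3: Does the measurement method developed for English transfer to German, a morphologically rich language with gender marking?
RQ4: How does the measured bias relate to real-world workforce statistics for each group of professions?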

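Key Findings

For English, CDS plus fine-tuning lowers the association between female targets and stereotypically female professions, but it may raise associations in non-stereotypical settings.
Before fine-tuning, English BERT's bias is consistent with real-world workforce statistics; the mitigation has a stronger effect on female terms than on male terms.
The German results indicate that the English-based measurement method transfers poorly because of German grammatical gender marking: associations for female targets are consistently higher.
In German, gender marking causes associations to differ between masculine and feminine profession terms, which points to language-specific bias that the English method does not capture.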


This review was generated by AI and reviewed by a human editor.