QUICK REVIEW

[Paper Review] Gender Bias in Neural Natural Language Processing

Kaiji Lu, Piotr Mardziel|arXiv (Cornell University)|Jul 31, 2018

Topic Modeling | References: 17 | Cited by: 73

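One-Sentence Summary

The paper defines a general bias benchmark for neural NLP, shows significant gender bias in coreference resolution and language modeling, and introduces counterfactual data augmentation (CDA) to mitigate it; CDA preserves accuracy and outperforms word embedding debiasing in several settings.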

ABSTRACT

We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark datasets finds significant gender bias in how models view occupations. We then mitigate bias with CDA: a generic methodology for corpus augmentation via causal interventions that breaks associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that as training proceeds on the original data set with gradient descent the gender bias grows as the loss reduces, indicating that the optimization encourages bias; CDA mitigates this behavior.

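Motivation and Objectives

Propose a general benchmark, based on causal testing, for quantifying gender bias in neural NLP tasks.
Demonstrate gender bias in neural coreference resolution and language modeling, using state-of-the-art coreference models and a textbook RNN-based language model.
Evaluate debiasing strategies, including word embedding debiasing (WED) and counterfactual data augmentation (CDA).
Show that CDA reduces bias while preserving predictive accuracy and outperforms prior debiasing methods in several settings.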

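Proposed Method

Define a score-based bias measure over matched intervention pairs to quantify gender bias in coreference resolution and language modeling.
Construct the intervention pairs for bias measurement from occupation-centered templates and gender swapping (g_naive).
Apply counterfactual data augmentation (CDA) by adding gender-swapped counterfactual instances to the training data.
Compare CDA with word embedding debiasing (WED), and with compositions of the two, in neural coreference models and an RNN language model.
Analyze how bias grows during training and show that CDA slows this growth.
Evaluate on CoNLL-2012 coreference data with the Lee et al. (2017) and Clark & Manning (2016b) models, and on WikiText-2 language modeling with a two-layer LSTM.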
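
The gender-swap intervention, the augmentation step, and a matched-pair bias score described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-pair list `PAIRS`, the function names, and the choice of mean absolute score difference as the bias score are all hypothetical, and `score_fn` stands in for whatever score a real model would assign (for example a coreference score or a log-likelihood).

```python
import re
from typing import Callable, List

# Hypothetical, deliberately tiny word-pair list (an assumption, not the paper's list).
# Pronoun forms such as her/his/him are ambiguous without part-of-speech
# information, so this sketch leaves them out.
PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("boy", "girl")]

# Build a bidirectional lookup so that each word maps to its counterpart.
SWAP = {}
for a, b in PAIRS:
    SWAP[a] = b
    SWAP[b] = a

# Match whole words only, so "man" is not matched inside "woman".
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SWAP)) + r")\b", re.IGNORECASE)


def swap_gender(sentence: str) -> str:
    """Naive intervention: replace each gendered word with its counterpart."""
    def repl(match: "re.Match[str]") -> str:
        word = match.group(0)
        target = SWAP[word.lower()]
        # Preserve an initial capital letter (e.g. "He" -> "She").
        return target.capitalize() if word[0].isupper() else target
    return PATTERN.sub(repl, sentence)


def augment_corpus(corpus: List[str]) -> List[str]:
    """CDA sketch: keep the original sentences and append swapped copies.

    Sentences that contain no gendered word are not duplicated here;
    that is a simplification made for this example.
    """
    swapped = [swap_gender(s) for s in corpus]
    return corpus + [t for s, t in zip(corpus, swapped) if t != s]


def bias_score(score_fn: Callable[[str], float], sentences: List[str]) -> float:
    """Mean absolute difference in score between each sentence and its swap."""
    diffs = [abs(score_fn(s) - score_fn(swap_gender(s))) for s in sentences]
    return sum(diffs) / len(diffs)
```

In use, `augment_corpus` would be applied to the training sentences before training, and `bias_score` would be evaluated on occupation templates with the trained model's scoring function. A score near zero under this sketch means the model treats a sentence and its gender-swapped counterpart alike.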

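Experimental Results

Research Questions

RQ1: Do neural NLP models exhibit gender bias in coreference resolution and language modeling?
RQ2: Can counterfactual data augmentation (CDA) reduce bias without sacrificing accuracy, and how does it compare with word embedding debiasing (WED)?
RQ3: How does bias evolve during training, and can CDA curb its growth?
RQ4: What effect does combining CDA with WED have on bias and on downstream task performance?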

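Key Findings

Neural models show significant occupation-related gender bias in both coreference resolution and language modeling.
CDA substantially reduces overall occupation bias while preserving accuracy, or affecting it only minimally, across tasks.
WED alone is effective at reducing some biases but often harms downstream accuracy, especially when the embeddings are trained jointly with the model.
Combining CDA with WED on pre-trained embeddings can yield complementary debiasing effects, although some combinations may overcorrect or harm performance.
CDA is more effective than WED at reducing bias, especially when the embeddings are trained jointly with the model.
When training on the original data, bias can grow as the loss decreases, but CDA mitigates this trend.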


This review was generated by AI and reviewed by a human editor.