QUICK REVIEW

[Paper Review] Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them

Hila Gonen, Yoav Goldberg|arXiv (Cornell University)|Mar 9, 2019

Topic Modeling|References: 8|Cited by: 232

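One-Sentence Summary

The paper shows that popular debiasing methods reduce words' projection on the gender direction in word embeddings but do not remove the underlying gender bias, which remains visible in neighborhood structure and relative word similarities.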

ABSTRACT

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society. This phenomenon is pervasive and consistent across different word embedding models, causing serious concern. Several recent works tackle this problem, and propose methods for significantly reducing this gender bias in word embeddings, demonstrating convincing results. However, we argue that this removal is superficial. While the bias is indeed substantially reduced according to the provided bias definition, the actual effect is mostly hiding the bias, not removing it. The gender bias information is still reflected in the distances between "gender-neutralized" words in the debiased embeddings, and can be recovered from them. We present a series of experiments to support this claim, for two debiasing methods. We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling.

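Motivation and Objectives

Motivate and quantify the inadequacy of debiasing methods that target only the gender projection in word embeddings.
Show that residual bias is tied to the neighborhood structure and global geometry of the embedding space.
Provide evidence that current debiasing methods do not yield truly gender-neutral representations.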

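Proposed Method

Compare the Hard-Debiased (Bolukbasi et al., 2016b) and GN-GloVe (Zhao et al., 2018) embeddings with their biased counterparts.
Quantify a word's bias by its projection on the gender direction (he - she).
Assess residual bias using clustering, neighborhood analysis, and WEAT-based association tests.
Evaluate how well classifiers trained on a set of biased words generalize gender prediction to unseen words, in both the biased and debiased embeddings.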
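
The projection measure in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names (`gender_direction`, `projection_bias`), the choice to length-normalize each vector before projecting, and the 3-dimensional toy vectors are all assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def gender_direction(emb):
    # Assumption: the gender direction is the normalized difference he - she.
    return normalize([a - b for a, b in zip(emb["he"], emb["she"])])

def projection_bias(word, emb, direction):
    # Positive values lean toward "he", negative values toward "she".
    # Assumption: word vectors are length-normalized before projecting.
    return dot(normalize(emb[word]), direction)

# Toy vectors invented for illustration; they are not taken from the paper.
toy = {
    "he": [1.0, 0.0, 0.2],
    "she": [-1.0, 0.0, 0.2],
    "engineer": [0.6, 0.8, 0.1],
    "nurse": [-0.5, 0.7, 0.3],
}

direction = gender_direction(toy)
biases = {w: projection_bias(w, toy, direction) for w in ("engineer", "nurse")}
print(biases)
```

In this toy setup "engineer" projects onto the "he" side and "nurse" onto the "she" side. Debiasing methods of the kind reviewed here aim to push such projections toward zero, which is exactly the quantity the paper argues is too narrow a definition of bias.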

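Experimental Results

Research Questions

RQ1: Does debiasing reduce a word's gender bias as measured by its projection on the gender direction?
RQ2: After debiasing, does residual bias still show up in words' neighborhoods and semantic associations?
RQ3: Can implicit gender information be recovered from debiased embeddings through neighborhood analysis or classifiers?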
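
The classifier side of RQ3 can be illustrated with a held-out test: fit a model on some words labeled by their original bias, then check whether it predicts the label of unseen words from their debiased vectors. The sketch below uses a nearest-centroid rule purely as a simple stand-in, since this summary does not name the classifier used in the paper. The function names and the 2-dimensional toy vectors are assumptions for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def fit_centroids(train):
    # train is a list of (vector, label) pairs; one centroid per label.
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(vec, centroids):
    # Pick the label whose centroid is most similar to the vector.
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

def accuracy(test, centroids):
    hits = sum(predict(vec, centroids) == label for vec, label in test)
    return hits / len(test)

# Toy "debiased" vectors: the gender component is assumed to be gone,
# yet words with the same original bias still sit close together.
train = [
    ([1.0, 0.1], "male"), ([0.9, 0.2], "male"),
    ([0.1, 1.0], "female"), ([0.2, 0.9], "female"),
]
held_out = [([0.8, 0.1], "male"), ([0.1, 0.8], "female")]

centroids = fit_centroids(train)
held_out_accuracy = accuracy(held_out, centroids)
print(held_out_accuracy)
```

If held-out accuracy stays well above chance on a debiased embedding, gender information is still encoded in the vectors even though the explicit projection has been reduced.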

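Key Findings

Clustering the Hard-Debiased embeddings still separates words by gender with 92.5% accuracy, versus 99.9% for the biased version.
GN-GloVe shows a clustering accuracy of 85.6%, versus 100% for the biased version.
Residual bias shows up in nearest-neighbor structure: after debiasing, words remain close to socially biased terms.
The correlation between original bias and neighbor-based bias remains significant after debiasing (Pearson r = 0.686 for Hard-Debiased; r = 0.736 for GN-GloVe).
For professions, the correlation between original bias and the number of male neighbors remains strong after debiasing (r = 0.606 for Hard-Debiased; r = 0.792 for GN-GloVe).
The association tests from Caliskan et al. (2017) remain significant after debiasing (p-values: Hard-Debiased: 0, 0.00016, 0.0467; GN-GloVe: 7.7e-5, 0.00031, 0.0064).
A classifier trained to predict gender from the vectors of biased words still generalizes well after debiasing, so the gender information remains recoverable (Hard-Debiased: 88.88% versus 98.25% for the non-debiased version; GN-GloVe: 96.53% versus 98.65%).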
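
The neighbor-based correlation reported above can be sketched as follows: for each word, take its k nearest neighbors in the debiased space, count how many were male-biased under the original embedding, and correlate that share with the word's original bias. This is a hedged illustration rather than the paper's implementation. The helper names, the value of k, and the toy vectors and bias scores are all assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def male_neighbor_fraction(word, debiased, original_bias, k):
    # Share of the k nearest neighbors (in the debiased space)
    # whose ORIGINAL projection bias was male (positive).
    others = [w for w in debiased if w != word]
    nearest = sorted(
        others, key=lambda w: cosine(debiased[word], debiased[w]), reverse=True
    )[:k]
    return sum(original_bias[w] > 0 for w in nearest) / k

def pearson(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data invented for illustration: debiased vectors plus each
# word's bias score under the original (non-debiased) embedding.
debiased = {
    "m1": [1.0, 0.1], "m2": [0.9, 0.2], "m3": [0.8, 0.1],
    "f1": [0.1, 1.0], "f2": [0.2, 0.9], "f3": [0.1, 0.8],
}
original_bias = {
    "m1": 0.30, "m2": 0.20, "m3": 0.25,
    "f1": -0.30, "f2": -0.20, "f3": -0.25,
}

words = sorted(debiased)
fractions = {w: male_neighbor_fraction(w, debiased, original_bias, k=2) for w in words}
r = pearson([original_bias[w] for w in words], [fractions[w] for w in words])
print(fractions, r)
```

A high correlation on a debiased embedding means that a word's neighbors still reveal its original bias, which is the pattern the r values above describe.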

This review was generated by AI and reviewed by a human editor.