QUICK REVIEW

[Paper Review] Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Tolga Bolukbasi, Kai-Wei Chang|arXiv (Cornell University)|Jul 21, 2016
Hate Speech and Cyberbullying Detection|References: 33|Cited by: 1,357
One-Sentence Summary

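This paper shows that word embeddings exhibit gender bias strongly aligned with stereotypes. It proposes debiasing methods that reduce this bias while preserving useful properties such as clustering and analogy solving.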

ABSTRACT

The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
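The abstract's first geometric claim is that gender bias is captured by a direction in the embedding. A minimal pure-Python sketch can illustrate this. The recipe follows the paper's approach as we understand it. Each word of a gender-definitional pair (she/he, woman/man, etc.) is unit-normalized and centered on its pair mean. The top principal component of those centered vectors is then taken as the gender direction g.

Assumptions:
  • Only a single direction (k = 1) is kept.
  • Power iteration stands in for a full SVD.
  • The 3-dimensional toy vectors are invented.
  • The function names and the sign convention are hypothetical. This is not the authors' code.

```python
from __future__ import annotations

import math
import random


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def unit(v: list[float]) -> list[float]:
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def centered_pair_rows(embedding: dict[str, list[float]],
                       pairs: list[tuple[str, str]]) -> list[list[float]]:
    """Unit-normalize each word, then subtract its pair's mean (rows of matrix C)."""
    rows = []
    for a, b in pairs:
        va, vb = unit(embedding[a]), unit(embedding[b])
        mu = [(x + y) / 2 for x, y in zip(va, vb)]
        rows.append([x - m for x, m in zip(va, mu)])
        rows.append([y - m for y, m in zip(vb, mu)])
    return rows


def gender_direction(embedding: dict[str, list[float]],
                     pairs: list[tuple[str, str]],
                     iters: int = 100, seed: int = 0) -> list[float]:
    """Hypothetical helper: top principal direction of C via power iteration on C^T C."""
    rows = centered_pair_rows(embedding, pairs)
    dim = len(rows[0])
    rng = random.Random(seed)
    v = unit([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    for _ in range(iters):
        scores = [dot(r, v) for r in rows]  # C v
        v = unit([sum(s * r[i] for s, r in zip(scores, rows))  # C^T (C v)
                  for i in range(dim)])
    # Sign convention (our assumption): the first word of the first pair projects positively.
    a, b = pairs[0]
    if dot(unit(embedding[a]), v) < dot(unit(embedding[b]), v):
        v = [-x for x in v]
    return v


# Invented toy vectors: axis 0 plays the role of "gender", axes 1-2 carry other meaning.
toy = {
    "she": [1.0, 0.2, 0.1], "he": [-1.0, 0.2, 0.1],
    "woman": [0.9, 0.5, 0.0], "man": [-0.9, 0.5, 0.0],
    "girl": [0.8, 0.1, 0.6], "boy": [-0.8, 0.1, 0.6],
}
g = gender_direction(toy, [("she", "he"), ("woman", "man"), ("girl", "boy")])
print([round(x, 6) for x in g])
```

On real embeddings the centered pair vectors are not perfectly aligned. The paper reports that a single leading component explains most of their variance. In this toy data they are exactly aligned, so g recovers axis 0.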

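Motivation and Goals

  • Quantify both direct and indirect gender bias in existing word embeddings.
  • Identify a gender subspace that captures most of the gender-related variance.
  • Develop debiasing methods that reduce bias in gender-neutral words while preserving definitional gender associations and other useful relationships.
  • Evaluate whether debiasing preserves clustering and analogy performance, and measure how closely the embedding's associations agree with human stereotypes.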

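Proposed Method

  • Compute a gender subspace from pairs of gender-definitional words (e.g., she–he, woman–man). Center each pair on its mean and take the principal components of the resulting vectors.
  • Define direct bias for gender-neutral words as the average absolute cosine similarity between each word and the gender direction, raised to a strictness exponent c.
  • Decompose each word vector into a component inside the gender subspace and a component orthogonal to it. Indirect bias is then the share of the similarity between two words that disappears once the gender component is removed.
  • Propose debiasing algorithms that reduce bias in gender-neutral words while preserving meaningful associations. These are hard debiasing (neutralize and equalize) and a softer variant.
  • Evaluate bias reduction with crowdsourced judgments and with standard embedding tasks such as clustering and analogy solving.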
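The direct-bias metric from the list above can be written as DirectBias_c = (1/|N|) * sum over w in N of |cos(w, g)|^c. Here N is the set of gender-neutral words and g is the gender direction.

The sketch below is a minimal pure-Python version of that formula. It makes these assumptions:
  • The direction g is given.
  • The toy vectors and word list are invented for illustration.
  • The function name `direct_bias` is hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    num = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return num / (nu * nv)


def direct_bias(embedding: dict[str, list[float]], neutral_words: list[str],
                g: list[float], c: float = 1.0) -> float:
    """Hypothetical helper: mean of |cos(w, g)| ** c over the gender-neutral words."""
    if not neutral_words:
        raise ValueError("need at least one gender-neutral word")
    total = sum(abs(cosine(embedding[w], g)) ** c for w in neutral_words)
    return total / len(neutral_words)


# Invented toy vectors; axis 0 is treated as the gender direction.
toy = {
    "nurse": [0.6, 0.8, 0.0],     # leans toward the positive gender side
    "engineer": [-0.6, 0.0, 0.8],  # leans toward the negative side
    "tree": [0.0, 1.0, 0.0],      # orthogonal to g, contributes nothing
}
g = [1.0, 0.0, 0.0]
score = direct_bias(toy, ["nurse", "engineer", "tree"], g, c=1.0)
print(round(score, 6))
```

Taking the absolute value means that leaning toward either gender counts as bias. A larger c makes the metric focus on the most strongly aligned words.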
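The indirect-bias decomposition in the list above splits each unit vector as w = w_g + w_perp, where w_g = (w·g) g. It then measures how much of the similarity w·v disappears once the gender components are dropped:

beta(w, v) = (w·v - (w_perp·v_perp) / (|w_perp| |v_perp|)) / (w·v)

The code below is a minimal sketch of that formula under these assumptions:
  • A single gender direction is used.
  • The toy vectors are invented.
  • The function name `indirect_bias` is hypothetical.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def unit(u: list[float]) -> list[float]:
    n = math.sqrt(dot(u, u))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in u]


def remove_gender(w: list[float], g: list[float]) -> list[float]:
    """Return w_perp = w - (w.g) g for a unit direction g."""
    coef = dot(w, g)
    return [a - coef * b for a, b in zip(w, g)]


def indirect_bias(w: list[float], v: list[float], g: list[float]) -> float:
    """Hypothetical helper: fraction of w.v explained by the shared gender component."""
    w, v, g = unit(w), unit(v), unit(g)
    inner = dot(w, v)
    if inner == 0.0:
        raise ValueError("beta is undefined when w.v == 0")
    w_perp, v_perp = remove_gender(w, g), remove_gender(v, g)
    # Raises ZeroDivisionError if either word lies entirely along g.
    perp_cos = dot(w_perp, v_perp) / (math.sqrt(dot(w_perp, w_perp)) *
                                      math.sqrt(dot(v_perp, v_perp)))
    return (inner - perp_cos) / inner


g = [1.0, 0.0, 0.0]
# Similar only through the gender axis: removing gender erases all of their similarity.
beta_gendered = indirect_bias([0.6, 0.8, 0.0], [0.6, 0.0, 0.8], g)
# No gender component at all: removing gender changes nothing.
beta_neutral = indirect_bias([0.0, 0.6, 0.8], [0.0, 0.8, 0.6], g)
print(round(beta_gendered, 6), round(beta_neutral, 6))
```

A beta near 1 means two gender-neutral words look related mostly because they share a gender lean. A beta near 0 means their relationship survives once gender is projected out.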
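The hard-debiasing step named in the list above can be sketched in two parts. "Neutralize" removes the gender component from each gender-neutral word and renormalizes it. "Equalize" re-centers each equality set (e.g., grandmother/grandfather) so that its members differ only along g and are equidistant from every neutralized word. The formulas follow the paper's description as we understand it:
  • Let mu be the set mean, mu_B its component along g, and nu = mu - mu_B.
  • Each member w is then mapped to nu + sqrt(1 - |nu|^2) * (w_B - mu_B) / |w_B - mu_B|.

This is a minimal pure-Python sketch with these assumptions:
  • A single direction is used rather than a k-dimensional subspace.
  • The toy vectors are invented.
  • `hard_debias` is a hypothetical name.
  • The softer variant from the paper is not shown.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def scale(u: list[float], s: float) -> list[float]:
    return [a * s for a in u]


def sub(u: list[float], v: list[float]) -> list[float]:
    return [a - b for a, b in zip(u, v)]


def unit(u: list[float]) -> list[float]:
    n = math.sqrt(dot(u, u))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return scale(u, 1.0 / n)


def project(u: list[float], g: list[float]) -> list[float]:
    """Component of u along the unit direction g (u_B in the paper's notation)."""
    return scale(g, dot(u, g))


def hard_debias(embedding, g, neutral_words, equality_sets):
    """Hypothetical helper: returns a new dict; the input embedding is not modified."""
    g = unit(g)
    out = {w: list(v) for w, v in embedding.items()}
    # Neutralize: drop the gender component, then renormalize.
    for w in neutral_words:
        out[w] = unit(sub(out[w], project(out[w], g)))
    # Equalize: members share the gender-free part nu and differ only along g.
    for eq in equality_sets:
        vecs = [unit(out[w]) for w in eq]  # the paper assumes unit vectors
        mu = scale([sum(col) for col in zip(*vecs)], 1.0 / len(vecs))
        mu_b = project(mu, g)
        nu = sub(mu, mu_b)
        radius = math.sqrt(max(0.0, 1.0 - dot(nu, nu)))  # keeps results unit-length
        for w, v in zip(eq, vecs):
            direction = unit(sub(project(v, g), mu_b))
            out[w] = [a + b for a, b in zip(nu, scale(direction, radius))]
    return out


toy = {
    "nurse": [0.6, 0.8, 0.0],
    "grandmother": [0.6, 0.0, 0.8],
    "grandfather": [-0.8, 0.0, 0.6],
}
g = [1.0, 0.0, 0.0]
debiased = hard_debias(toy, g, ["nurse"], [("grandmother", "grandfather")])
for word, vec in debiased.items():
    print(word, [round(x, 4) for x in vec])
```

After this step, "nurse" has no component along g. The two grandparent words remain gendered but symmetric, so "nurse" is equally similar to both.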

Experimental Results

Research Questions

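  • RQ1: To what extent do word embeddings encode gender bias, directly and indirectly?
  • RQ2: Can a gender subspace be identified robustly across different embeddings, and how can it be used to measure bias?
  • RQ3: Can debiasing reduce gender bias in an embedding while preserving useful semantic structure and analogy performance?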

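Key Findings

  • Word embeddings trained on Google News (and other corpora) exhibit gender stereotypes in occupation words and analogies (e.g., "man is to computer programmer as woman is to homemaker").
  • A gender subspace can be identified that captures most of the variance in the differences between gender-definitional word pairs.
  • Direct and indirect gender bias can be quantified and targeted for removal without breaking key embedding functions such as word clustering and analogy solving.
  • The debiasing methods substantially reduce gender bias while preserving the embedding's useful properties, making downstream applications less likely to amplify bias.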


This review was generated by AI and reviewed by human editors.