QUICK REVIEW

[Paper Review] Assessing Social and Intersectional Biases in Contextualized Word Representations

Yi Chern Tan, L. Elisa Celis | arXiv (Cornell University) | Nov 4, 2019

Text Readability and Simplification · 91 citations

TL;DR

The paper evaluates social and intersectional biases in state-of-the-art contextual word models (e.g., BERT, GPT-2) by extending embedding association tests to contextual word representations and introducing race and intersectional bias tests.

ABSTRACT

Social bias in machine learning has drawn significant attention, with work ranging from demonstrations of bias in a multitude of applications, curating definitions of fairness for different contexts, to developing algorithms to mitigate bias. In natural language processing, gender bias has been shown to exist in context-free word embeddings. Recently, contextual word representations have outperformed word embeddings in several downstream NLP tasks. These word representations are conditioned on their context within a sentence, and can also be used to encode the entire sentence. In this paper, we analyze the extent to which state-of-the-art models for contextual word representations, such as BERT and GPT-2, encode biases with respect to gender, race, and intersectional identities. Towards this, we propose assessing bias at the contextual word level. This novel approach captures the contextual effects of bias missing in context-free word embeddings, yet avoids confounding effects that underestimate bias at the sentence encoding level. We demonstrate evidence of bias at the corpus level, find varying evidence of bias in embedding association tests, show in particular that racial bias is strongly encoded in contextual word models, and observe that bias effects for intersectional minorities are exacerbated beyond their constituent minority identities. Further, evaluating bias effects at the contextual word level captures biases that are not captured at the sentence level, confirming the need for our novel approach.

Motivation & Objective

Demonstrate that contextual word representations encode social biases present in training corpora.
Extend bias analysis from sentence encodings to contextual word representations to capture context-specific bias.
Evaluate gender, race, and intersectional identities in state-of-the-art models (BERT, GPT-2) across multiple datasets.
Introduce new embedding association tests targeting race and intersectional identities and compare results with sentence-level tests.

Proposed method

Adapt the WEAT/SEAT framework to contextual word representations by using the token-level contextual embedding of the target word instead of a pooled sentence encoding.
Compute association statistics via cosine similarities between concept and attribute embeddings and perform permutation significance testing (p-values) as in WEAT/SEAT.
Introduce new tests (prefixed with +) for race and intersectional identity biases, using name-based concepts and attribute pairs (e.g., pleasant/unpleasant, career/family).
Compare bias signals across multiple models (CBoW/GloVe, ELMo, BERT, GPT, GPT-2) and across word-, sentence-, and contextual-word encodings to assess where bias appears.
Aggregate results to report the proportion of significant bias tests and examine how corpus bias propagates to contextual representations.
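
The association statistic and permutation test described above can be sketched as follows. This is a minimal illustration of a WEAT-style test, not the authors' code: the function names, the toy vectors, the sampled (rather than exhaustive) permutation test, and the use of the sample standard deviation (ddof=1) are all assumptions made here. In the contextual-word setting, each vector would be the embedding of the target token within a sentence template.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, scaled by the
    standard deviation of associations over X and Y combined."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

def permutation_p_value(X, Y, A, B, n_samples=1000, seed=0):
    """One-sided p-value: fraction of random equal-size re-partitions of
    X and Y whose test statistic exceeds the observed one."""
    rng = np.random.default_rng(seed)
    scores = np.array([association(w, A, B) for w in list(X) + list(Y)])
    n = len(X)
    observed = scores[:n].sum() - scores[n:].sum()
    exceed = 0
    for _ in range(n_samples):
        perm = rng.permutation(len(scores))
        stat = scores[perm[:n]].sum() - scores[perm[n:]].sum()
        if stat > observed:
            exceed += 1
    return exceed / n_samples

# Toy data (hypothetical): X and A cluster around one direction,
# Y and B around another, so a strong association is expected.
rng = np.random.default_rng(42)
dim = 8
e1, e2 = np.eye(dim)[0], np.eye(dim)[1]

def noisy(center, n):
    """n noisy copies of a center vector."""
    return [center + 0.1 * rng.standard_normal(dim) for _ in range(n)]

X, A = noisy(e1, 5), noisy(e1, 5)
Y, B = noisy(e2, 5), noisy(e2, 5)

d = effect_size(X, Y, A, B)
p = permutation_p_value(X, Y, A, B)
print(f"effect size: {d:.2f}, p-value: {p:.3f}")
```

Swapping the toy vectors for word-, sentence-, or contextual-word-level embeddings leaves the test itself unchanged; only the way each vector is extracted differs.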

Experimental results

Research questions

RQ1: Do contextual word representations encode gender, race, and intersectional biases beyond what sentence encoders reveal?
RQ2: How do biases differ across models (BERT, GPT-2, GPT, ELMo) and across word-, sentence-, and contextual-word encodings?
RQ3: Is racial bias encoded more strongly than gender bias in contextual word models?
RQ4: Do intersectional identities (e.g., African American female) exhibit stronger bias than their constituent identities when evaluated with contextual word representations?
RQ5: Can new race and intersectional tests using contextual word embeddings expose biases not captured by sentence-level tests?

Key findings

Racial bias is strongly encoded in contextual word models, often more so than gender bias.
Contextual word representations reveal bias not always detected by sentence encodings; about 37.6% of significant tests showed bias in both encodings, while 36.6% were detected only with contextual word (c-word) encoding.
BERT (bbc) shows high bias on race and intersectional tests; overall, larger models do not necessarily exhibit more detected bias and may show fewer significant associations.
Bias propagates from corpus level to encoding level, with corpus gender skew correlating to higher pro-stereotypical associations in contextual encodings.
Intersectional biases (African American female) are larger than either constituent minority bias, and race effects often dominate gender effects in intersectional tests.
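
Overlap figures like those in the findings above come from comparing which tests are significant under each encoding. A minimal sketch of such a tally follows. The function name, the test labels, the p-values, the 0.01 threshold, and the choice of denominator (tests significant under at least one encoding) are all assumptions for illustration and are not taken from the paper.

```python
def overlap_breakdown(p_sentence, p_cword, alpha=0.01):
    """Share of significant tests detected by both encodings, by the
    contextual-word encoding only, or by the sentence encoding only.
    Denominator: tests significant under at least one encoding."""
    tests = p_sentence.keys() & p_cword.keys()
    sig_s = {t for t in tests if p_sentence[t] < alpha}
    sig_c = {t for t in tests if p_cword[t] < alpha}
    any_sig = sig_s | sig_c
    if not any_sig:
        return {"both": 0.0, "cword_only": 0.0, "sentence_only": 0.0}
    n = len(any_sig)
    return {
        "both": len(sig_s & sig_c) / n,
        "cword_only": len(sig_c - sig_s) / n,
        "sentence_only": len(sig_s - sig_c) / n,
    }

# Hypothetical p-values for four made-up tests, one dict per encoding.
p_sentence = {"T1": 0.001, "T2": 0.200, "T3": 0.005, "T4": 0.500}
p_cword = {"T1": 0.002, "T2": 0.003, "T3": 0.300, "T4": 0.600}

breakdown = overlap_breakdown(p_sentence, p_cword)
print(breakdown)
```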

This review was created by AI and reviewed by human editors.