QUICK REVIEW

[Paper Review] Gender Bias in Neural Natural Language Processing

Kaiji Lu, Piotr Mardziel|arXiv (Cornell University)|2018. 07. 31.

Topic Modeling · References: 17 · Citations: 73

One-Line Summary

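The paper defines a general bias benchmark for neural NLP, demonstrates substantial gender bias in coreference resolution and language modeling, and introduces counterfactual data augmentation (CDA), which mitigates bias while preserving accuracy and outperforms word embedding debiasing in several settings.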

ABSTRACT

We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark datasets finds significant gender bias in how models view occupations. We then mitigate bias with CDA: a generic methodology for corpus augmentation via causal interventions that breaks associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that as training proceeds on the original data set with gradient descent the gender bias grows as the loss reduces, indicating that the optimization encourages bias; CDA mitigates this behavior.

Motivation and Objectives

Propose a general, causal-testing-based benchmark to quantify gender bias in neural NLP tasks.
Demonstrate gender bias in neural coreference resolution and language modeling using state-of-the-art models.
Evaluate debiasing strategies, including word-embedding debiasing and counterfactual data augmentation (CDA).
Show that CDA reduces bias while preserving predictive accuracy, and compare it with prior debiasing approaches.

Proposed Method

Define score-based bias measures via matched intervention pairs to quantify gender bias in coreference and language modeling.
Use occupation-centered templates and gender swaps (g_naive) to construct intervention pairs for bias measurement.
Apply counterfactual data augmentation (CDA) by adding gender-swapped counterfactual instances to training data.
Compare CDA with word-embedding debiasing (WED) and their combinations across neural coreference models and an RNN language model.
Analyze bias growth during training and demonstrate CDA mitigates this growth.
Evaluate on CoNLL-2012 coreference data with Lee et al. (2017) and Clark & Manning (2016b) models, and on WikiText-2 language modeling with a two-layer LSTM.
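
The gender-swap intervention and the augmentation step listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the swap list `SWAP_PAIRS` and the helpers `swap_gender` and `augment` are hypothetical names, the word list is deliberately tiny, and pronouns such as "his", "him", and "her" are left out because swapping "her" correctly would need part-of-speech information.

```python
import re

# Hypothetical, deliberately small list of gendered word pairs (an assumption,
# not the paper's list). "his"/"him"/"her" are omitted: "her" is ambiguous
# without part-of-speech tags.
SWAP_PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("king", "queen")]

# Build a bidirectional lookup so each word maps to its counterpart.
SWAP = {}
for a, b in SWAP_PAIRS:
    SWAP[a] = b
    SWAP[b] = a

TOKEN = re.compile(r"[A-Za-z]+")


def swap_gender(sentence: str) -> str:
    """Replace every gendered word in the sentence with its counterpart."""

    def repl(match: re.Match) -> str:
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word  # gender-neutral word: leave untouched
        # Keep an initial capital if the original word had one.
        return target.capitalize() if word[0].isupper() else target

    return TOKEN.sub(repl, sentence)


def augment(corpus: list[str]) -> list[str]:
    """Return the corpus plus a gender-swapped copy of each gendered sentence."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        counterfactual = swap_gender(sentence)
        if counterfactual != sentence:  # skip sentences with nothing to swap
            augmented.append(counterfactual)
    return augmented


corpus = [
    "He is a doctor.",
    "The nurse said she was tired.",
    "The engineer fixed the bridge.",
]
augmented = augment(corpus)
```

In this sketch the augmented corpus contains each gendered sentence in both forms, so a model trained on it sees "doctor" and "nurse" equally often next to either pronoun. A fuller version would also need to handle names and ambiguous pronouns.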

Experimental Results

Research Questions

RQ1: Do neural NLP models exhibit gender bias in coreference resolution and language modeling?
RQ2: Can counterfactual data augmentation (CDA) reduce bias without sacrificing accuracy, and how does it compare to word embedding debiasing (WED)?
RQ3: How does bias evolve during training, and can CDA curb its growth?
RQ4: What are the effects of composing CDA with WED on biased downstream tasks?

Key Findings

Neural models show significant gender biases related to occupations in coreference and language modeling.
CDA reduces aggregate occupation bias substantially and maintains (or minimally affects) accuracy across tasks.
Word embedding debiasing alone reduces some bias but often harms downstream accuracy, especially when embeddings are co-trained.
Combining CDA with WED applied to pre-trained embeddings can offer complementary debiasing effects, while certain combinations may overcorrect or harm performance.
Debiasing via CDA is more effective than WED in reducing bias, particularly when embeddings are trained jointly with the model.
During training on original data, bias can increase as the loss decreases, but CDA mitigates this trend.
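
The "aggregate occupation bias" referred to in these findings can be illustrated with a small sketch of a score-based measure over matched pairs. Everything here is an assumption for illustration: `toy_score` stands in for a real model score (for example a log-likelihood or coreference score), the numbers in `TOY_SCORES` are made up and are not results from the paper, and the mean absolute difference is one plausible aggregation that may differ from the paper's exact definition.

```python
from statistics import mean
from typing import Callable, Iterable

# A scorer maps (pronoun, occupation) to a model score for a template such as
# "<pronoun> is a <occupation>". The signature is an assumption of this sketch.
Scorer = Callable[[str, str], float]

# Made-up scores for illustration only; these are NOT results from the paper.
TOY_SCORES = {
    ("He", "doctor"): -2.0,
    ("She", "doctor"): -3.0,
    ("He", "nurse"): -4.0,
    ("She", "nurse"): -2.5,
}


def toy_score(pronoun: str, occupation: str) -> float:
    """Stand-in for a real model score."""
    return TOY_SCORES[(pronoun, occupation)]


def occupation_bias(score: Scorer, occupation: str) -> float:
    """Score difference for one matched pair that differs only in the pronoun.

    Positive values mean the scorer favors the male variant.
    """
    return score("He", occupation) - score("She", occupation)


def aggregate_bias(score: Scorer, occupations: Iterable[str]) -> float:
    """Mean absolute per-occupation bias (one possible aggregation)."""
    return mean(abs(occupation_bias(score, occ)) for occ in occupations)


per_occupation = {occ: occupation_bias(toy_score, occ) for occ in ["doctor", "nurse"]}
overall = aggregate_bias(toy_score, ["doctor", "nurse"])
```

Taking the absolute value before averaging keeps male-leaning and female-leaning occupations from cancelling each other out. Evaluating the same measure at successive training checkpoints would be one way to observe how bias changes as the loss decreases.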


This review was generated by AI and reviewed by a human editor.