QUICK REVIEW

[Paper Review] Measuring and Reducing Gendered Correlations in Pre-trained Models

Kellie Webster, Xuezhi Wang|arXiv (Cornell University)|Oct 12, 2020

Ethics and Social Impacts of AI | References: 47 | Citations: 106

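One-Line Summary

This paper proposes metrics for detecting gendered correlations in pre-trained models and shows that models with similar accuracy can encode very different levels of correlation. It also demonstrates that mitigation techniques such as dropout and counterfactual data augmentation reduce these correlations while maintaining accuracy, and that the effect persists after fine-tuning.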

ABSTRACT

Pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode artifacts undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that it is possible for models with similar accuracy to encode correlations at very different rates. We show how measured correlations can be reduced with general-purpose techniques, and highlight the trade offs different strategies have. With these results, we make recommendations for training robust models: (1) carefully evaluate unintended correlations, (2) be mindful of seemingly innocuous configuration differences, and (3) focus on general mitigations.

Research Motivation and Objectives

Define a framework and metrics to detect gendered correlations in pre-trained models and downstream tasks.
Demonstrate that similar-accuracy models can have varying levels of gendered correlations.
Evaluate mitigation strategies (dropout regularization and counterfactual data augmentation) and their trade-offs.
Show that mitigations before fine-tuning carry through to downstream tasks and improve robustness to re-learning correlations.

Proposed Method

Propose a multi-metric evaluation framework including intrinsic (DisCo) and extrinsic (Coref, STS-B, Bias-in-Bios) measures for gendered correlations.
Develop intrinsic DisCo analysis combining template- and generation-based text to discover correlations.
Use standard downstream tasks (coreference resolution, STS-B-style similarity, Bias-in-Bios) to assess correlations in practice after fine-tuning.
Experiment with model variants (BERT, ALBERT; different sizes) to compare correlation levels independent of accuracy.
Apply two general mitigation techniques: (i) dropout regularization during pre-training, (ii) counterfactual data augmentation (CDA) during pre-training, including 2-sided augmentation.
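
The counterfactual data augmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: the word list `GENDER_PAIRS`, the helper names `swap_gender` and `augment`, and the `two_sided` flag are all hypothetical, and the sketch assumes that "2-sided" means keeping both the original and the swapped sentence while "1-sided" keeps only the swapped one. Ambiguous pronouns such as "her" (which can map to "his" or "him") are deliberately left out.

```python
import re

# Hypothetical, deliberately tiny swap list; a real setup would need a much
# larger curated lexicon and a strategy for ambiguous words like "her".
GENDER_PAIRS = [
    ("he", "she"), ("man", "woman"), ("men", "women"),
    ("father", "mother"), ("son", "daughter"), ("himself", "herself"),
]

# Build a lookup that works in both directions.
SWAP = {}
for first, second in GENDER_PAIRS:
    SWAP[first] = second
    SWAP[second] = first

# Match any listed word as a whole token, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in SWAP) + r")\b", re.IGNORECASE
)


def swap_gender(sentence):
    """Return the sentence with every listed gendered word swapped."""
    def replace(match):
        word = match.group(0)
        swapped = SWAP[word.lower()]
        # Preserve an initial capital letter (e.g. at sentence start).
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(replace, sentence)


def augment(corpus, two_sided=True):
    """Apply counterfactual augmentation to a list of sentences.

    Assumed semantics: two_sided=True keeps the original and its
    counterfactual; two_sided=False keeps only the counterfactual.
    Sentences without listed gendered words pass through unchanged.
    """
    augmented = []
    for sentence in corpus:
        counterfactual = swap_gender(sentence)
        if counterfactual == sentence:
            augmented.append(sentence)
        elif two_sided:
            augmented.extend([sentence, counterfactual])
        else:
            augmented.append(counterfactual)
    return augmented


corpus = ["The woman said she left.", "It rained."]
print(augment(corpus, two_sided=True))
print(augment(corpus, two_sided=False))
```

In this sketch the augmented corpus would then be fed to pre-training in place of the original one; how the paper schedules or samples the augmented data is not stated in this review.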

Experimental Results

Research Questions

RQ1: Do pre-trained models with similar accuracy encode gendered correlations at different rates?
RQ2: Can general-purpose mitigations reduce gendered correlations without sacrificing downstream task performance?
RQ3: Do mitigations applied at pre-training carry through fine-tuning and limit the re-learning of correlations?
RQ4: What are the trade-offs between dropout and counterfactual data augmentation for reducing correlations?

Key Findings

Models with similar accuracy can show large differences in gendered correlation metrics (e.g., coreference, STS-B, Bias-in-Bios).
Dropout regularization reduces gendered correlations across multiple metrics without substantial accuracy loss in many settings; however, very high dropout can hurt some tasks such as coreference.
Counterfactual data augmentation (CDA) effectively reduces correlations, often with less impact on overall accuracy, and can generalize beyond the augmented terms.
Mitigations applied during pre-training confer resilience to re-learning correlations during fine-tuning, and partly freezing mitigated models preserves lower correlations while maintaining accuracy under certain conditions.
BERT and ALBERT show different baseline levels of correlation, and no single architecture/size universally minimizes correlations; precise, multi-metric evaluation is recommended.
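
To make the idea of measuring a gendered correlation concrete, here is a simplified sketch of a template-based gap measure. It is not the paper's DisCo, coreference, STS-B, or Bias-in-Bios metric. The function `gendered_gaps`, the templates, and the scorer `toy_score` (word overlap plus a planted bias, standing in for a real model) are all hypothetical and exist only to show the shape of such an evaluation.

```python
from statistics import mean


def gendered_gaps(score, templates, professions, pair=("man", "woman")):
    """Mean signed score difference per profession when only the gendered word changes.

    `score(a, b)` is any sentence-pair scorer returning a float.
    A positive gap means the profession scores closer to pair[0],
    a negative gap means it scores closer to pair[1].
    """
    gaps = {}
    for profession in professions:
        diffs = []
        for template in templates:
            target = template.format(person=profession)
            s_first = score(template.format(person=pair[0]), target)
            s_second = score(template.format(person=pair[1]), target)
            diffs.append(s_first - s_second)
        gaps[profession] = mean(diffs)
    return gaps


def toy_score(a, b):
    """Hypothetical stand-in for a model: Jaccard word overlap plus a planted bias."""
    tokens_a = set(a.lower().rstrip(".").split())
    tokens_b = set(b.lower().rstrip(".").split())
    overlap = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    # Planted associations so the example has something to detect.
    planted = {("woman", "nurse"), ("man", "engineer")}
    biased = any(g in tokens_a and p in tokens_b for g, p in planted)
    return overlap + (0.1 if biased else 0.0)


TEMPLATES = ["The {person} is walking.", "The {person} is cooking dinner."]
gaps = gendered_gaps(toy_score, TEMPLATES, ["nurse", "engineer", "teacher"])
for profession, gap in gaps.items():
    print(f"{profession}: {gap:+.3f}")
```

Two scorers with identical task accuracy could produce very different gaps under a probe like this, which is why the findings above recommend evaluating several metrics rather than relying on accuracy alone.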


This review was generated by AI and checked by a human editor.