QUICK REVIEW

[Paper Review] Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models

Shweta Parihar, Liu Guangliang|arXiv (Cornell University)|Feb 10, 2026

Topic Modeling | Citations: 0

One-Sentence Summary

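Context-CDA uses a large language model to generate context-rich, gender-swapped counterfactuals and adds semantic-entropy filtering, reducing bias across multiple architectures without sacrificing language modeling performance.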

ABSTRACT

A challenge in mitigating social bias in fine-tuned language models (LMs) is the potential reduction in language modeling capability, which can harm downstream performance. Counterfactual data augmentation (CDA), a widely used method for fine-tuning, highlights this issue by generating synthetic data that may align poorly with real-world distributions or creating overly simplistic counterfactuals that ignore the social context of altered sensitive attributes (e.g., gender) in the pretraining corpus. To address these limitations, we propose a simple yet effective context-augmented CDA method, Context-CDA, which uses large LMs to enhance the diversity and contextual relevance of the debiasing corpus. By minimizing discrepancies between the debiasing corpus and pretraining data through augmented context, this approach ensures better alignment, enhancing language modeling capability. We then employ uncertainty-based filtering to exclude generated counterfactuals considered low-quality by the target smaller LMs (i.e., LMs to be debiased), further improving the fine-tuning corpus quality. Experimental results on gender bias benchmarks demonstrate that Context-CDA effectively mitigates bias without sacrificing language modeling performance while offering insights into social biases by analyzing distribution shifts in next-token generation probabilities.

Motivation and Objectives

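Motivate the need to reduce gender bias in LMs without degrading language modeling capability.
Identify the limitations of vanilla CDA, such as distribution mismatch and lack of context sensitivity, and propose a context-aware extension.
Propose Context-CDA, which uses semantic-entropy filtering to raise data quality and improve debiasing.
Demonstrate model-agnostic debiasing across diverse architectures (encoder, encoder-decoder, decoder).
Provide insight into how debiasing affects next-token distributions and downstream tasks.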
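
To make the "vanilla CDA" baseline concrete, here is a minimal sketch of word-list gender swapping. The swap list `GENDER_PAIRS` and the function name `swap_gender_terms` are hypothetical and chosen for illustration only; the review does not state which word lists the paper uses. The example also hints at the context problem: "her" is left untouched because a plain word list cannot tell whether it should become "his" or "him".

```python
import re

# Hypothetical, deliberately tiny swap list (an assumption for illustration).
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "himself": "herself", "herself": "himself",
}

# Match whole words only, so "he" inside "the" or "father" is not touched.
_PATTERN = re.compile(r"\b(" + "|".join(GENDER_PAIRS) + r")\b", re.IGNORECASE)


def swap_gender_terms(sentence: str) -> str:
    """Replace each listed gendered word with its counterpart, keeping capitalization."""
    def _swap(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_PAIRS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped

    return _PATTERN.sub(_swap, sentence)


print(swap_gender_terms("He thanked the woman and her father."))
# prints: She thanked the man and her mother.
```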

Proposed Method

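Expand the debiasing corpus by having a large LM rephrase gender-swapped counterfactuals with richer context (Context-CDA).
Apply semantic-entropy-based filtering before fine-tuning to remove high-uncertainty, low-quality counterfactuals.
Fine-tune the target smaller LMs on the filtered Context-CDA data to debias their representations.
Evaluate intrinsic bias with StereoSet and CrowS-Pairs across five models (BERT, DistilBERT, T5, GPT-2, Llama-3.2-1B).
Evaluate extrinsic bias and downstream task performance on BiasBios, STS-B, NLI-Bias, QNLI, RTE, and SST-2.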
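
The filtering step can be sketched as follows. This is a simplified, discrete form of semantic entropy, not the paper's implementation: it assumes each candidate counterfactual already comes with meaning-cluster labels for several outputs sampled from the target LM (how the paper obtains these clusters is not described in this review), and the names `semantic_entropy`, `filter_counterfactuals`, and the threshold `max_entropy` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def semantic_entropy(cluster_ids: Sequence[Hashable]) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters.

    Each element of cluster_ids is the cluster label of one sampled output.
    All samples in one cluster -> 0; samples spread over many clusters -> high.
    """
    if not cluster_ids:
        raise ValueError("need at least one sampled output")
    total = len(cluster_ids)
    counts = Counter(cluster_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def filter_counterfactuals(
    candidates: List[Tuple[str, Sequence[Hashable]]],
    max_entropy: float,
) -> List[str]:
    """Keep counterfactuals whose semantic entropy is at or below max_entropy."""
    return [text for text, ids in candidates if semantic_entropy(ids) <= max_entropy]


# Toy inputs: cluster labels are invented for illustration.
candidates = [
    ("counterfactual A", [0, 0, 0, 0]),  # samples agree in meaning
    ("counterfactual B", [0, 1, 2, 3]),  # samples all differ in meaning
]
print(filter_counterfactuals(candidates, max_entropy=0.7))
```

With these toy labels, only the first candidate survives: its entropy is 0, while the second has entropy ln 4, which exceeds the threshold.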

Experimental Results

Research Questions

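RQ1: Does Context-CDA reduce intrinsic gender bias more effectively than vanilla CDA across encoder, encoder-decoder, and decoder-only models?
RQ2: Does semantic-entropy filtering improve counterfactual quality and further strengthen debiasing without harming language modeling?
RQ3: Is Context-CDA robust and model-agnostic across diverse architectures and downstream tasks?
RQ4: How does debiasing in gendered contexts affect language modeling metrics and next-token distributions?
RQ5: What is the trade-off between bias mitigation and downstream task performance when using Context-CDA?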

Key Findings

Debiasing Technique	SS (BERT)	LMS (BERT)	ICAT (BERT)	CS (BERT)	SS (GPT-2)	LMS (GPT-2)	ICAT (GPT-2)	CS (GPT-2)
MABEL	47.28	51.65	48.84	52.29	-	-	-	-
INLP	49.16	50.25	49.41	55.73	-	-	-	-
wiki-debiased	-	-	-	-	60.40	91.01	72.08	56.49
SelfDebias	59.34	84.20	68.47	52.29	56.05	87.43	73.18	56.11
SENT-DEBIAS	59.37	84.09	68.33	52.29	60.84	89.07	69.76	56.11
Vanilla	59.95	79.87	63.96	58.01	63.17	77.46	57.04	51.52
CDA	58.55	68.41	56.71	54.96	57.34	69.61	59.39	50.01
Context-CDA	57.75	78.67	66.48	53.43	56.13	75.25	66.01	50.76
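
The ICAT column can be cross-checked against the SS and LMS columns. The sketch below assumes the standard StereoSet definitions (SS is the stereotype score with an ideal value of 50, LMS is the language modeling score with an ideal value of 100, and ICAT = LMS × min(SS, 100 − SS) / 50); the review itself does not spell out the formula, so treat this as an assumption. The function name `icat` is ours.

```python
def icat(lms: float, ss: float) -> float:
    """Idealized CAT score under the standard StereoSet definition (assumed here).

    Reaches 100 only when LMS is 100 and SS is exactly 50 (no stereotype preference).
    """
    return lms * min(ss, 100.0 - ss) / 50.0


# (LMS, SS) for BERT, copied from the table above.
bert_rows = {
    "Vanilla": (79.87, 59.95),
    "CDA": (68.41, 58.55),
    "Context-CDA": (78.67, 57.75),
}

for name, (lms, ss) in bert_rows.items():
    print(f"{name}: ICAT = {icat(lms, ss):.2f}")
```

The recomputed values agree with the table to within rounding of the reported SS and LMS inputs (for example, about 63.98 versus the reported 63.96 for Vanilla), which suggests the table follows this definition.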

Context-CDA reduces intrinsic bias more consistently than vanilla CDA on BERT, DistilBERT, GPT-2, Llama-3.2-1B, and T5.
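Context-CDA preserves language modeling capability far better than CDA (LMS of 78.67 vs. 68.41 on BERT and 75.25 vs. 69.61 on GPT-2), trails the Vanilla LMS by only about 1.2 and 2.2 points respectively, and reaches a higher ICAT than both CDA and Vanilla on both models.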
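Semantic-entropy filtering of counterfactuals further strengthens debiasing while maintaining fluency.
Across architectures, Context-CDA delivers robust, model-agnostic debiasing, with next-token distributions balanced between male and female terms.
Extrinsic bias performance on downstream tasks is maintained or improved with Context-CDA relative to CDA.
Convergence analysis shows that bias mitigation stabilizes at around epochs 75–85, with no overfitting observed.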
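
One simple way to quantify "balanced next-token distributions" is the share of gendered probability mass that falls on male terms, where 0.5 means balance. The sketch below is an illustration only: the term lists, the function name `male_share`, and the probabilities are all made up and are not taken from the paper, whose exact analysis is not described in this review.

```python
from typing import Dict, Optional

# Hypothetical term lists (assumptions for illustration).
MALE_TERMS = ("he", "him", "his")
FEMALE_TERMS = ("she", "her", "hers")


def male_share(next_token_probs: Dict[str, float]) -> Optional[float]:
    """Fraction of gendered next-token probability mass assigned to male terms.

    Returns None when no gendered term has any probability mass.
    """
    male = sum(next_token_probs.get(t, 0.0) for t in MALE_TERMS)
    female = sum(next_token_probs.get(t, 0.0) for t in FEMALE_TERMS)
    total = male + female
    if total == 0.0:
        return None
    return male / total


# Invented next-token probabilities for a prompt such as "The doctor said that".
before_debiasing = {"he": 0.30, "she": 0.10, "the": 0.20}
after_debiasing = {"he": 0.20, "she": 0.20, "the": 0.20}

print(male_share(before_debiasing))
print(male_share(after_debiasing))
```

In this toy case the male share moves from about 0.75 toward 0.5, which is the kind of shift the finding above describes.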

This review was generated by AI and checked by a human editor.