QUICK REVIEW

[Paper Review] Bias Dynamics in BabyLMs: Towards a Compute-Efficient Sandbox for Democratising Pre-Training Debiasing

Filip Trhlík, Andrew Caines|arXiv (Cornell University)|Jan 14, 2026

Artificial Intelligence in Healthcare and Education | Citations: 0

One-Line Summary

Summary: The paper shows that BabyLMs replicate the bias acquisition and debiasing dynamics of standard LMs and can serve as a cost-efficient sandbox for pre-training debiasing, cutting pre-training cost from over 500 GPU-hours to under 30 GPU-hours.

ABSTRACT

Pre-trained language models (LMs) have, over the last few years, grown substantially in both societal adoption and training costs. This rapid growth in size has constrained progress in understanding and mitigating their biases. Since re-training LMs is prohibitively expensive, most debiasing work has focused on post-hoc or masking-based strategies, which often fail to address the underlying causes of bias. In this work, we seek to democratise pre-model debiasing research by using low-cost proxy models. Specifically, we investigate BabyLMs, compact BERT-like models trained on small and mutable corpora that can approximate bias acquisition and learning dynamics of larger models. We show that BabyLMs display closely aligned patterns of intrinsic bias formation and performance development compared to standard BERT models, despite their drastically reduced size. Furthermore, correlations between BabyLMs and BERT hold across multiple intra-model and post-model debiasing methods. Leveraging these similarities, we conduct pre-model debiasing experiments with BabyLMs, replicating prior findings and presenting new insights regarding the influence of gender imbalance and toxicity on bias formation. Our results demonstrate that BabyLMs can serve as an effective sandbox for large-scale LMs, reducing pre-training costs from over 500 GPU-hours to under 30 GPU-hours. This provides a way to democratise pre-model debiasing research and enables faster, more accessible exploration of methods for building fairer LMs.

Motivation and Objectives

Motivate a low-cost sandbox for studying LM bias formation and debiasing.
Show that BabyLMs acquire biases similarly to standard LMs and respond to debiasing methods in comparable ways.
Demonstrate that pre-model debiasing experiments can be run with substantially reduced computational resources.

Proposed Method

Use BabyLM LTG-BERT variants and compare with standard BERT across bias and performance probes (BLiMP, EWoK, CrowS-Pairs, StereoSet).
Establish a composite bias and composite performance metric by averaging scores from multiple probes.
Analyze correlations between composite performance and bias to validate BabyLMs as proxies for standard LMs (Table 1).
Evaluate correlations of debiasing shifts using post-model and intra-model methods (Sent-Debias, INLP, CDA, CDS, debiasing losses, dropout).
Conduct pre-model debiasing experiments (CDA, toxicity removal, perturbation augmentation) on LTG-Baseline to assess cost and effectiveness.
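
The composite-metric and correlation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the composite is an unweighted mean of probe scores that already share a common scale, groups BLiMP and EWoK as performance probes and CrowS-Pairs and StereoSet as bias probes, and uses Pearson's r. The function names `composite` and `pearson_r` and every number in `checkpoints` are hypothetical placeholders, not values from the paper.

```python
from math import sqrt


def composite(probe_scores):
    """Unweighted mean of per-probe scores (assumes a shared scale)."""
    if not probe_scores:
        raise ValueError("need at least one probe score")
    return sum(probe_scores.values()) / len(probe_scores)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder scores for illustration only; NOT results from the paper.
checkpoints = [
    {"perf": {"BLiMP": 55.0, "EWoK": 50.0},
     "bias": {"CrowS-Pairs": 50.0, "StereoSet": 50.5}},
    {"perf": {"BLiMP": 70.0, "EWoK": 55.0},
     "bias": {"CrowS-Pairs": 55.0, "StereoSet": 53.0}},
    {"perf": {"BLiMP": 80.0, "EWoK": 65.0},
     "bias": {"CrowS-Pairs": 58.0, "StereoSet": 57.0}},
    {"perf": {"BLiMP": 85.0, "EWoK": 75.0},
     "bias": {"CrowS-Pairs": 61.0, "StereoSet": 59.0}},
]

perf = [composite(c["perf"]) for c in checkpoints]
bias = [composite(c["bias"]) for c in checkpoints]
r = pearson_r(perf, bias)
print(f"r(composite performance, composite bias) = {r:.3f}")
```

If the probes report scores on different scales, they would need to be normalised before averaging; the review does not say how the paper handles this.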

Experimental Results

Research Questions

RQ1: Can BabyLMs replicate the bias acquisition dynamics observed in larger LMs like BERT?
RQ2: Do BabyLMs exhibit debiasing behaviours comparable to standard LMs across post-model, intra-model, and pre-model interventions?
RQ3: Can BabyLMs serve as a cost-efficient platform to explore pre-model debiasing strategies before committing to large-scale experiments?

Key Findings

Model Class	N	r(Composite Performance, Composite Bias)
BabyLM	9	0.833
Standard	16	0.753

BabyLMs show a strong positive correlation between composite performance and composite bias, similar to standard LMs (r = 0.833 for BabyLMs vs r = 0.753 for standard LMs in Table 1).
Post-model debiasing effects on bias are consistent across models; gender-focused INLP reduces bias while race-focused INLP may harm accuracy.
Intra-model debiasing yields similar bias reductions across models; LTG-Baseline aligns most closely with BERT in performance-bias shifts (canonical correlations).
Pre-model experiments with BabyLM (CDA, toxicity removal, perturbation augmentation) reproduce known debiasing effects and enable new ablations at ~30 GPU-hours per run.
Toxicity is linked to downstream bias; removing toxic sentences reduces bias more effectively than random corpus reduction.
BabyLMs make it possible to replicate prior results and obtain new insights at substantially lower compute, suggesting they are viable debiasing sandboxes.
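
The comparison between toxicity removal and random corpus reduction can be pictured as building two reduced corpora of the same size, one filtered and one randomly subsampled, and pre-training a model on each. The sketch below only covers corpus construction and rests on assumptions: the review does not say whether the random baseline was size-matched, and `keyword_is_toxic` is a hypothetical stand-in for whatever toxicity scorer the paper used. The example sentences are invented.

```python
import random


def keyword_is_toxic(sentence, blocklist=frozenset({"idiot", "stupid"})):
    """Toy toxicity check (hypothetical): flags sentences with blocklisted words."""
    tokens = (t.strip(".,!?") for t in sentence.lower().split())
    return any(t in blocklist for t in tokens)


def toxic_filter(sentences, is_toxic):
    """Keep only the sentences the scorer does not flag as toxic."""
    return [s for s in sentences if not is_toxic(s)]


def random_control(sentences, n_remove, seed=0):
    """Drop n_remove sentences uniformly at random (size-matched baseline)."""
    rng = random.Random(seed)
    drop = set(rng.sample(range(len(sentences)), n_remove))
    return [s for i, s in enumerate(sentences) if i not in drop]


# Invented toy corpus for illustration only.
corpus = [
    "The cat sat on the mat.",
    "You are an idiot.",
    "She fixed the engine herself.",
    "What a stupid idea!",
    "He baked bread for the neighbours.",
    "The children played outside.",
]

filtered = toxic_filter(corpus, keyword_is_toxic)
control = random_control(corpus, n_remove=len(corpus) - len(filtered))

# Both corpora have the same size, so a bias difference between models
# trained on them is not explained by the amount of training data alone.
print(len(filtered), len(control))
```

Matching the number of removed sentences is the design choice that matters here: without it, a lower bias score after filtering could simply reflect a smaller training set.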

This review was generated by AI and checked by a human editor.