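[Paper Explained] Towards a more efficient bias detection in financial language models
This paper analyzes bias in five financial language models using a large dataset of real financial sentences, shows that bias patterns co-occur across models, and proposes cross-model-guided bias detection to reduce cost.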
Bias in financial language models constitutes a major obstacle to their adoption in real-world applications. Detecting such bias is challenging, as it requires identifying inputs whose predictions change when varying properties unrelated to the decision, such as demographic attributes. Existing approaches typically rely on exhaustive mutation and pairwise prediction analysis over large corpora, which is effective but computationally expensive, particularly for large language models, and can become impractical in continuous retraining and releasing processes. Aiming at reducing this cost, we conduct a large-scale study of bias in five financial language models, examining similarities in their bias tendencies across protected attributes and exploring cross-model-guided bias detection to identify bias-revealing inputs earlier. Our study uses approximately 17k real financial news sentences, mutated to construct over 125k original-mutant pairs. Results show that all models exhibit bias under both atomic (0.58%-6.05%) and intersectional (0.75%-5.97%) settings. Moreover, we observe consistent patterns in bias-revealing inputs across models, enabling substantial reuse and cost reduction in bias detection. For example, up to 73% of FinMA's biased behaviours can be uncovered using only 20% of the input pairs when guided by properties derived from DistilRoBERTa outputs.
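Motivation and goals
- Mutate real financial sentences with atomic and intersectional changes to protected attributes in order to assess bias in five financial language models (two generative, three encoder-based).
- Examine whether bias-revealing inputs are shared across models and can therefore be reused.
- Evaluate how much cross-model-guided bias detection reduces computation and inference cost.
- Quantify atomic and intersectional bias across gender, race, and body attributes.
- Identify practical strategies for speeding up bias audits in financial NLP deployments.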
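Proposed method
- Use HInter to apply atomic and intersectional protected-attribute mutations to real financial sentences.
- Run sentiment prediction with five models (FinMA, FinGPT, FinBERT, DeBERTa-v3, DistilRoBERTa) on both the original and the mutated inputs.
- Flag bias when the original input and its mutant receive different sentiment labels.
- Compute bias detection rates and analyze how much the bias-revealing inputs overlap across models.
- Measure prediction shift with Jensen–Shannon distance and cosine similarity to capture bias that does not flip the label.
- Perform cross-model-guided bias detection by prioritizing inputs according to another model's predictions, and compare against random input ordering to assess effectiveness.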
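The detection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the three-class probability layout, and the toy values are all assumptions, and the logarithm base for the Jensen-Shannon distance (base 2 here, so values lie in [0, 1]) is a choice the summary does not specify.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the JS divergence (log base 2, an assumption)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def predicted_label(probs):
    """Index of the highest-probability class."""
    return max(range(len(probs)), key=probs.__getitem__)


def is_bias_revealing(orig_probs, mutant_probs):
    """A pair reveals bias when the predicted sentiment label flips."""
    return predicted_label(orig_probs) != predicted_label(mutant_probs)


def bias_rate(pairs):
    """Share of original-mutant pairs whose label flips."""
    return sum(is_bias_revealing(o, m) for o, m in pairs) / len(pairs)


# Hypothetical (negative, neutral, positive) probabilities for four pairs.
pairs = [
    ([0.10, 0.20, 0.70], [0.10, 0.25, 0.65]),  # same label, small shift
    ([0.10, 0.20, 0.70], [0.60, 0.30, 0.10]),  # label flips
    ([0.30, 0.40, 0.30], [0.30, 0.40, 0.30]),  # identical predictions
    ([0.05, 0.90, 0.05], [0.10, 0.50, 0.40]),  # same label, large shift
]

for orig, mutant in pairs:
    print(is_bias_revealing(orig, mutant),
          round(js_distance(orig, mutant), 3),
          round(cosine_similarity(orig, mutant), 3))
print("bias rate:", bias_rate(pairs))
```

The fourth toy pair shows why a distance measure is useful alongside the flip test: the label stays the same, yet the distribution moves noticeably, which a label comparison alone would miss.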
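Experimental results
Research questions
- RQ1: Do financial language models exhibit atomic and intersectional bias with respect to protected attributes (gender, race, body)?
- RQ2: Are bias-revealing inputs shared across models, so that they can be reused for bias detection?
- RQ3: Can cross-model guidance (using lightweight models) speed up bias detection for large models without sacrificing reliability?
Main findings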
| Model | Atomic (Body) | Inter. (Body) | Atomic (Gender) | Inter. (Gender) | Atomic (Race) | Inter. (Race) | Total (Atomic) | Total (Inter.) | Total Hidden (Inter.) |
|---|---|---|---|---|---|---|---|---|---|
| FinMA | 9.23% | 7.48% | 2.77% | 2.25% | 3.25% | 3.29% | 3.99% | 3.23% | 4.05% |
| FinGPT | 5.39% | 2.77% | 6.10% | 6.55% | 6.13% | 6.07% | 6.05% | 5.97% | 31.29% |
| FinBERT | 1.89% | 1.88% | 0.69% | 0.88% | 0.25% | 0.41% | 0.58% | 0.75% | 30.34% |
| DeBERTa-v3 | 1.69% | 1.67% | 0.70% | 0.89% | 0.30% | 0.46% | 0.60% | 0.75% | 29.95% |
| DistilRoBERTa | 1.69% | 1.67% | 0.70% | 0.89% | 0.30% | 0.46% | 0.60% | 0.75% | 29.95% |
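- All five models show bias under both atomic and intersectional settings, to varying degrees.
- The lightweight models show lower overall bias than the large models under both atomic and intersectional settings.
- Bias-revealing inputs overlap heavily among the classification models (over 94% shared) but much less between classification and generative models.
- Cross-model bias detection guided by a lightweight model's predictions uncovers a large share of a large model's bias from a limited number of inputs (for example, up to 73% of FinMA's biased behaviours using 20% of the input pairs).
- Bias-revealing inputs tend to produce larger prediction shifts (higher JSD) than non-bias-revealing inputs, which supports the cross-model prioritization strategy.
- The results support reusing bias-revealing inputs within the same model family to reduce auditing cost.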
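To make the prioritization idea concrete, here is a minimal sketch of how cross-model guidance could be evaluated against a random ordering. It assumes the guiding signal is a per-pair shift score from a lightweight model (for example a JSD value); the summary only says the guidance uses "properties derived from DistilRoBERTa outputs", so the exact signal is an assumption. All names and numbers below are hypothetical toy data, not results from the paper.

```python
import random


def guided_order(guide_scores):
    """Pair indices sorted by the guide model's shift score, largest first."""
    return sorted(range(len(guide_scores)),
                  key=lambda i: guide_scores[i], reverse=True)


def recall_at_budget(order, target_flips, budget):
    """Share of the target model's bias-revealing pairs found in the first
    `budget` fraction of `order`."""
    k = max(1, int(budget * len(order)))
    found = sum(1 for i in order[:k] if i in target_flips)
    return found / len(target_flips)


def overlap_ratio(flips_a, flips_b):
    """Share of model A's bias-revealing pairs that also reveal bias in B."""
    return len(flips_a & flips_b) / len(flips_a)


# Hypothetical shift scores from a lightweight guide model, one per pair.
guide_scores = [0.02, 0.40, 0.01, 0.35, 0.03, 0.05, 0.30, 0.02, 0.01, 0.04]
# Hypothetical indices of pairs that flip the large target model's label.
target_flips = {1, 3, 8}

guided = guided_order(guide_scores)
shuffled = list(range(len(guide_scores)))
random.Random(0).shuffle(shuffled)  # random baseline ordering

print("guided recall @20%:", recall_at_budget(guided, target_flips, 0.2))
print("random recall @20%:", recall_at_budget(shuffled, target_flips, 0.2))
print("overlap:", overlap_ratio(target_flips, {1, 3}))
```

Under a random ordering the expected recall at a given budget is roughly the budget itself, so any guided recall well above that level indicates the guide model's signal transfers to the target model.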
This summary was generated by AI and reviewed by a human editor.