QUICK REVIEW

[Paper Review] Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Jieyu Zhao, Tianlu Wang|arXiv (Cornell University)|Jul 29, 2017

Multimodal Machine Learning Applications|References: 23|Cited by: 123

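One-Sentence Summary

This paper shows that structured prediction models amplify gender bias in visual recognition tasks, and proposes RBA, a calibration method that applies corpus-level constraints via Lagrangian relaxation to reduce bias amplification with minimal impact on task performance.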

ABSTRACT

Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively.

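Motivation and Goals

Quantify gender bias, and its amplification, in visual recognition datasets and models.
Show that structured predictors amplify existing bias when trained on a biased corpus.
Introduce a corpus-level calibration method (RBA) that constrains predictions to follow statistics of the training distribution.
Show that RBA reduces bias amplification on visual semantic role labeling (vSRL) and multilabel classification (MLC) with almost no loss in performance.
Provide a reusable framework for analyzing and mitigating bias in structured prediction.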

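Proposed Method

Define the bias score $b(o,g)$ as the proportion of occurrences of outcome $o$ that co-occur with demographic variable $g$.
Measure bias amplification by comparing the training-set bias $b^*(o,g)$ with the bias of the model's predictions on the development/test set, $\tilde{b}(o,g)$.
Impose corpus-level constraints that keep predictions close to the demographic distribution observed in the training data (e.g., the gender ratio for each verb).
Apply Lagrangian relaxation to jointly optimize the predictions for all test instances under the corpus-level constraints, iteratively updating the multipliers $\lambda$.
Integrate RBA with an existing CRF-based vSRL model and a CRF-like MLC model without major changes to the underlying inference algorithm.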
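The bias score and the calibration loop described above can be made concrete with a toy sketch. The Python below is not the authors' implementation. It assumes a single activity ("cooking"), two gender labels, and a hypothetical list of per-image score margins; the names `bias_score`, `rba_calibrate`, `gamma`, and `step` are illustrative. It computes the bias score from co-occurrence counts, then runs a subgradient-style multiplier update until the predicted gender ratio falls within a margin of the training ratio.

```python
from collections import Counter


def bias_score(counts, outcome, group):
    """b(o, g): share of occurrences of `outcome` that co-occur with `group`."""
    total = sum(c for (o, _), c in counts.items() if o == outcome)
    return counts.get((outcome, group), 0) / total if total else 0.0


def rba_calibrate(margins, b_star, gamma=0.05, step=0.1, max_iters=100):
    """Toy corpus-level calibration for one outcome and two groups.

    margins[i] = score(woman) - score(man) for instance i (hypothetical input).
    The share of "woman" predictions is pushed into [b_star - gamma, b_star + gamma].
    """
    upper, lower = b_star + gamma, b_star - gamma
    lam_up = lam_lo = 0.0  # Lagrange multipliers, kept non-negative
    preds = []
    for _ in range(max_iters):
        # Per-instance inference on multiplier-adjusted scores: for this
        # two-label case the margin is simply shifted by (lam_lo - lam_up).
        preds = ["woman" if m + lam_lo - lam_up > 0 else "man" for m in margins]
        n_w = preds.count("woman")
        n_m = len(preds) - n_w
        # Ratio constraints rewritten in linear form g <= 0.
        g_up = (1 - upper) * n_w - upper * n_m  # > 0: ratio above upper bound
        g_lo = lower * n_m - (1 - lower) * n_w  # > 0: ratio below lower bound
        if g_up <= 0 and g_lo <= 0:
            break  # all corpus-level constraints satisfied
        # Subgradient step on the violated constraints, projected onto lambda >= 0.
        lam_up = max(0.0, lam_up + step * g_up)
        lam_lo = max(0.0, lam_lo + step * g_lo)
    return preds


# Hypothetical scores for ten "cooking" images and an assumed training bias.
margins = [2.0, 1.5, 1.2, 0.9, 0.7, 0.5, 0.33, 0.1, -0.4, -1.0]
b_train = 0.6  # assumed b*(cooking, woman), not a figure from the paper

baseline = ["woman" if m > 0 else "man" for m in margins]
calibrated = rba_calibrate(margins, b_train)

for name, preds in [("baseline", baseline), ("calibrated", calibrated)]:
    counts = Counter(("cooking", g) for g in preds)
    b_pred = bias_score(counts, "cooking", "woman")
    print(f"{name}: bias={b_pred:.2f}, amplification={b_pred - b_train:+.2f}")
```

On this toy input the uncalibrated predictions label 8 of 10 images as "woman" (amplification of 0.20 over the assumed training bias), and calibration flips the two lowest-margin images so that 6 of 10 remain. The per-instance decision rule is left unchanged apart from the score shift, which mirrors the point that the underlying inference algorithm needs no major changes.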

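Experimental Results

Research Questions

RQ1: Do visual recognition datasets contain significant gender bias, and do models trained on them amplify it?
RQ2: Can corpus-level constraints calibrated on the training data mitigate bias amplification without hurting predictive performance?
RQ3: How effective is Lagrangian-relaxation-based calibration (RBA) across different structured prediction tasks (vSRL and MLC)?
RQ4: What is the trade-off between bias reduction and task accuracy when applying RBA?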

Key Findings

Task	Setting	Violations	Bias Amplification	Performance (%)
vSRL (imSitu)	Development Set (CRF)	154	0.050	24.07
vSRL (imSitu)	Development Set (CRF+RBA)	107	0.024	23.97
vSRL (imSitu)	Test Set (CRF)	149	0.042	24.14
vSRL (imSitu)	Test Set (CRF+RBA)	102	0.025	24.01
MLC (MS-COCO)	Development Set (CRF)	40	0.032	45.27
MLC (MS-COCO)	Development Set (CRF+RBA)	24	0.022	45.19
MLC (MS-COCO)	Test Set (CRF)	38	0.040	45.40
MLC (MS-COCO)	Test Set (CRF+RBA)	16	0.021	45.38

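Across verbs and objects, both the imSitu vSRL dataset and the MS-COCO MLC dataset show significant gender bias toward men.
Training on biased data amplifies the bias in predictions (e.g., a mean bias amplification of 0.050 on the vSRL development set and 0.036 on the MLC development set).
On the test sets, RBA reduces mean bias amplification by 40.5% (vSRL) and 47.5% (MLC).
RBA substantially narrows the distance between the training-set and development/test-set bias distributions (e.g., by over 39% for vSRL).
RBA achieves this bias reduction with almost no loss in underlying recognition performance (top-1 accuracy for vSRL, top-1 mAP for MLC).
In the tabulated results, RBA reduces the number of constraint violations, while performance on both tasks is maintained or drops only slightly.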
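The reported reductions can be checked directly against the test-set rows of the results table. The short Python sketch below copies those rows and recomputes the relative drop in bias amplification and the change in performance; the dictionary layout and the helper name `relative_reduction` are illustrative choices, not part of the paper.

```python
# Test-set rows copied from the results table:
# (violations, bias amplification, performance in %)
results = {
    "vSRL": {"CRF": (149, 0.042, 24.14), "CRF+RBA": (102, 0.025, 24.01)},
    "MLC": {"CRF": (38, 0.040, 45.40), "CRF+RBA": (16, 0.021, 45.38)},
}


def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


for task, rows in results.items():
    _, amp_before, perf_before = rows["CRF"]
    _, amp_after, perf_after = rows["CRF+RBA"]
    print(
        f"{task}: bias amplification reduced by "
        f"{relative_reduction(amp_before, amp_after):.1f}%, "
        f"performance change {perf_after - perf_before:+.2f} points"
    )
```

The computed reductions round to 40.5% for vSRL and 47.5% for MLC, matching the figures quoted in the abstract, while performance changes by 0.13 and 0.02 points respectively.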

This review was generated by AI and reviewed by a human editor.