QUICK REVIEW

[Paper Review] FAIR: Fair Adversarial Instance Re-weighting

Andrija Petrović, Mladen Nikolić|arXiv (Cornell University)|Nov 15, 2020

Adversarial Robustness in Machine Learning | Cited by 5

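One-Sentence Summary

FAIR is a deep learning framework that improves the fairness of classification models by combining adversarial training with instance re-weighting. By learning instance-specific weights through an adversarial process, FAIR achieves a better trade-off between accuracy and fairness than state-of-the-art methods, while also providing interpretable fairness insight for each instance.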

ABSTRACT

With growing awareness of societal impact of artificial intelligence, fairness has become an important aspect of machine learning algorithms. The issue is that human biases towards certain groups of population, defined by sensitive features like race and gender, are introduced to the training data through data collection and labeling. Two important directions of fairness ensuring research have focused on (i) instance weighting in order to decrease the impact of more biased instances and (ii) adversarial training in order to construct data representations informative of the target variable, but uninformative of the sensitive attributes. In this paper we propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn instance weighting function that ensures fair predictions. Merging the two paradigms, it inherits desirable properties from both -- interpretability of reweighting and end-to-end trainability of adversarial training. We propose four different variants of the method and, among other things, demonstrate how the method can be cast in a fully probabilistic framework. Additionally, theoretical analysis of FAIR models' properties have been studied extensively. We compare FAIR models to 7 other related and state-of-the-art models and demonstrate that FAIR is able to achieve a better trade-off between accuracy and unfairness. To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about fairness of individual instances.

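Research Motivation and Objectives

Address fairness in machine learning by mitigating bias introduced through sensitive attributes such as race and gender.
Overcome the limitations of pre-processing re-weighting (which is not task-aware) and adversarial representation learning (which lacks interpretability).
Build a unified framework that combines the interpretability of instance re-weighting with the end-to-end trainability of adversarial training.
Provide model-level interpretability by learning instance-specific fairness weights that reflect each instance's contribution to fairness.
Demonstrate FAIR's superior performance on fairness and accuracy metrics across diverse real-world datasets.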

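Proposed Method

Proposes a three-network architecture: a weighting network, a sensitive attribute predictor, and a target label predictor.
Uses adversarial training to encourage feature representations that are predictive of the target label but uninformative about the sensitive attributes.
Proposes four variants: FAIR-scalar (non-probabilistic weights), and FAIR-Bernoulli, FAIR-betaSF, and FAIR-betaREP (probabilistic weights drawn from Bernoulli and Beta distributions).
In the probabilistic variants, uses score function and reparameterization techniques for gradient estimation so that backpropagation remains possible.
Introduces a baseline function to reduce the variance of gradient estimates in the score-function-based models.
Casts the method in a fully probabilistic framework, which supports principled uncertainty modeling and expectation estimation.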
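
To make the interplay between the three components concrete, here is a minimal NumPy sketch of an instance-weighted adversarial objective. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the toy arrays, and the trade-off coefficient `lam` are all hypothetical, and `lam` is not claimed to correspond to the paper's α. The predictor outputs `p_y` and `p_s` are simply given as arrays rather than produced by trained networks.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Per-instance binary cross-entropy."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def weighted_mean_loss(losses, weights):
    """Average of per-instance losses, normalized by the total weight."""
    return float(np.sum(weights * losses) / np.sum(weights))

def adversarial_weighting_objective(y, p_y, s, p_s, weights, lam=1.0):
    """Assumed generic adversarial form: keep the weighted target loss low
    while keeping the weighted sensitive-attribute loss high.
    `lam` is a hypothetical trade-off coefficient."""
    target_loss = weighted_mean_loss(binary_cross_entropy(y, p_y), weights)
    sensitive_loss = weighted_mean_loss(binary_cross_entropy(s, p_s), weights)
    return target_loss - lam * sensitive_loss, target_loss, sensitive_loss

# Toy data (hypothetical): the sensitive predictor is confident on the
# first two instances and uninformed (p = 0.5) on the last two.
y = np.array([1.0, 0.0, 1.0, 0.0])
p_y = np.array([0.9, 0.2, 0.8, 0.1])
s = np.array([1.0, 1.0, 0.0, 0.0])
p_s = np.array([0.9, 0.9, 0.5, 0.5])

uniform = np.ones(4)
downweighted = np.array([0.1, 0.1, 1.0, 1.0])  # shrink the "leaky" instances

obj_u, tgt_u, sens_u = adversarial_weighting_objective(y, p_y, s, p_s, uniform)
obj_d, tgt_d, sens_d = adversarial_weighting_objective(y, p_y, s, p_s, downweighted)
print(f"uniform:      target={tgt_u:.3f} sensitive={sens_u:.3f}")
print(f"downweighted: target={tgt_d:.3f} sensitive={sens_d:.3f}")
```

In this toy setting, shrinking the weights of the instances on which the sensitive attribute is easy to predict raises the weighted sensitive-attribute loss, which is the kind of signal a weighting network could exploit during adversarial training.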

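Experimental Results

Research Questions

RQ1: Can adversarial training be used effectively to learn an instance re-weighting function that improves fairness without sacrificing predictive performance?
RQ2: Within the FAIR framework, how does the hyperparameter α control the trade-off between fairness and model accuracy?
RQ3: To what extent do the learned instance weights provide interpretable insight into the fairness of individual predictions?
RQ4: To what extent do the probabilistic formulations based on Bernoulli and Beta distributions improve the robustness and training stability of the re-weighting mechanism?
RQ5: Does FAIR outperform existing state-of-the-art fairness methods on both fairness metrics and classification accuracy across diverse datasets?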

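Key Findings

On four real-world datasets, including German Credit and a hospital readmission dataset, FAIR achieves the best trade-off between fairness and accuracy when compared against 7 other models.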
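The FAIR-scalar variant successfully identifies "fair" instances with balanced attributes (such as stable employment, non-foreign-worker status, and no other debtors), regardless of gender.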
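As the hyperparameter α decreases, the model progressively discards instances that are predictive but potentially biased, lowering the AUC for the sensitive attribute while keeping the target AUC stable.
Theoretical analysis confirms that α controls the trade-off between fairness and accuracy, with higher values of α favoring fairness and lower values favoring predictive performance.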
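Experiments confirm that FAIR-scalar correctly marks instances as fair when a sensitive attribute such as gender does not affect the final prediction, which demonstrates its interpretability.
Using a baseline function in FAIR-Bernoulli and FAIR-betaSF significantly reduces gradient variance, improving training stability and convergence.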
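
The variance reduction attributed to the baseline can be illustrated with a small, self-contained experiment on the standard score-function (REINFORCE-style) estimator for a Bernoulli variable. This is a sketch under assumptions rather than the paper's setup: `toy_loss`, the single logit `theta`, and the constant baseline estimated from an independent batch are all hypothetical choices, and the paper's baseline function may be defined differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score_function_grads(theta, loss_fn, n_samples, rng, baseline=0.0):
    """Per-sample score-function estimates of d/dtheta E[loss(w)],
    where w ~ Bernoulli(sigmoid(theta))."""
    p = sigmoid(theta)
    w = (rng.random(n_samples) < p).astype(float)
    score = w - p  # d/dtheta log Bernoulli(w; sigmoid(theta))
    return (loss_fn(w) - baseline) * score

def toy_loss(w):
    # Hypothetical loss with a large constant offset, which is the
    # situation where a baseline helps the most.
    return 10.0 + 2.0 * w

rng = np.random.default_rng(0)
theta = 0.0

# Constant baseline: mean loss estimated from an independent batch, so the
# estimator stays unbiased.
w_indep = (rng.random(10_000) < sigmoid(theta)).astype(float)
baseline = float(np.mean(toy_loss(w_indep)))

plain = score_function_grads(theta, toy_loss, 200_000, rng)
with_baseline = score_function_grads(theta, toy_loss, 200_000, rng, baseline=baseline)

# Analytic gradient: E[loss] = 10 + 2p, so d/dtheta = 2 * p * (1 - p).
true_grad = 2.0 * sigmoid(theta) * (1.0 - sigmoid(theta))

print("true gradient:         ", true_grad)
print("mean without baseline: ", plain.mean(), " variance:", plain.var())
print("mean with baseline:    ", with_baseline.mean(), " variance:", with_baseline.var())
```

Both estimators target the same gradient, but subtracting the baseline removes the contribution of the constant offset in the loss, so the per-sample estimates are far less spread out.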

This review was generated by AI and reviewed by human editors.