[Paper Review] FAIR: Fair Adversarial Instance Re-weighting
FAIR proposes a novel deep learning framework that integrates adversarial training with instance reweighting to improve fairness in classification models. By learning instance-specific weights through an adversarial process, FAIR achieves better trade-offs between accuracy and fairness than state-of-the-art methods, while providing interpretable fairness insights per instance.
With growing awareness of the societal impact of artificial intelligence, fairness has become an important aspect of machine learning algorithms. The issue is that human biases towards certain population groups, defined by sensitive features like race and gender, are introduced into the training data through data collection and labeling. Two important directions of fairness-ensuring research have focused on (i) instance weighting, in order to decrease the impact of more biased instances, and (ii) adversarial training, in order to construct data representations that are informative of the target variable but uninformative of the sensitive attributes. In this paper we propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn an instance weighting function that ensures fair predictions. Merging the two paradigms, it inherits desirable properties from both: the interpretability of reweighting and the end-to-end trainability of adversarial training. We propose four different variants of the method and, among other things, demonstrate how the method can be cast in a fully probabilistic framework. Additionally, the theoretical properties of FAIR models are analyzed extensively. We compare FAIR models to 7 other related and state-of-the-art models and demonstrate that FAIR is able to achieve a better trade-off between accuracy and unfairness. To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about the fairness of individual instances.
Motivation & Objective
- Address fairness in machine learning by mitigating bias from sensitive attributes like race and gender.
- Overcome limitations of pre-processing reweighting (lack of task-awareness) and adversarial representation learning (lack of interpretability).
- Develop a unified framework that combines the interpretability of instance reweighting with the end-to-end trainability of adversarial training.
- Enable instance-level interpretability by learning instance-specific weights that reflect how much each instance contributes to fair predictions.
- Demonstrate superior performance on fairness and accuracy metrics across diverse real-world datasets.
Proposed method
- Proposes a three-network architecture: a weighting network, a sensitive attribute predictor, and a target label predictor.
- Uses adversarial training to encourage the feature representation to be predictive of the target label but uninformative about the sensitive attribute.
- Introduces four variants: FAIR-scalar (non-probabilistic weights), FAIR-Bernoulli, FAIR-betaSF, and FAIR-betaREP (probabilistic weights using Bernoulli and Beta distributions).
- Employs score function and reparameterization techniques for gradient estimation in probabilistic variants to enable backpropagation.
- Incorporates baseline functions to reduce variance in gradient estimation for score function-based models.
- Casts the method in a fully probabilistic framework, enabling principled uncertainty modeling and expectation estimation.
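The score-function estimator with a baseline mentioned in the list above can be illustrated on a single Bernoulli weight. This is a minimal sketch, not the paper's implementation: `toy_loss`, the logit `theta`, and the constant baseline (set to the expected loss) are all assumptions made for illustration, and the paper's actual baseline functions may be defined differently. The sketch only shows that subtracting a baseline leaves the gradient estimate unbiased while shrinking its variance.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def exact_grad(theta, f):
    """Closed-form d/dtheta of E[f(w)] for w ~ Bernoulli(sigmoid(theta))."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(1) - f(0))


def score_function_samples(theta, f, n, baseline=0.0, seed=0):
    """Per-sample estimates (f(w) - baseline) * d log p(w) / d theta."""
    rng = random.Random(seed)
    p = sigmoid(theta)
    samples = []
    for _ in range(n):
        w = 1 if rng.random() < p else 0
        # For a Bernoulli with logit theta, d log p(w) / d theta = w - p.
        samples.append((f(w) - baseline) * (w - p))
    return samples


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def toy_loss(w):
    # Hypothetical loss: keeping the instance (w=1) costs 2.0, dropping it costs 1.5.
    return 2.0 if w == 1 else 1.5


theta = 0.3  # hypothetical logit produced by a weighting network
p = sigmoid(theta)
# Assumed constant baseline: the expected loss under the current weight distribution.
expected_loss = p * toy_loss(1) + (1.0 - p) * toy_loss(0)

plain = score_function_samples(theta, toy_loss, n=20000, baseline=0.0, seed=1)
with_baseline = score_function_samples(
    theta, toy_loss, n=20000, baseline=expected_loss, seed=1
)

print("exact gradient        :", exact_grad(theta, toy_loss))
print("estimate, no baseline :", mean(plain), "variance", variance(plain))
print("estimate, baseline    :", mean(with_baseline), "variance", variance(with_baseline))
```

Both estimates target the same gradient, because the baseline term has zero expectation under the sampling distribution. The variance drops because the baseline removes the large shared offset in the loss values.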
Experimental results
Research questions
- RQ1: Can adversarial training be effectively used to learn instance reweighting functions that enhance fairness without sacrificing predictive performance?
- RQ2: How does the hyperparameter α control the trade-off between fairness and model accuracy in the FAIR framework?
- RQ3: To what extent can the learned instance weights provide interpretable insights into the fairness of individual predictions?
- RQ4: How do probabilistic formulations (Bernoulli and Beta distributions) improve the robustness and training stability of the reweighting mechanism?
- RQ5: Can FAIR outperform existing state-of-the-art fairness methods in both fairness metrics and classification accuracy across diverse datasets?
Key findings
- FAIR achieves the best trade-off between fairness and accuracy among 8 compared models on four real-world datasets, including German Credit and Readmission.
- The FAIR-scalar variant successfully identifies 'fair' instances with balanced attributes (such as stable employment, no foreign worker status, and no other debtors) regardless of gender.
- As the hyperparameter α decreases, the model increasingly discards potentially biased but predictive instances, reducing AUC for the sensitive attribute while maintaining target AUC.
- Theoretical analysis confirms that α controls the fairness-accuracy trade-off, with higher α values favoring fairness and lower values favoring predictive performance.
- Experimental results verify that FAIR-scalar correctly labels instances as fair when sensitive attributes like sex do not influence the final prediction, demonstrating interpretability.
- The use of baseline functions in FAIR-Bernoulli and FAIR-betaSF significantly reduces gradient variance, improving training stability and convergence.
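The findings above contrast AUC for the target label with AUC for the sensitive attribute. As a rough illustration of how such a pair of numbers could be computed, the sketch below scores one set of model outputs against both label vectors. The data are made up, and measuring sensitive-attribute AUC from the same scores is an assumption; the paper may instead report the AUC of a separate sensitive-attribute predictor.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
target = [1, 1, 1, 0, 0, 0]     # label the model is meant to predict
sensitive = [1, 0, 1, 0, 1, 0]  # sensitive attribute, e.g. a binary group indicator

target_auc = roc_auc(target, scores)
sensitive_auc = roc_auc(sensitive, scores)
print("target AUC   :", target_auc)
print("sensitive AUC:", sensitive_auc)
```

Under this reading, a fairer model keeps the target AUC high while pushing the sensitive-attribute AUC towards 0.5, the value expected when the scores carry no information about group membership.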
This review was created by AI and reviewed by human editors.