QUICK REVIEW

[Paper Review] Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment

Muhammad Bilal Zafar, Isabel Valera|arXiv (Cornell University)|Jan 1, 2016

Ethics and Social Impacts of AI|21 references|586 citations

TL;DR

This paper introduces 'disparate mistreatment', a fairness notion based on unequal misclassification rates across sensitive groups, and proposes convex-concave constraints for training classifiers that reduce such disparities. On synthetic and real-world datasets, including COMPAS, the method reduces disparate mistreatment, often at a small cost in accuracy, and compares favorably with baselines in the fairness-accuracy trade-off.

ABSTRACT

Automated data-driven decision making systems are increasingly being used to assist, or even replace humans in many settings. These systems function by learning from historical decisions, often taken by humans. In order to maximize the utility of these systems (or, classifiers), their training involves minimizing the errors (or, misclassifications) over the given historical data. However, it is quite possible that the optimally trained classifier makes decisions for people belonging to different social groups with different misclassification rates (e.g., misclassification rates for females are higher than for males), thereby placing these groups at an unfair disadvantage. To account for and avoid such unfairness, in this paper, we introduce a new notion of unfairness, disparate mistreatment, which is defined in terms of misclassification rates. We then propose intuitive measures of disparate mistreatment for decision boundary-based classifiers, which can be easily incorporated into their formulation as convex-concave constraints. Experiments on synthetic as well as real world datasets show that our methodology is effective at avoiding disparate mistreatment, often at a small cost in terms of accuracy.

Motivation & Objective

Address fairness in machine learning when ground truth is available, moving beyond disparate treatment and impact.
Define and formalize 'disparate mistreatment' as unequal misclassification rates across sensitive groups.
Develop a practical, scalable method to train classifiers that avoid disparate mistreatment while maintaining high accuracy.
Enable simultaneous mitigation of disparate mistreatment and disparate treatment, even when sensitive attributes are not directly used.
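
For reference, the notion named in the list above can be written out for a binary classifier. The notation is an assumption chosen for illustration: labels y in {-1, 1}, prediction \hat{y}, and a binary sensitive attribute z in {0, 1}. Only the false positive and false negative cases are shown.

```latex
% No disparate mistreatment with respect to the false positive rate:
P(\hat{y} \neq y \mid z = 0,\, y = -1) = P(\hat{y} \neq y \mid z = 1,\, y = -1)

% No disparate mistreatment with respect to the false negative rate:
P(\hat{y} \neq y \mid z = 0,\, y = 1) = P(\hat{y} \neq y \mid z = 1,\, y = 1)
```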

Proposed method

Propose a new fairness notion—disparate mistreatment—defined by unequal false positive, false negative, false discovery, or false omission rates across sensitive groups.
Formulate fairness constraints based on misclassification rate differences as convex-concave optimization problems.
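Approximate each misclassification-rate constraint with a tractable proxy: the empirical covariance, estimated on the training set, between the sensitive attribute and the signed distance of misclassified points from the decision boundary.
Integrate fairness constraints into decision-boundary-based classifiers (e.g., logistic regression) via disciplined convex-concave programming (DCCP).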
Enable joint optimization of accuracy and fairness using convex-concave programming solvers.
Support flexible fairness control by targeting specific misclassification types (e.g., false positives or false negatives) depending on application context.
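
A minimal sketch of how a covariance-style measure for false positives could be computed for a linear decision boundary. It assumes labels in {-1, 1}, a binary sensitive attribute z, and a bias term folded into the feature vectors. The function names and the exact form of g are illustrative assumptions, not the authors' released code, and the DCCP solve step is not shown.

```python
def signed_distance(theta, x):
    """Linear decision function d(x) = theta . x (bias folded into x)."""
    return sum(t * xi for t, xi in zip(theta, x))

def fpr_covariance_proxy(theta, X, y, z):
    """Empirical covariance between z and g = min(0, (1 - y)/2 * y * d(x)).

    g is non-zero only for ground-truth negatives (y = -1) that the
    boundary places on the positive side, i.e. false positives.
    """
    n = len(y)
    z_mean = sum(z) / n
    total = 0.0
    for xi, yi, zi in zip(X, y, z):
        g = min(0.0, 0.5 * (1 - yi) * yi * signed_distance(theta, xi))
        total += (zi - z_mean) * g
    return total / n

# Toy data (hypothetical): four points, two per sensitive group.
X = [[1, 2], [1, -1], [1, 3], [1, -2]]
y = [-1, -1, -1, 1]
z = [0, 0, 1, 1]
theta = [0, 1]
cov = fpr_covariance_proxy(theta, X, y, z)
```

A fairness-constrained classifier would then bound the magnitude of this quantity by a small threshold during training. The closer the bound is to zero, the tighter the constraint on false positive disparity.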

Experimental results

Research questions

RQ1: Can a fairness criterion based on misclassification rates (disparate mistreatment) be effectively formalized and optimized in classification systems?
RQ2: How can disparate mistreatment be incorporated into classifier training as a convex-concave constraint without sacrificing accuracy?
RQ3: To what extent can this method reduce disparate mistreatment while maintaining or improving model utility compared to existing approaches?
RQ4: Does the method remain effective when sensitive attributes are not directly used during inference, as required by anti-discrimination laws?

Key findings

The proposed method effectively reduces disparate mistreatment on false positive and false negative rates, achieving fairness with minimal accuracy loss on synthetic and real-world datasets.
On the COMPAS dataset, the method reduced the disparity in false positive rates (DFPR) to 0.06 and false negative rates (DFNR) to -0.14, outperforming the baseline and Hardt et al. in fairness-accuracy trade-off.
Controlling for false positive rate constraints also reduced disparate mistreatment on false negative rates, and vice versa, indicating a strong interdependence between misclassification types.
The method does not achieve zero disparity (e.g., DFPR or DFNR = 0) in all cases, likely due to small dataset size limiting covariance estimation accuracy.
The method by Hardt et al. achieved zero DFPR and DFNR but at a significant accuracy cost (e.g., 64.5% accuracy on COMPAS), highlighting the trade-off between fairness and utility.
Performance degrades on small datasets due to unreliable covariance estimation, suggesting the method benefits from larger training sets for robust fairness constraints.
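
The DFPR and DFNR figures quoted above are differences in group-wise error rates. The sketch below shows one way such gaps could be computed from predictions. The sign convention (group 0 minus group 1), the function names, and the toy data are assumptions for illustration, not values from the paper.

```python
def group_rates(y_true, y_pred, z, group):
    """Return (FPR, FNR) within one sensitive group; labels are -1 or +1.

    Raises ZeroDivisionError if the group has no negatives or no positives.
    """
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, z) if g == group]
    neg = [p for t, p in rows if t == -1]   # predictions on true negatives
    pos = [p for t, p in rows if t == 1]    # predictions on true positives
    fpr = sum(p == 1 for p in neg) / len(neg)
    fnr = sum(p == -1 for p in pos) / len(pos)
    return fpr, fnr

def mistreatment_gaps(y_true, y_pred, z):
    """Return (DFPR, DFNR) as group-0 rate minus group-1 rate."""
    fpr0, fnr0 = group_rates(y_true, y_pred, z, 0)
    fpr1, fnr1 = group_rates(y_true, y_pred, z, 1)
    return fpr0 - fpr1, fnr0 - fnr1

# Toy example (hypothetical): group 0 gets errors, group 1 gets none.
y_true = [-1, -1, 1, 1, -1, -1, 1, 1]
y_pred = [1, -1, 1, -1, -1, -1, 1, 1]
z      = [0, 0, 0, 0, 1, 1, 1, 1]
d_fpr, d_fnr = mistreatment_gaps(y_true, y_pred, z)
```

A gap of zero on both measures corresponds to no disparate mistreatment for false positives and false negatives. On small groups each rate rests on few samples, so the gaps themselves are noisy.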


This review was created by AI and reviewed by human editors.