QUICK REVIEW

[Paper Review] Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing

Sanghamitra Dutta, Dennis Wei|arXiv (Cornell University)|Oct 17, 2019

Ethics and Social Impacts of AI | Cited by 48

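ONE-SENTENCE SUMMARY

This paper reframes the fairness-accuracy trade-off using mismatched hypothesis testing and Chernoff information, shows that no inherent trade-off exists with respect to ideal distributions, and gives conditions under which the trade-off can be alleviated in practice.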

ABSTRACT

A trade-off between accuracy and fairness is almost taken as a given in the existing literature on fairness in machine learning. Yet, it is not preordained that accuracy should decrease with increased fairness. Novel to this work, we examine fair classification through the lens of mismatched hypothesis testing: trying to find a classifier that distinguishes between two ideal distributions when given two mismatched distributions that are biased. Using Chernoff information, a tool in information theory, we theoretically demonstrate that, contrary to popular belief, there always exist ideal distributions such that optimal fairness and accuracy (with respect to the ideal distributions) are achieved simultaneously: there is no trade-off. Moreover, the same classifier yields the lack of a trade-off with respect to ideal distributions while yielding a trade-off when accuracy is measured with respect to the given (possibly biased) dataset. To complement our main result, we formulate an optimization to find ideal distributions and derive fundamental limits to explain why a trade-off exists on the given biased dataset. We also derive conditions under which active data collection can alleviate the fairness-accuracy trade-off in the real world. Our results lead us to contend that it is problematic to measure accuracy with respect to data that reflects bias, and instead, we should be considering accuracy with respect to ideal, unbiased data.

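MOTIVATION AND GOALS

Motivate the fairness-accuracy question and challenge the trade-off that is commonly assumed on real-world data.
Introduce separability, measured by Chernoff information, to quantify accuracy and fairness for each group.
Show that a biased mapping into the observed data can create an apparent trade-off.
Propose ideal distributions under which fairness and accuracy agree, and give a method for constructing them.
Derive the data collection conditions under which actively acquiring data can reduce or eliminate the trade-off.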

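PROPOSED METHOD

Model binary classification with a construct space and a biased observed space that carries a protected attribute Z.
Use likelihood ratio detectors and Chernoff exponents to quantify the error probability for each group.
Define separability as the Chernoff information between P0 and P1 for the unprivileged group and between Q0 and Q1 for the privileged group.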
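
The separability measure above can be made concrete with a small numerical sketch. For two discrete distributions p0 and p1 on the same finite support, Chernoff information is C(p0, p1) = -min over λ in [0, 1] of log Σ_x p0(x)^λ p1(x)^(1-λ), and a larger value means the two hypotheses are easier to tell apart. The function name `chernoff_information`, the grid search over λ, and the example distributions are illustrative assumptions and are not taken from the paper.

```python
import math


def chernoff_information(p0, p1, steps=1000):
    """Approximate Chernoff information (in nats) between two discrete
    distributions given as probability sequences over the same support.

    The minimisation over lambda is a plain grid search on [0, 1];
    `steps` sets the grid resolution (an assumption of this sketch).
    """
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must share the same support")
    best_log = math.inf
    for i in range(steps + 1):
        lam = i / steps
        # Sum only over points where both distributions have mass.
        total = sum(
            (a ** lam) * (b ** (1.0 - lam))
            for a, b in zip(p0, p1)
            if a > 0 and b > 0
        )
        if total <= 0:
            return math.inf  # disjoint supports: perfectly separable
        best_log = min(best_log, math.log(total))
    return -best_log


# Hypothetical class-conditional distributions over a binary feature.
P0, P1 = [0.6, 0.4], [0.4, 0.6]   # unprivileged group (assumed values)
Q0, Q1 = [0.9, 0.1], [0.1, 0.9]   # privileged group (assumed values)

c_unpriv = chernoff_information(P0, P1)
c_priv = chernoff_information(Q0, Q1)
print(f"C(P0, P1) = {c_unpriv:.4f} nats")
print(f"C(Q0, Q1) = {c_priv:.4f} nats")
```

With these made-up numbers the unprivileged group is less separable than the privileged group, which is the situation the summary's C(P0, P1) < C(Q0, Q1) condition describes.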

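EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Does the biased mapping between the construct space and the observed space give rise to an accuracy-fairness trade-off in the real world?
RQ2: Do ideal distributions exist under which fairness and accuracy are maximized simultaneously?
RQ3: Under what data collection conditions does adding features improve separability and reduce the trade-off?
RQ4: How can ideal distributions be constructed that preserve fairness while improving accuracy on ideal data?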

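KEY FINDINGS

Chernoff information serves as the separability measure used to quantify the accuracy-fairness trade-off for each group.
If C(P0, P1) < C(Q0, Q1), the Bayes optimal detectors on the observed data are unfair, and any adjustment toward fairness lowers accuracy for at least one group (Theorem 1).
There exist ideal distributions for the unprivileged group such that the Bayes optimal detector is fair on the given data and also optimal on the ideal data (Theorem 2).
An optimization framework yields ideal distributions that achieve fairness and match the privileged group's separability on ideal data while minimizing the divergence from the observed data (Theorem 2; optimization problem (4)).
Active data collection can alleviate the trade-off by improving separability (Theorem 3).
The work argues that accuracy should be evaluated against ideal, unbiased data rather than biased observed data.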
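
The separability gap and the effect of collecting an extra feature can be illustrated with a toy Gaussian model. This is a sketch under assumptions that are not from the paper: each class is a Gaussian with identity covariance, in which case Chernoff information has the closed form ||μ1 - μ0||² / 8, and every mean vector below is made up. It is not the paper's construction and does not reproduce the conditions of Theorem 3.

```python
def gaussian_chernoff(mu0, mu1):
    """Chernoff information (nats) between N(mu0, I) and N(mu1, I).

    With equal covariances the optimal lambda is 1/2, which gives
    ||mu1 - mu0||^2 / 8. Identity covariance is an assumption here.
    """
    if len(mu0) != len(mu1):
        raise ValueError("mean vectors must have the same dimension")
    return sum((b - a) ** 2 for a, b in zip(mu0, mu1)) / 8.0


# Hypothetical class means on one observed feature.
c_priv = gaussian_chernoff([0.0], [2.0])     # privileged group, Q0 vs Q1
c_unpriv = gaussian_chernoff([0.0], [1.0])   # unprivileged group, P0 vs P1

# Separability gap of the kind described for Theorem 1.
gap_exists = c_unpriv < c_priv

# Collect one more hypothetical feature for the unprivileged group whose
# class means differ by 1.5 (independent of the first feature here).
c_unpriv_more = gaussian_chernoff([0.0, 0.0], [1.0, 1.5])

print(f"privileged separability:            {c_priv}")
print(f"unprivileged, one feature:          {c_unpriv}")
print(f"unprivileged, with extra feature:   {c_unpriv_more}")
print(f"gap before collection: {gap_exists}")
```

In this toy model the unprivileged group's separability rises from 0.125 to 0.40625 nats after the extra feature is added, which narrows the gap to the privileged group's 0.5 nats without closing it.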

This review was generated by AI and checked by human editors.