QUICK REVIEW

[Paper Review] Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing

Sanghamitra Dutta, Dennis Wei|arXiv (Cornell University)|Oct 17, 2019

Ethics and Social Impacts of AI | Citations: 48

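One-Sentence Summary

This paper reframes the fairness-accuracy trade-off using mismatched hypothesis testing and Chernoff information, shows that no inherent trade-off exists under ideal distributions, and presents criteria for alleviating the trade-off observed in practice.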

ABSTRACT

A trade-off between accuracy and fairness is almost taken as a given in the existing literature on fairness in machine learning. Yet, it is not preordained that accuracy should decrease with increased fairness. Novel to this work, we examine fair classification through the lens of mismatched hypothesis testing: trying to find a classifier that distinguishes between two ideal distributions when given two mismatched distributions that are biased. Using Chernoff information, a tool in information theory, we theoretically demonstrate that, contrary to popular belief, there always exist ideal distributions such that optimal fairness and accuracy (with respect to the ideal distributions) are achieved simultaneously: there is no trade-off. Moreover, the same classifier yields the lack of a trade-off with respect to ideal distributions while yielding a trade-off when accuracy is measured with respect to the given (possibly biased) dataset. To complement our main result, we formulate an optimization to find ideal distributions and derive fundamental limits to explain why a trade-off exists on the given biased dataset. We also derive conditions under which active data collection can alleviate the fairness-accuracy trade-off in the real world. Our results lead us to contend that it is problematic to measure accuracy with respect to data that reflects bias, and instead, we should be considering accuracy with respect to ideal, unbiased data.

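Motivation and Objectives

Draws attention to the relationship between fairness and accuracy and challenges the commonly assumed trade-off on real data.
Introduces separability, based on Chernoff information, to quantify accuracy and fairness across groups.
Shows that a biased mapping to the observed data can produce an apparent trade-off.
Proposes ideal distributions under which fairness and accuracy are aligned, and provides a way to construct them.
Derives conditions under which active data collection alleviates or eliminates the trade-off.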

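Proposed Method

Models binary classification with a protected attribute Z, a construct space, and a biased observed space.
Quantifies each group's error probability using likelihood ratio detectors and Chernoff exponents.
Defines separability as the Chernoff information between P0 and P1 for the unprivileged group and between Q0 and Q1 for the privileged group.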
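To make the separability measure concrete, here is a minimal sketch that computes Chernoff information for two discrete distributions, using the standard textbook definition C(P0, P1) = -min over λ in [0, 1] of log Σ_x P0(x)^λ P1(x)^(1-λ). The function name `chernoff_information`, the ternary search, and the example probability vectors are illustrative assumptions, not code or data from the paper, which works with general distributions.

```python
import math


def chernoff_information(p0, p1, iters=200):
    """Chernoff information (in nats) between two discrete distributions.

    p0 and p1 are sequences of probabilities over the same outcomes.
    Hypothetical helper for illustration only.
    """
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must have the same length")

    def log_sum(lam):
        # log of sum_x p0(x)^lam * p1(x)^(1 - lam), over the common support
        total = sum(
            (a ** lam) * (b ** (1.0 - lam))
            for a, b in zip(p0, p1)
            if a > 0 and b > 0
        )
        return math.log(total) if total > 0 else -math.inf

    # log_sum is convex in lam, so a ternary search finds its minimum on [0, 1].
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_sum(m1) < log_sum(m2):
            hi = m2
        else:
            lo = m1
    return -log_sum((lo + hi) / 2.0)


# Assumed toy distributions: the "unprivileged" pair overlaps more than the
# "privileged" pair, so its separability is lower.
c_unpriv = chernoff_information([0.6, 0.4], [0.4, 0.6])
c_priv = chernoff_information([0.9, 0.1], [0.1, 0.9])
print(c_unpriv, c_priv)
```

In this toy setting the first pair is harder to tell apart, so `c_unpriv` comes out smaller than `c_priv`, which is the situation C(P0,P1) < C(Q0,Q1) discussed in the review.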

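Results

Research Questions

RQ1: Does a biased mapping from the construct space to the observed space give rise to an accuracy-fairness trade-off in the real world?
RQ2: Can ideal distributions exist under which fairness and accuracy are maximized simultaneously?
RQ3: Under data collection that adds features, how does separability improve and how is the trade-off reduced?
RQ4: How can ideal distributions be constructed that improve accuracy on the ideal data while preserving fairness?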

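Key Findings

Uses Chernoff information as a measure of separability to quantify the accuracy-fairness trade-off for each group.
If C(P0,P1) < C(Q0,Q1), the Bayes-optimal detector is unfair on the observed data, and any fairness adjustment reduces accuracy for at least one group (Theorem 1).
For the unprivileged group, there exist ideal distributions such that the Bayes-optimal detector is fair on the given data and optimal on the ideal data (Theorem 2).
An optimization framework yields ideal distributions that minimize divergence from the observed data while achieving fairness and matching the separability of the privileged group on the ideal data (Theorem 2; optimization (4)).
Active data collection can alleviate the trade-off by increasing separability (Theorem 3).
The work argues that accuracy should be evaluated against ideal, unbiased data rather than biased observed data.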
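The role of separability in these findings can be illustrated with a toy Gaussian example. This is a sketch under stated assumptions, not the paper's experiment: each group's two classes are isotropic Gaussians with a shared standard deviation and equal priors, for which the Chernoff information has the closed form ||μ1 - μ0||² / (8σ²) and the Bayes-optimal detector has error Q(||μ1 - μ0|| / (2σ)). All names and parameter values below are hypothetical.

```python
import math


def gaussian_chernoff(mu0, mu1, sigma):
    """Chernoff information between N(mu0, sigma^2 I) and N(mu1, sigma^2 I)."""
    sq_dist = sum((b - a) ** 2 for a, b in zip(mu0, mu1))
    return sq_dist / (8.0 * sigma ** 2)


def bayes_error(mu0, mu1, sigma):
    """Error of the Bayes-optimal detector with equal priors: Q(dist / (2 sigma))."""
    dist = math.sqrt(sum((b - a) ** 2 for a, b in zip(mu0, mu1)))
    return 0.5 * math.erfc(dist / (2.0 * sigma * math.sqrt(2.0)))


sigma = 1.0  # assumed shared standard deviation

# Unprivileged group: classes are close together (low separability).
c_p = gaussian_chernoff((0.0,), (1.0,), sigma)
err_p = bayes_error((0.0,), (1.0,), sigma)

# Privileged group: classes are further apart (high separability).
c_q = gaussian_chernoff((0.0,), (2.0,), sigma)
err_q = bayes_error((0.0,), (2.0,), sigma)

# Collecting one extra independent feature for the unprivileged group
# (assumed mean gap of 1.0) raises its separability.
c_p_extra = gaussian_chernoff((0.0, 0.0), (1.0, 1.0), sigma)
err_p_extra = bayes_error((0.0, 0.0), (1.0, 1.0), sigma)

print(f"unprivileged: C={c_p:.3f}, error={err_p:.4f}")
print(f"privileged:   C={c_q:.3f}, error={err_q:.4f}")
print(f"unprivileged + feature: C={c_p_extra:.3f}, error={err_p_extra:.4f}")
```

With these assumed parameters the unprivileged group has lower separability and a higher Bayes error than the privileged group, and the extra feature narrows that gap. This mirrors the qualitative message about separability and active data collection, without reproducing the conditions of the paper's theorems.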

This review was generated by AI and checked by a human editor.