QUICK REVIEW

[Paper Review] A Closer Look at AUROC and AUPRC under Class Imbalance

Matthew B. A. McDermott, Haoran Zhang|arXiv (Cornell University)|Jan 11, 2024

Imbalanced Data Classification Techniques | Cited by 18

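One-Sentence Summary

This paper argues that AUPRC is not universally superior to AUROC in imbalanced settings, presents a theoretical relationship between the two metrics, and demonstrates potential fairness biases through synthetic experiments and a literature review.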

ABSTRACT

In machine learning (ML), a widespread claim is that the area under the precision-recall curve (AUPRC) is a superior metric for model comparison to the area under the receiver operating characteristic (AUROC) for tasks with class imbalance. This paper refutes this notion on two fronts. First, we theoretically characterize the behavior of AUROC and AUPRC in the presence of model mistakes, establishing clearly that AUPRC is not generally superior in cases of class imbalance. We further show that AUPRC can be a harmful metric as it can unduly favor model improvements in subpopulations with more frequent positive labels, heightening algorithmic disparities. Next, we empirically support our theory using experiments on both semi-synthetic and real-world fairness datasets. Prompted by these insights, we conduct a review of over 1.5 million scientific papers to understand the origin of this invalid claim, finding that it is often made without citation, misattributed to papers that do not argue this point, and aggressively over-generalized from source arguments. Our findings represent a dual contribution: a significant technical advancement in understanding the relationship between AUROC and AUPRC and a stark warning about unchecked assumptions in the ML community.

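Research Motivation and Objectives

Challenge the widely held claim that AUPRC is superior to AUROC for imbalanced binary classification.
Formalize the mathematical relationship between AUROC and AUPRC.
Examine how the choice of metric affects fairness across subpopulations with different prevalence.
Evaluate the literature cited in support of AUPRC's supposed advantages and identify misattributions.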

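Proposed Method

Prove a theoretical relationship between AUROC and AUPRC involving the distributions p+, p−, and p.
Define atomic mistakes and show how AUROC and AUPRC prioritize different corrections (Theorems 1 and 2).
Run synthetic experiments to validate the theorems and to show per-subpopulation effects when optimizing for AUROC versus AUPRC.
Conduct a literature review, using automated and manual analysis, to assess how prevalent and how well supported the claim is that AUPRC is superior in imbalanced settings.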
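
The relationship named in the first item can be sketched with a short derivation. This is a reconstruction from standard definitions, not a quotation of the paper's theorem. It assumes continuous scores s = f(x) with no ties, and it reads p+, p−, and p as the score distributions of positives, negatives, and all samples. The symbols \pi_0, FPR(t), Fire(t), and Prec(t) are notation introduced here for illustration.

```latex
\begin{align*}
\mathrm{FPR}(t) &= \Pr_{s \sim p_-}(s > t), \qquad
\mathrm{Fire}(t) = \Pr_{s \sim p}(s > t), \qquad
\pi_0 = \Pr(y = 0) \\
\mathrm{Prec}(t) &= \Pr(y = 1 \mid s > t)
  = 1 - \frac{\pi_0 \, \mathrm{FPR}(t)}{\mathrm{Fire}(t)} \\
\mathrm{AUROC} &= \Pr_{s_+ \sim p_+,\; s_- \sim p_-}(s_+ > s_-)
  = 1 - \mathbb{E}_{t \sim p_+}\big[\mathrm{FPR}(t)\big] \\
\mathrm{AUPRC} &= \mathbb{E}_{t \sim p_+}\big[\mathrm{Prec}(t)\big]
  = 1 - \pi_0 \, \mathbb{E}_{t \sim p_+}\!\left[\frac{\mathrm{FPR}(t)}{\mathrm{Fire}(t)}\right]
\end{align*}
```

Under these assumptions the two metrics differ only in the weight 1/Fire(t) and the constant \pi_0. AUROC averages the false positive rate uniformly over thresholds drawn from the positives, while AUPRC up-weights thresholds above which few samples score, that is, the high-score region.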

Figure 1 : Atomic mistakes occur when neighboring samples, when ordered by model score, are out-of-order with respect to the classification label. AUROC improves by a constant amount no matter which atomic mistake is corrected; AUPRC improves in descending order with model score due to the dependenc
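
The behavior described in the Figure 1 caption can be checked on a toy ranking. The following is a minimal sketch, not the authors' code. It assumes untied scores and uses average precision as the AUPRC estimate (an assumption, since the estimator is not specified here). The eight-sample ranking and the helper names `auroc`, `average_precision`, and `fix_atomic_mistake` are hypothetical.

```python
def auroc(labels):
    """AUROC for 0/1 labels sorted by descending model score (no ties)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    misordered, negs_seen = 0, 0
    for y in labels:
        if y == 1:
            # Every negative ranked above this positive is a misordered pair.
            misordered += negs_seen
        else:
            negs_seen += 1
    return 1.0 - misordered / (n_pos * n_neg)


def average_precision(labels):
    """Average precision (used here as the AUPRC estimate) for the same input."""
    n_pos = sum(labels)
    total, pos_seen = 0.0, 0
    for rank, y in enumerate(labels, start=1):
        if y == 1:
            pos_seen += 1
            total += pos_seen / rank  # precision at this positive's rank
    return total / n_pos


def fix_atomic_mistake(labels, i):
    """Swap a negative at index i with the positive ranked directly below it."""
    assert labels[i] == 0 and labels[i + 1] == 1
    fixed = list(labels)
    fixed[i], fixed[i + 1] = fixed[i + 1], fixed[i]
    return fixed


# Hypothetical ranking, highest score first: every negative sits just above a positive.
ranking = [0, 1, 0, 1, 0, 1, 0, 1]

# Indices of atomic mistakes (a negative immediately above a positive).
mistakes = [i for i in range(len(ranking) - 1)
            if ranking[i] == 0 and ranking[i + 1] == 1]

auroc_gains = [auroc(fix_atomic_mistake(ranking, i)) - auroc(ranking)
               for i in mistakes]
auprc_gains = [average_precision(fix_atomic_mistake(ranking, i))
               - average_precision(ranking) for i in mistakes]

for i, d_roc, d_prc in zip(mistakes, auroc_gains, auprc_gains):
    print(f"fix at rank {i + 1}: AUROC gain {d_roc:.4f}, AUPRC gain {d_prc:.4f}")
```

In this toy case each fix removes exactly one misordered positive-negative pair, so the AUROC gain is the same for every mistake, while the average-precision gain shrinks as the mistake moves down the ranking.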

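Experimental Results

Research Questions

RQ1: Are AUROC and AUPRC deterministically related in binary classification with imbalanced class prevalence?
RQ2: How does each metric prioritize model improvements across score regions and subpopulations?
RQ3: Does optimizing AUPRC, compared with AUROC, create disparities between subgroups with different prevalence?
RQ4: Is the common belief that AUPRC is superior under imbalance supported by empirical evidence across the literature?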

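Key Findings

AUROC and AUPRC are related through a formal probabilistic expression, which challenges the view that AUPRC is universally superior.
AUROC weights all false positives equally and is unbiased across the score range, whereas AUPRC weights false positives by the inverse of the firing rate and therefore prioritizes mistakes at high scores.
Optimizing for AUPRC tends to favor high-prevalence subpopulations, which can undermine fairness between groups with different prevalence.
Synthetic experiments show that tuning by AUPRC can widen disparities between subpopulations, whereas optimizing AUROC improves the metrics more evenly.
A thorough literature review finds that the widely accepted claim that AUPRC is superior in imbalanced settings is often misattributed and frequently rests on poorly supported citations.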
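
The subpopulation finding can be illustrated with a constructed example. This is a toy sketch under loud assumptions, not a reproduction of the paper's experiments. The 16-sample ranking is hypothetical, group "A" is given a higher prevalence (4/6) than group "B" (2/10), the high-prevalence group is assumed to concentrate at high scores, and average precision stands in for AUPRC.

```python
def auroc(labels):
    """AUROC for 0/1 labels sorted by descending model score (no ties)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    misordered, negs_seen = 0, 0
    for y in labels:
        if y == 1:
            misordered += negs_seen  # negatives ranked above this positive
        else:
            negs_seen += 1
    return 1.0 - misordered / (n_pos * n_neg)


def average_precision(labels):
    """Average precision, used here as the AUPRC estimate."""
    n_pos = sum(labels)
    total, pos_seen = 0.0, 0
    for rank, y in enumerate(labels, start=1):
        if y == 1:
            pos_seen += 1
            total += pos_seen / rank
    return total / n_pos


# Hypothetical (label, group) pairs, highest score first.
ranking = [
    (1, "A"), (0, "A"), (1, "A"), (1, "A"), (0, "B"), (0, "A"), (1, "A"), (0, "B"),
    (0, "B"), (1, "B"), (0, "B"), (0, "B"), (0, "B"), (1, "B"), (0, "B"), (0, "B"),
]
labels = [y for y, _ in ranking]


def gain_from_fix(metric, i):
    """Metric improvement from swapping the samples at ranks i and i + 1."""
    fixed = list(labels)
    fixed[i], fixed[i + 1] = fixed[i + 1], fixed[i]
    return metric(fixed) - metric(labels)


# Within-group atomic mistakes: a negative directly above a positive of the same group.
mistakes = [i for i in range(len(labels) - 1)
            if labels[i] == 0 and labels[i + 1] == 1
            and ranking[i][1] == ranking[i + 1][1]]

candidates = [{"group": ranking[i][1],
               "auroc_gain": gain_from_fix(auroc, i),
               "auprc_gain": gain_from_fix(average_precision, i)}
              for i in mistakes]

# If only two mistakes can be fixed, which would a greedy AUPRC optimizer pick?
by_auprc = sorted(candidates, key=lambda c: c["auprc_gain"], reverse=True)
top_two_groups = [c["group"] for c in by_auprc[:2]]

for c in candidates:
    print(c["group"], f"AUROC +{c['auroc_gain']:.4f}", f"AUPRC +{c['auprc_gain']:.4f}")
print("greedy AUPRC picks groups:", top_two_groups)
```

In this construction a greedy AUROC optimizer is indifferent between the four fixes, whereas a greedy AUPRC optimizer spends its budget on group "A" first. Whether the same pattern holds on real data depends on how scores and prevalence are distributed across groups.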

(a) Fixing atomic mistakes to optimize overall AUROC


This review was generated by AI and checked by a human editor.