QUICK REVIEW

[Paper Review] Measuring Perceptions of Fairness in AI Systems: The Effects of Infra-marginality

Schrasing Tong, Minseok Jung|arXiv (Cornell University)|Mar 6, 2026

Ethics and Social Impacts of AI|Citations: 0

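One-sentence summary

This paper reports a user study showing that fairness judgments under infra-marginality depend not only on parity but also on distributional differences between groups and on training data availability, challenging fairness metrics that prioritize parity alone.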

ABSTRACT

Differences in data distributions between demographic groups, known as the problem of infra-marginality, complicate how people evaluate fairness in machine learning models. We present a user study with 85 participants in a hypothetical medical decision-making scenario to examine two treatments: group-specific model performance and training data availability. Our results show that participants did not equate fairness with simple statistical parity. When group-specific performances were equal or unavailable, participants preferred models that produced equal outcomes; when performances differed, especially in ways consistent with data imbalances, they judged models that preserved those differences as more fair. These findings highlight that fairness judgments are shaped not only by outcomes, but also by beliefs about the causes of disparities. We discuss implications for popular group fairness definitions and system design, arguing that accounting for distributional context is critical to aligning algorithmic fairness metrics with human expectations in real-world applications.

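Motivation and objectives

Investigate how infra-marginality affects judgments of the fairness of AI predictions.
Examine whether distributional differences between groups and the availability of training data influence perceptions of fairness.
Evaluate how people interpret group-specific performance in a medical scenario and how that interpretation relates to their fairness decisions.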

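Proposed method

An online Qualtrics user study (85 participants) in which participants evaluated three candidate models while group-specific performance was varied.
Two treatment factors: group-specific performance (7 cases) and data availability (4 cases).
Participants rated fairness on a 7-point Likert scale for three model options that encode fairness assumptions of either equalizing outcomes or preserving the disparity.
Accuracy serves as a proxy for distributional differences, reflecting infra-marginality in a two-group cancer prediction scenario.
Iterative scenario checks and pilot tests were conducted to ensure comprehension and data validity. Independent-samples t-tests were used for the analysis.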
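As a rough illustration of the analysis step, the sketch below computes an independent-samples t statistic for two sets of 7-point Likert ratings. It assumes the classic pooled-variance (Student) form, which the review does not specify, and the ratings and the helper name `independent_t` are hypothetical, not data or code from the paper.

```python
import math
from statistics import mean, variance


def independent_t(a, b):
    """Student's independent-samples t statistic (pooled variance).

    Assumption: equal variances in both samples; the paper may have
    used a different variant (e.g. Welch's test).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighted by each sample's degrees of freedom
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    # Standard error of the difference between the two means
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / se, df


# Hypothetical 7-point Likert fairness ratings (NOT data from the paper)
ratings_option_1 = [6, 5, 7, 6, 5, 6]
ratings_option_2 = [3, 4, 2, 4, 3, 2]

t_stat, df = independent_t(ratings_option_1, ratings_option_2)
print(f"t = {t_stat:.3f}, df = {df}")
```

To obtain a p-value, the statistic would be compared against a t distribution with `df` degrees of freedom; in practice `scipy.stats.ttest_ind` does both steps in one call.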

Figure 1. Mean and standard errors of fairness perceptions on the 3 Options for the group-specific performance treatment. Group-specific accuracy denoted as (Race A and Race B) for the 7 subplots are NA/NA, 90/90, 70/70, 95/85, 75/65, 85/95, and 65/75. * signifies p < 0.05, ** signifies p < 0.01.

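Experimental results

Research questions

RQ1: How do users evaluate a model's fairness when the two groups differ in their underlying distributions?
RQ2: How does the relative availability of training data affect fairness judgments and the interpretation of these differences?
RQ3: Under infra-marginality, do users prefer fairness based on parity criteria or models that preserve the observed group differences?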

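Key findings

Fairness judgments shift with group-specific performance: equal or unknown performance leads to a preference for parity, while visible disparities lead participants to favor models that preserve the differences (consistent with infra-marginality).
Data availability moderates fairness reasoning: having more data for the higher-performing group is not automatically seen as fair, and disparities tied to task difficulty are more readily accepted.
Fairness judgments are relative to a baseline: anchoring on the original group-specific performance influences how new models are evaluated.
Parity-based metrics such as equalized error rates can conflict with people's fairness perceptions when performance gaps reflect differences in the underlying distributions.
The results suggest that fairness frameworks should incorporate distributional context rather than enforce parity at all costs.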
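To make the parity-based view concrete, the sketch below computes group-specific accuracy and the gap between groups, the quantity a strict parity criterion would try to drive to zero. It is a simplification under stated assumptions: equalized error rates are usually defined on false positive and false negative rates, whereas this uses overall accuracy (the proxy the study describes). The toy labels are hypothetical and merely mirror the 95/85 case from Figure 1, and the function names are illustrative.

```python
def group_accuracy(y_true, y_pred, groups):
    """Return a dict mapping each group label to its accuracy."""
    correct, total = {}, {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (truth == pred)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(acc_by_group):
    """Largest difference in accuracy between any two groups."""
    return max(acc_by_group.values()) - min(acc_by_group.values())


# Hypothetical toy data (NOT from the paper): 20 cases per group,
# 19 correct for Race A (95%) and 17 correct for Race B (85%).
y_true = [1] * 40
y_pred = [1] * 19 + [0] * 1 + [1] * 17 + [0] * 3
groups = ["A"] * 20 + ["B"] * 20

acc = group_accuracy(y_true, y_pred, groups)
gap = accuracy_gap(acc)
print(acc, round(gap, 2))
```

A parity-only rule would flag any nonzero gap as unfair. The findings above suggest participants may instead accept such a gap when it reflects differences in the underlying distributions.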

Figure 2. Mean and standard errors of fairness perceptions on the 3 Options when Race A > Race B in group-specific performance. Subplots show data of Race A relative to Race B: no info, 3x, 20x, and 1x respectively. * signifies p < 0.05, ** signifies p < 0.01, and *** signifies p < 0.001.

This review was generated by AI and reviewed by a human editor.