QUICK REVIEW

[Paper Review] Comparative Separation: Evaluating Separation on Comparative Judgment Test Data

Xiaoyin Xi, Neeku Capak | arXiv (Cornell University) | 2026-01-11

Ethics and Social Impacts of AI | Citations: 0

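One-Line Summary

The paper defines comparative separation, proves that it is equivalent to separation in binary classification, and develops statistical tests and a power analysis for fairness evaluation based on comparative judgments. The theory is validated on simulated and real-world datasets.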

ABSTRACT

This research seeks to benefit the software engineering society by proposing comparative separation, a novel group fairness notion to evaluate the fairness of machine learning software on comparative judgment test data. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that the machine learning software do not perform differently on different sensitive groups -- satisfying the separation criterion. However, evaluation of separation requires ground truth labels for each test data point. This motivates our work on analyzing whether separation can be evaluated on comparative judgment test data. Instead of asking humans to provide the ratings or categorical labels on each test data point, comparative judgments are made between pairs of data points such as A is better than B. According to the law of comparative judgment, providing such comparative judgments yields a lower cognitive burden for humans than providing ratings or categorical labels. This work first defines the novel fairness notion comparative separation on comparative judgment test data, and the metrics to evaluate comparative separation. Then, both theoretically and empirically, we show that in binary classification problems, comparative separation is equivalent to separation. Lastly, we analyze the number of test data points and test data pairs required to achieve the same level of statistical power in the evaluation of separation and comparative separation, respectively. This work is the first to explore fairness evaluation on comparative judgment test data. It shows the feasibility and the practical benefits of using comparative judgment test data for model evaluations.

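Motivation and Objectives

Establish the need for fairness evaluation in ML when ground-truth labels are costly or unreliable to obtain.
Introduce comparative separation, a group fairness notion based on comparative judgments.
Prove the theoretical equivalence between comparative separation and separation in binary classification.
Develop hypothesis tests and a power analysis for evaluating separation and comparative separation.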

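Proposed Method

Definition of comparative separation based on pairwise comparative judgments between data points.
Proof of equivalence: in binary classification, comparative separation holds if and only if standard separation holds (Theorem 3.3).
Metrics and statistical tests for separation and comparative separation, using pairwise data (TPR and related quantities).
Power analysis showing that, in the binary setting, comparative separation needs about twice as many test pairs as separation needs test data points to reach the same statistical power.
Extension of the evaluation framework to both classification and regression contexts through comparative judgments.
Verification of the findings through simulation and real-world fairness datasets from software engineering.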
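For readers less familiar with the separation criterion that the method builds on, the sketch below measures it for a binary classifier as the between-group gaps in true positive rate (TPR) and false positive rate (FPR); separation is satisfied when both gaps are zero. This illustrates only the standard point-wise criterion, not the paper's pairwise metrics. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, group):
    """Return {group: (TPR, FPR)} for binary labels and predictions in {0, 1}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, p, g in zip(y_true, y_pred, group):
        if y == 1:
            key = "tp" if p == 1 else "fn"
        else:
            key = "fp" if p == 1 else "tn"
        counts[g][key] += 1

    rates = {}
    for g, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        # NaN marks a group with no positives (or no negatives) in the test set.
        tpr = c["tp"] / positives if positives else float("nan")
        fpr = c["fp"] / negatives if negatives else float("nan")
        rates[g] = (tpr, fpr)
    return rates


def separation_gaps(y_true, y_pred, group, a, b):
    """Return (TPR gap, FPR gap) between groups a and b; (0, 0) means separation holds."""
    rates = rates_by_group(y_true, y_pred, group)
    return rates[a][0] - rates[b][0], rates[a][1] - rates[b][1]


# Hypothetical toy data: four test points per sensitive group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

tpr_gap, fpr_gap = separation_gaps(y_true, y_pred, group, "a", "b")
print(tpr_gap, fpr_gap)
```

On this toy data group a has TPR 0.5 and FPR 0.5 while group b has TPR 1.0 and FPR 0.0, so both gaps are nonzero and separation is violated.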

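Experimental Results

Research Questions

RQ1: Is comparative separation equivalent to separation in binary classification?
RQ2: Which statistical tests can determine whether a binary classifier satisfies separation or comparative separation?
RQ3: How many test data points or pairs are needed to reach a desired statistical power?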
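As one illustration of the kind of test RQ2 asks about, a pooled two-proportion z-test can check whether a rate such as TPR differs between two sensitive groups. This is a generic textbook test used here as an assumption; the review does not state which test statistic the paper uses, and the function name and counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided pooled z-test for H0: p_a == p_b. Returns (z, p_value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all successes or all failures: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: true positives out of actual positives in each group.
z, p_value = two_proportion_z_test(80, 100, 60, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A p-value below the chosen α would reject the hypothesis that the two groups share the same TPR; the same function applies to FPR by passing false positives out of actual negatives.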

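Key Findings

Comparative separation is proven to be theoretically equivalent to separation in the binary classification setting (Theorem 3.3).
The statistical tests for separation and comparative separation each rely on two null hypotheses; at α = 0.05, the combined type I error rate across the two is 0.0975.
To reach the same statistical power, comparative separation needs about twice as many test data pairs as separation needs test data points in binary classification (Section 3.4.2).
The paper provides power analysis formulas and propositions (Propositions 3.4 and 3.5) for estimating the type II error rate and the required sample size.
Experiments on simulated and real-world fairness datasets support the theoretical results and demonstrate the feasibility of fairness evaluation with comparative judgments.
The experiment code and data are publicly available on GitHub (https://github.com/hil-se/Comparative_Separation).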
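The 0.0975 figure is consistent with running two tests at α = 0.05 each and treating them as independent, since 1 - (1 - 0.05)^2 = 0.0975; that reading is an assumption, not the paper's stated derivation. The sketch below reproduces the arithmetic and adds a generic normal-approximation sample-size formula for comparing two proportions. It is a stand-in for, not a copy of, Propositions 3.4 and 3.5, and all function names and example rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def familywise_error(alpha, k):
    """Chance of at least one false rejection across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def n_per_group(p_a, p_b, alpha=0.05, power=0.8):
    """Approximate points per group for a two-sided test to detect p_a versus p_b."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)


print(round(familywise_error(0.05, 2), 4))  # two null hypotheses at alpha = 0.05
print(n_per_group(0.8, 0.6))                # hypothetical TPRs of 0.8 and 0.6
```

With these example rates the formula asks for 79 labeled points per group. If the roughly twofold relationship reported above holds, the comparative-judgment version would need about twice as many pairs, though the exact constant should be taken from the paper's propositions rather than from this sketch.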

This review was generated by AI and reviewed by a human editor.