QUICK REVIEW

[Paper Digest] 50 Years of Test (Un)fairness: Lessons for Machine Learning

Ben Hutchinson, Margaret Mitchell | arXiv (Cornell University) | Nov 25, 2018

Ethics and Social Impacts of AI | References: 65 | Cited by: 126

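One-Sentence Summary

The paper reviews fifty years of fairness research in educational and employment testing, maps historical fairness criteria onto modern machine learning concepts, and argues that these historical insights should guide how fairness is defined and practiced in machine learning.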

ABSTRACT

Quantitative definitions of what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning. We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way towards future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.

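Motivation and Objectives

Trace how definitions of test fairness evolved from the 1960s through the 1980s, along with their social context.
Identify correspondences between historical fairness criteria and contemporary ML fairness concepts.
Highlight lessons on whether fairness criteria apply to a test itself or to the way the test is used, and what this implies for ML.
Discuss causal and utility considerations in fairness and their relevance to ML practice.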

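Proposed Approach

A historical literature review of fairness definitions in educational and employment testing, covering Cleary (1966) through Petersen & Novick (1976) and subsequent work.
A mapping from test fairness criteria onto the fairness criteria used in modern ML (sufficiency, separation, independence), along with causal and utility-based interpretations.
A discussion of how regression, correlation, and differential item functioning (DIF) analyses relate to current ML fairness methods.
A comparison of model-centered and use-centered views of fairness in both testing and ML settings.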
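
For reference, the three criteria named in the list above (sufficiency, separation, independence) are usually stated as conditional-independence conditions. The notation here is an assumption that follows common usage in the ML fairness literature rather than anything given in this digest: R is a score or prediction, A is a group attribute, and Y is the true outcome.

```latex
\begin{align*}
\text{Independence:} \quad & R \perp A \\
\text{Separation:}   \quad & R \perp A \mid Y \\
\text{Sufficiency:}  \quad & Y \perp A \mid R
\end{align*}
```

For a binary classifier, independence is commonly identified with demographic parity, separation with equalized odds, and sufficiency with predictive parity.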

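Results

Research Questions

RQ1: Which fairness criteria were historically proposed in educational and employment testing, and how do they relate to modern ML fairness definitions?
RQ2: In what sense do test-centered and use-centered fairness differ, and what lessons do they offer for ML practice?
RQ3: To what extent do historical criteria map onto ML concepts such as equal opportunity, predictive parity, and demographic parity?
RQ4: What are the implications of using regression and correlation as fairness criteria for ML models?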
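
To make RQ4 concrete, here is a minimal sketch of one regression-based check: fit a separate least-squares line of outcome on test score for each group, then compare slopes and intercepts. This is in the spirit of regression-based test fairness criteria such as Cleary's, as they are commonly summarized. It is not code from the paper. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def group_regression_lines(scores, outcomes, groups):
    """Fit one regression line per group; returns {group: (slope, intercept)}."""
    by_group = defaultdict(lambda: ([], []))
    for score, outcome, group in zip(scores, outcomes, groups):
        by_group[group][0].append(score)
        by_group[group][1].append(outcome)
    return {g: fit_line(xs, ys) for g, (xs, ys) in by_group.items()}


# Toy data (hypothetical): both groups share a slope but differ in intercept,
# so a single pooled line would systematically under-predict group "b".
scores = [1, 2, 3, 4, 1, 2, 3, 4]
outcomes = [3, 5, 7, 9, 5, 7, 9, 11]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

lines = group_regression_lines(scores, outcomes, groups)
slope_gap = abs(lines["a"][0] - lines["b"][0])
intercept_gap = abs(lines["a"][1] - lines["b"][1])
```

A non-zero `slope_gap` or `intercept_gap` means a single common regression line would make systematic prediction errors for at least one group. In practice one would test whether the gaps are statistically significant rather than compare raw values.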

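Key Findings

Many historical fairness criteria coincide with modern ML definitions such as sufficiency, equal opportunity, predictive parity, and demographic parity.
There is a fundamental tension between individual fairness and group fairness, comparable to the impossibility theorems in ML.
Fairness criteria often depend on whether fairness is treated as a property of the test itself or of its use, a distinction echoed in ML discussions of models versus their deployment.
Differential item functioning (DIF) analysis and institutional changes in test use parallel later ML bias-mitigation methods.
Causal and utility-oriented perspectives were considered early on, foreshadowing contemporary ML work on the costs of fairness.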
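
The group-level definitions named in the findings above can be computed directly from predictions. The sketch below is an illustration only, not the paper's method. The function name and the toy labels are hypothetical. It reports, per group, the selection rate (compared across groups for demographic parity), the true positive rate (equal opportunity), and the positive predictive value (predictive parity).

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate, and positive predictive value."""
    stats = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        fn = sum(1 for t, p in rows if t == 1 and p == 0)
        selected = tp + fp   # predicted positive
        positives = tp + fn  # actually positive
        stats[g] = {
            "selection_rate": selected / len(rows),
            "tpr": tp / positives if positives else float("nan"),
            "ppv": tp / selected if selected else float("nan"),
        }
    return stats


# Toy data (hypothetical): two groups of four individuals each.
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
```

In this toy example both groups have the same selection rate, so demographic parity holds. Their true positive rates and positive predictive values differ, which shows that satisfying one criterion does not imply the others.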

This digest was generated by AI and reviewed by human editors.