QUICK REVIEW

[Paper Digest] The Sensitivity of Counterfactual Fairness to Unmeasured Confounding

Niki Kilbertus, Philip Ball|arXiv (Cornell University)|Jul 1, 2019

Ethics and Social Impacts of AI | References: 23 | Cited by: 20

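One-Sentence Summary

This paper introduces sensitivity analysis tools for assessing how unmeasured confounding affects counterfactual fairness in causal machine learning models, focusing on non-linear additive noise models (ANMs). For bivariate confounding it proposes a grid-based computation; for the multivariate case it proposes an optimization method based on automatic differentiation. The results show that confounding can substantially change fairness measures even when the causal graph is close to correct.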

ABSTRACT

Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions in causal modeling is that you know the causal graph. This introduces a new opportunity for bias, caused by misspecifying the causal model. One common way for misspecification to occur is via unmeasured confounding: the true causal effect between variables is partially described by unobserved quantities. In this work we design tools to assess the sensitivity of fairness measures to this confounding for the popular class of non-linear additive noise models (ANMs). Specifically, we give a procedure for computing the maximum difference between two counterfactually fair predictors, where one has become biased due to confounding. For the case of bivariate confounding our technique can be swiftly computed via a sequence of closed-form updates. For multivariate confounding we give an algorithm that can be efficiently solved via automatic differentiation. We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding.

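Motivation and Objectives

Address a key gap in causal fairness research: the causal graph is assumed to be known and correct, yet this assumption often fails because of unmeasured confounding.
Develop tools that quantify how strongly unmeasured confounders affect counterfactual fairness in real-world prediction systems.
Provide a systematic way to assess how robust fairness criteria are under plausible model misspecification.
Extend sensitivity analysis from the average treatment effect (ATE) to individual-level fairness measures such as counterfactual fairness.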

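Proposed Method

Model unmeasured confounding as covariance between the error terms of a non-linear additive noise model (ANM), representing hidden common causes.
For bivariate confounding, search over a grid of correlation values to compute the maximum change in counterfactual fairness; for models that are linear in non-linear basis functions, each update is available in closed form.
For multivariate confounding, pose the problem as an optimization constrained to positive-definite covariance matrices, solvable with automatic differentiation.
Use structural equations to describe the relationships among the protected attribute, outcome variable, and mediators, with the error terms capturing unmeasured confounding.
Compute the fairness impact by comparing counterfactual predictions when the error terms are correlated (confounded model) with those when they are independent (unconfounded model).
Introduce a metric, CFU (counterfactual fairness under confounding), that quantifies the worst-case fairness deviation caused by confounding.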
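
The bivariate grid idea can be illustrated with a small sketch. This is not the paper's algorithm or its CFU metric: the two-equation toy model, the unit noise variances, and the names fit_given_rho, counterfactual_gap, and worst_case_gap are all assumptions made for illustration. For each candidate error correlation rho on a grid bounded by p_max, the structural weights are refit in closed form, and the counterfactuals they imply are compared with those of the unconfounded fit (rho = 0).

```python
import numpy as np

def fit_given_rho(A, X1, X2, rho):
    """Least-squares fit of the toy model for one fixed error correlation rho.

    Toy model (an assumption for illustration, unit noise variances):
        X1 = a1*A + e1
        X2 = a2*A + c*X1**2 + e2,   corr(e1, e2) = rho
    """
    a1 = (A @ X1) / (A @ A)            # A is exogenous, so plain OLS works here
    e1 = X1 - a1 * A                   # abducted noise of X1
    Phi = np.column_stack([A, X1**2])  # linear model in non-linear basis functions
    # With unit variances, E[e2 | e1] = rho*e1; subtract it, then solve in closed form.
    a2, c = np.linalg.lstsq(Phi, X2 - rho * e1, rcond=None)[0]
    return a1, a2, c

def counterfactual_gap(A, X1, base, conf):
    """Mean absolute difference between counterfactual X2 under two fitted models."""
    a1, a2_0, c_0 = base
    _, a2_r, c_r = conf
    A_cf = 1.0 - A                     # flip the binary protected attribute
    X1_cf = X1 + a1 * (A_cf - A)       # counterfactual X1 (same under both fits)
    delta = (a2_r - a2_0) * (A_cf - A) + (c_r - c_0) * (X1_cf**2 - X1**2)
    return float(np.mean(np.abs(delta)))

def worst_case_gap(A, X1, X2, p_max, n_grid=21):
    """Grid search over rho in [-p_max, p_max] for the largest counterfactual gap."""
    rhos = np.linspace(-p_max, p_max, n_grid)
    base = fit_given_rho(A, X1, X2, 0.0)
    gaps = np.array([counterfactual_gap(A, X1, base, fit_given_rho(A, X1, X2, r))
                     for r in rhos])
    i = int(np.argmax(gaps))
    return rhos[i], gaps[i]

# Synthetic data with a true error correlation of 0.4 (hypothetical values).
rng = np.random.default_rng(0)
n = 5000
A = rng.integers(0, 2, n).astype(float)
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 1.0]], size=n)
X1 = 1.0 * A + E[:, 0]
X2 = 0.5 * A + 0.8 * X1**2 + E[:, 1]

p_max = 0.5
worst_rho, worst_gap = worst_case_gap(A, X1, X2, p_max)
print("worst-case rho:", worst_rho, "gap:", worst_gap)
```

In this toy the refit weights are linear in rho, so the gap grows linearly with |rho| and the maximum sits at the edge of the grid. Richer models, such as those with unknown noise scales, need not behave this way, which is where a full grid search earns its cost.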

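Experimental Results

Research Questions

RQ1: How does unmeasured confounding affect the counterfactual fairness of a predictive model, even when the causal graph is largely correct?
RQ2: For a given level of unmeasured confounding in the error terms, what is the maximum degradation in counterfactual fairness?
RQ3: Can worst-case fairness violations under confounding be computed efficiently in both the bivariate and multivariate settings?
RQ4: How do the magnitude and sign of the confounding correlation affect fairness measures on real-world datasets?
RQ5: How do the proposed sensitivity analysis tools compare with baselines that assume no confounding or use an arbitrary error structure?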

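Key Findings

On the law school dataset, CFU rises with confounding strength, peaks at moderate correlation (p_max ≈ 0.5), and then rises again at high p_max, possibly because of numerical instability.
On the NHS Staff Survey dataset, CFU follows a similar trend: it rises at small p_max, plateaus in the mid-range, and rises again at high p_max, with all values below those of the baselines.
The grid method for bivariate confounding yields fast, closed-form solutions for models that are linear in non-linear basis functions, enabling efficient sensitivity checks.
The automatic-differentiation-based optimization identifies worst-case fairness violations in multivariate ANMs where grid search is infeasible.
Both baselines (assuming independent errors, or using an arbitrary error structure) consistently yield higher CFU values than the proposed method, suggesting that the new tools are more conservative and reliable.
At low confounding levels, small changes in p_max can produce large jumps in CFU, indicating high sensitivity to model misspecification in the early regime.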
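
For the multivariate search mentioned in the findings above, a gradient-based optimizer needs a way to stay inside the set of valid error covariances. One common device, shown here as an assumption rather than as the paper's construction, is to map an unconstrained parameter vector to a correlation matrix through a row-normalized Cholesky-style factor and then shrink it toward the identity so that no off-diagonal entry exceeds p_max in absolute value. The function name bounded_correlation and the example values are hypothetical.

```python
import numpy as np

def bounded_correlation(theta, d, p_max):
    """Map an unconstrained vector theta of length d*(d+1)/2 to a d x d
    positive-definite correlation matrix with |off-diagonal| <= p_max.

    Requires 0 <= p_max < 1 for strict positive definiteness.
    """
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = theta                   # fill the lower triangle
    diag = np.diag_indices(d)
    L[diag] = np.exp(L[diag])                       # positive diagonal, no zero rows
    L /= np.linalg.norm(L, axis=1, keepdims=True)   # unit-norm rows
    R = L @ L.T                                     # unit diagonal, entries in [-1, 1]
    # Convex combination with the identity scales off-diagonals by p_max.
    return (1.0 - p_max) * np.eye(d) + p_max * R

theta = np.random.default_rng(0).normal(size=6)     # 3*(3+1)/2 free parameters
S = bounded_correlation(theta, d=3, p_max=0.5)
print(S)
```

Every step in the mapping is differentiable, so the same construction could be written in an automatic differentiation framework and combined with a fairness objective; that objective is left out here because the digest does not give its exact form.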


This digest was generated by AI and reviewed by a human editor.