QUICK REVIEW

[Paper Review] Explaining Predictive Uncertainty with Information Theoretic Shapley Values

David Watson, Joshua O'Hara | arXiv (Cornell University) | Jun 9, 2023

Explainable Artificial Intelligence (XAI) | Cited by 14

One-Sentence Summary

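This paper applies Shapley values to explaining predictive uncertainty. It uses information-theoretic value functions to attribute entropy to individual features in a model-agnostic way, and it provides finite-sample inference guarantees.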

ABSTRACT

Researchers in explainable artificial intelligence have developed numerous methods for helping users understand the predictions of complex supervised learning models. By contrast, explaining the uncertainty of model outputs has received relatively little attention. We adapt the popular Shapley value framework to explain various types of predictive uncertainty, quantifying each feature's contribution to the conditional entropy of individual model outputs. We consider games with modified characteristic functions and find deep connections between the resulting Shapley values and fundamental quantities from information theory and conditional independence testing. We outline inference procedures for finite sample error rate control with provable guarantees, and implement efficient algorithms that perform well in a range of experiments on real and simulated data. Our method has applications to covariate shift detection, active learning, feature selection, and active feature-value acquisition.

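Motivation and Objectives

Extend Shapley-value attribution to explain the uncertainty of model outputs, not just point predictions.
Introduce information-theoretic value functions that capture conditional entropy and information gain.
Provide finite-sample conformal inference with error-rate guarantees for attribution scores.
Develop model-specific and model-agnostic variants and validate them on simulated and real data.
Demonstrate applications to covariate shift detection, active learning, and feature selection.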

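Proposed Method

Combine Shapley values with modified value functions based on KL divergence, cross-entropy (CE), and entropy variants to quantify each feature's contribution to local uncertainty.
Define information-theoretic attributions and relate them to conditional independence and context-specific independence.
Provide a split conformal inference procedure that guarantees finite-sample coverage for Shapley values.
Use ensemble-based uncertainty estimates with plug-in entropy estimators to obtain total, epistemic, and aleatoric uncertainty.
Implement model-specific (TreeSHAP/DeepSHAP) and model-agnostic variants, and evaluate them on simulated and real data.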
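
The recipe in the list above, which feeds an entropy-based value function into the Shapley formula and uses an ensemble to separate total, aleatoric, and epistemic entropy, can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the authors' code. The random logistic ensemble and the helper names (`entropy_bits`, `decompose`, `member_probs`, `make_value`, `shapley`) are hypothetical. Absent features are handled by averaging predictions over background rows, which is an interventional approximation, and exact enumeration stands in for the paper's TreeSHAP/DeepSHAP or model-agnostic estimators.

```python
import itertools
import math

import numpy as np


def entropy_bits(p, axis=-1):
    """Shannon entropy in bits; clipping avoids log(0) for zero-probability classes."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log2(p), axis=axis)


def decompose(probs):
    """Split ensemble uncertainty for one input; probs has shape (B, K)."""
    total = entropy_bits(probs.mean(axis=0))           # entropy of the averaged prediction
    aleatoric = entropy_bits(probs, axis=-1).mean()    # average entropy of each member
    return total, aleatoric, total - aleatoric         # epistemic = mutual information


# Hypothetical ensemble: B logistic models with random weights over d features.
rng = np.random.default_rng(0)
B, d = 5, 3
W = rng.normal(size=(B, d))
b = rng.normal(size=B)


def member_probs(X):
    """Binary class probabilities of every member for each row of X: shape (n, B, 2)."""
    p1 = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return np.stack([1.0 - p1, p1], axis=-1)


def make_value(x, background):
    """Hypothetical v(S): total entropy with features outside S drawn from background rows."""
    def value(S):
        Z = background.copy()
        cols = list(S)
        Z[:, cols] = x[cols]                    # fix coalition features at x's values
        probs = member_probs(Z).mean(axis=0)    # marginalize the rest: shape (B, 2)
        return decompose(probs)[0]              # index 1 or 2: aleatoric or epistemic
    return value


def shapley(value, d):
    """Exact Shapley values by enumerating every coalition (only feasible for small d)."""
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for S in itertools.combinations(others, r):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi


x = rng.normal(size=d)
background = rng.normal(size=(200, d))
v = make_value(x, background)
phi = shapley(v, d)

print("attributions (bits):", phi)
print("efficiency gap:", phi.sum() - (v(tuple(range(d))) - v(())))
# Shifting v by a constant (e.g. CE vs KL against one fixed reference) leaves phi unchanged.
phi_shifted = shapley(lambda S: v(S) + 3.0, d)
print("unchanged under shift:", np.allclose(phi_shifted, phi))
```

Exact enumeration needs on the order of 2^d value calls per feature, so it only suits a handful of features. In practice, efficient approximations such as TreeSHAP or sampling-based estimators take its place. Each Shapley term uses only a difference v(S ∪ {j}) − v(S), so adding a constant to v leaves the attributions unchanged. This explains why two value functions that differ by a constant yield the same Shapley values. For example, cross-entropy and KL divergence measured against the same fixed reference distribution p are related by CE(p, q) = H(p) + KL(p || q).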

Figure 1: A. MNIST examples. We highlight pixels that increase (red) and decrease (blue) predictive uncertainty in digit classification tasks (1 vs. 7, 3 vs. 8, and 4 vs. 9). B. Reviews from the IMDB dataset, with tokens colored by their relative contribution to the entropy of sentiment prediction.

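Experimental Results

Research Questions

RQ1: How can Shapley values be reformulated to attribute predictive uncertainty rather than point predictions?
RQ2: What are the information-theoretic interpretations and properties of the proposed Shapley value variants (KL, CE, IG, H)?
RQ3: Can conformal inference establish finite-sample coverage guarantees for attribution scores?
RQ4: How well do these uncertainty attributions perform in practical tasks such as covariate shift detection, active learning, and feature selection?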

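Key Findings

Information-theoretic Shapley values attribute bits of information to features, explaining the uncertainty of Y given X.
The KL and CE value functions differ only by an additive constant and therefore yield identical Shapley values. Both amount to a divergence measure between the local posterior and the prior.
The IG and H value functions quantify local information gain and contributions to conditional entropy, and they relate to mutual information and conditional independence.
The split conformal inference procedure provides finite-sample coverage guarantees for testing whether a feature's attributions are concentrated near zero.
Experiments on image, text, and tabular data show meaningful attributions of epistemic and aleatoric uncertainty, and they show the method is effective for covariate shift detection and active learning.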
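
The finite-sample guarantee in the findings rests on split conformal inference. The following Python sketch shows a generic version for one feature's attributions. It uses the textbook split-conformal quantile and is not necessarily the authors' exact test. The helper names `conformal_upper_bound` and `looks_negligible`, the tolerance `tol`, and the Gaussian toy attributions are hypothetical.

```python
import math

import numpy as np


def conformal_upper_bound(calib_scores, alpha):
    """Split-conformal bound: P(new score <= bound) >= 1 - alpha under exchangeability.

    Uses the ceil((n + 1)(1 - alpha))-th smallest calibration score; returns +inf
    when that rank exceeds n (too few calibration points for this alpha).
    """
    scores = np.sort(np.asarray(calib_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    return np.inf if k > n else scores[k - 1]


def looks_negligible(calib_phi, alpha, tol):
    """Hypothetical decision rule: is |phi_j| below tol with probability >= 1 - alpha?"""
    return conformal_upper_bound(np.abs(calib_phi), alpha) <= tol


# Toy check of marginal coverage with hypothetical Gaussian attributions.
rng = np.random.default_rng(1)
alpha, n_calib, reps = 0.1, 50, 2000
hits = 0
for _ in range(reps):
    calib = np.abs(rng.normal(scale=0.01, size=n_calib))  # |phi_j| on calibration rows
    new = abs(rng.normal(scale=0.01))                      # |phi_j| on a fresh row
    hits += new <= conformal_upper_bound(calib, alpha)
print("empirical coverage:", hits / reps)  # should sit near or above 1 - alpha
```

The guarantee is marginal over the draw of the calibration set, so a single calibration split can fall slightly below 1 − alpha. Averaging over many repetitions, as the loop does, recovers the nominal level.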

Figure 2: A. Mean absolute error (MAE) as a function of sample size, with autocorrelation fixed at $\rho=0.5$. B. MAE as a function of autocorrelation, with sample size fixed at $n=2000$. Shading represents standard errors across 50 replicates.


This review was generated by AI and checked by human editors.