QUICK REVIEW

[Paper Review] Measurable Counterfactual Local Explanations for Any Classifier

Adam White, Artur d’Avila Garcez|arXiv (Cornell University)|Aug 8, 2019

Explainable Artificial Intelligence (XAI) | References: 20 | Cited by: 54

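One-Sentence Summary

CLEAR explains a prediction by providing b-counterfactuals together with a local regression model whose fidelity to the underlying classifier is measured, and it improves on LIME across five datasets.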

ABSTRACT

We propose a novel method for explaining the predictions of any classifier. In our approach, local explanations are expected to explain both the outcome of a prediction and how that prediction would change if 'things had been different'. Furthermore, we argue that satisfactory explanations cannot be dissociated from a notion and measure of fidelity, as advocated in the early days of neural networks' knowledge extraction. We introduce a definition of fidelity to the underlying classifier for local explanation models which is based on distances to a target decision boundary. A system called CLEAR: Counterfactual Local Explanations via Regression, is introduced and evaluated. CLEAR generates w-counterfactual explanations that state minimum changes necessary to flip a prediction's classification. CLEAR then builds local regression models, using the w-counterfactuals to measure and improve the fidelity of its regressions. By contrast, the popular LIME method, which also uses regression to generate local explanations, neither measures its own fidelity nor generates counterfactuals. CLEAR's regressions are found to have significantly higher fidelity than LIME's, averaging over 45% higher in this paper's four case studies.

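Motivation and Objectives

Motivate trustworthy explanations of predictions in critical domains by focusing on counterfactuals and fidelity.
Define and quantify the fidelity of a local explanation to the underlying classifier.
Develop CLEAR to generate b-counterfactuals and build regression models that reflect the local decision boundary.
Show that CLEAR achieves higher fidelity than LIME on multiple datasets.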

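Proposed Method

Define a b-counterfactual perturbation as the smallest change to a feature that flips the predicted class.
Generate labelled synthetic observations around the instance of interest.
Build a balanced neighbourhood that spans the region around the decision boundary.
Fit a local regression model that passes through the instance (optionally with quadratic and interaction terms).
Use the regression to estimate b-perturbations and compute the fidelity error relative to the actual b-perturbations.
Iteratively adjust the regression specification, optionally adding weighted b-counterfactuals, to improve fidelity.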
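
The core of these steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `classifier_prob` is a hypothetical two-feature stand-in for the black-box classifier, the actual b-perturbation is found by a simple grid search, the local model is a plain linear regression on the classifier's probability output forced through x (no quadratic or interaction terms, no balancing, no added b-counterfactuals), and the fidelity error is taken as the absolute difference between the estimated and actual b-perturbation.

```python
import math
import random

def classifier_prob(x):
    """Hypothetical black-box classifier: returns P(class 1) for two features."""
    z = 1.5 * x[0] + 0.5 * x[1] ** 2 - 2.2
    return 1.0 / (1.0 + math.exp(-z))

def actual_b_perturbation(x, feature, step=0.01, max_steps=1000):
    """Grid-search the smallest single-feature change that flips the class."""
    start_class = classifier_prob(x) >= 0.5
    for k in range(1, max_steps + 1):
        for sign in (1, -1):
            x_new = list(x)
            x_new[feature] += sign * k * step
            if (classifier_prob(x_new) >= 0.5) != start_class:
                return sign * k * step
    return None  # no flip found within the search range

def fit_regression_through_x(x, n_samples=500, scale=0.2, seed=0):
    """Least-squares fit of p(x') - p(x) ~ w . (x' - x) on synthetic points.

    Working with differences from x removes the intercept, so the fitted
    model reproduces the classifier's probability exactly at x.
    """
    rng = random.Random(seed)
    p_x = classifier_prob(x)
    s00 = s01 = s11 = b0 = b1 = 0.0
    for _ in range(n_samples):
        d0, d1 = rng.gauss(0, scale), rng.gauss(0, scale)
        y = classifier_prob([x[0] + d0, x[1] + d1]) - p_x
        s00 += d0 * d0
        s01 += d0 * d1
        s11 += d1 * d1
        b0 += d0 * y
        b1 += d1 * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s00 * s11 - s01 * s01
    return [(b0 * s11 - b1 * s01) / det, (b1 * s00 - b0 * s01) / det]

def estimated_b_perturbation(x, weights, feature):
    """Solve p(x) + w_j * d = 0.5 for d: where the regression crosses the boundary."""
    return (0.5 - classifier_prob(x)) / weights[feature]

x = [1.0, 0.5]
weights = fit_regression_through_x(x)
actual = actual_b_perturbation(x, feature=0)
estimated = estimated_b_perturbation(x, weights, feature=0)
fidelity_error = abs(estimated - actual)
print(f"actual={actual:.3f} estimated={estimated:.3f} error={fidelity_error:.3f}")
```

A small fidelity error means the regression places the decision boundary close to where the classifier actually changes its prediction, which is a stricter test than agreeing on the class label at x.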

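Experimental Results

Research Questions

RQ1: How can counterfactual explanations be generated from a local regression model?
RQ2: Can a measure of fidelity to the classifier make local explanations more trustworthy?
RQ3: Does including b-counterfactuals in the neighbourhood improve the fidelity of local explanations compared with existing methods such as LIME?
RQ4: Which configuration choices (balanced neighbourhood, centering, quadratic/interaction terms) maximize fidelity across datasets?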

Key Findings

Dataset	CLEAR fidelity (without b-counterfactuals)	CLEAR fidelity (with b-counterfactuals)	LIME fidelity (baseline)
Adult	80% ± 0.9	80% ± 0.8	26% ± 0.6
Iris	80% ± 1.0	99.8% ± 0.1	30% ± 0.3
Pima	57% ± 0.8	77% ± 0.8	20% ± 0.4
Credit	39% ± 1.3	55% ± 1.7	12% ± 0.5
Breast	54% ± 1.1	81% ± 1.3	14% ± 0.3
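
As a quick arithmetic check, the table's fidelity values can be averaged per method. The snippet below only re-computes means from the five rows above; the variable names are illustrative, and the ± terms are ignored.

```python
# Fidelity (%) per dataset: (CLEAR without b-counterfactuals,
#                            CLEAR with b-counterfactuals, LIME)
fidelity = {
    "Adult": (80.0, 80.0, 26.0),
    "Iris": (80.0, 99.8, 30.0),
    "Pima": (57.0, 77.0, 20.0),
    "Credit": (39.0, 55.0, 12.0),
    "Breast": (54.0, 81.0, 14.0),
}

def column_mean(index):
    """Mean of one column of the table across all datasets."""
    values = [row[index] for row in fidelity.values()]
    return sum(values) / len(values)

clear_without = column_mean(0)
clear_with = column_mean(1)
lime = column_mean(2)

# Gaps are in percentage points, not relative percent.
gap_without = clear_without - lime
gap_with = clear_with - lime
print(round(gap_without, 2), round(gap_with, 2))
```

On these numbers the mean gap over LIME is about 41.6 percentage points without b-counterfactuals and about 58.2 with them.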

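CLEAR's fidelity is consistently higher than LIME's on all five datasets; averaged over the table, the gap is about 40 percentage points even without b-counterfactuals.
Using a balanced neighbourhood, centering (forcing the regression through x), and including quadratic and interaction terms yields higher fidelity.
Including b-counterfactuals in the neighbourhood further improves fidelity on four of the five datasets (Adult is unchanged at 80%).
CLEAR's fidelity measure is stricter than classification accuracy alone, and it exposes gaps in LIME's explanations.
The best configuration varies by dataset (for example, logistic regression on some datasets and multiple regression on others).
A CLEAR prototype produces interpretable reports whose complexity can be adjusted to trade fidelity against interpretability.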
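
One way to picture the balanced neighbourhood mentioned above is to split synthetic points into probability bands and keep an equal number of the nearest points from each band. The sketch below is an assumption-laden illustration, not the paper's procedure: `classifier_prob` is a hypothetical stand-in for the classifier, and the band limits (0.4 and 0.6) and the `per_band` count are chosen only for the example.

```python
import math
import random

def classifier_prob(x):
    """Hypothetical stand-in for the black-box classifier: P(class 1)."""
    return 1.0 / (1.0 + math.exp(-(1.5 * x[0] + x[1] - 2.0)))

def balanced_neighbourhood(x, candidates, per_band):
    """Keep the `per_band` candidates nearest to x from each probability band."""
    bands = {"low": [], "boundary": [], "high": []}
    for c in candidates:
        p = classifier_prob(c)
        if p <= 0.4:
            bands["low"].append(c)
        elif p < 0.6:
            bands["boundary"].append(c)
        else:
            bands["high"].append(c)
    selected = []
    for members in bands.values():
        # Euclidean distance to the instance being explained.
        members.sort(key=lambda c: math.dist(c, x))
        selected.extend(members[:per_band])
    return selected

rng = random.Random(0)
x = (1.0, 0.5)
candidates = [(x[0] + rng.gauss(0, 1.0), x[1] + rng.gauss(0, 1.0))
              for _ in range(2000)]
neighbourhood = balanced_neighbourhood(x, candidates, per_band=50)
```

Balancing in this way keeps points on both sides of the decision boundary, and near it, in the regression's training data, so the fit is not dominated by whichever class happens to be most common around x.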

This review was generated by AI and reviewed by a human editor.