QUICK REVIEW

[Paper Review] The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Satyapriya Krishna, Tessa Han | arXiv (Cornell University) | Feb 3, 2022

Explainable Artificial Intelligence (XAI) | Cited by 42

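One-Sentence Summary

This paper formalizes and quantifies disagreement between post hoc explanations, demonstrates empirically that such disagreement is common across datasets and models, and shows that practitioners lack principled ways to resolve it.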

ABSTRACT

As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we formalize and study the disagreement problem in explainable machine learning. More specifically, we define the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.

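Motivation and Objectives

Define what practitioners mean by disagreement between local explanations produced by different methods.
Develop a quantitative framework for measuring disagreement between two explanations of the same prediction.
Empirically quantify disagreement across real-world datasets, models, and explanation methods.
Investigate, through a user study, how data scientists resolve disagreements in practice.
Identify implications for evaluation metrics and practitioner education.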

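Proposed Method

Conduct semi-structured interviews with 25 data scientists to establish what constitutes disagreement between explanations.
Formalize explanation disagreement with six metrics that focus on top-k feature overlap, feature ranking, and sign (direction) alignment.
Apply six post hoc explanation methods (LIME, KernelSHAP, Vanilla Gradient, Gradient*Input, Integrated Gradients, SmoothGrad) across four real-world datasets spanning tabular, text, and image modalities.
Use six predictive models (tabular data: logistic regression, feedforward neural network, random forest, gradient-boosted trees; text: LSTM; image: ResNet-18).
Apply the six disagreement metrics to compare explanations, and study how disagreement varies with k and with model complexity.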
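The top-k comparisons described above can be illustrated with a short sketch. It is a minimal reading of four such measures (overlap, rank, sign, and signed rank) and not the paper's reference implementation; the function names, the choice to rank features by absolute attribution value, and the example vectors are all assumptions made for illustration.

```python
import numpy as np

def top_k(attr, k):
    """Indices of the k largest-magnitude attributions, most important first."""
    attr = np.asarray(attr, dtype=float)
    # Stable sort: ties are broken by feature index, so results are deterministic.
    return np.argsort(-np.abs(attr), kind="stable")[:k]

def feature_agreement(a, b, k):
    """Fraction of top-k features that appear in both explanations."""
    return len(set(top_k(a, k)) & set(top_k(b, k))) / k

def rank_agreement(a, b, k):
    """Fraction of top-k positions that hold the same feature in both."""
    return float(np.mean(top_k(a, k) == top_k(b, k)))

def sign_agreement(a, b, k):
    """Fraction of top-k features that are shared and carry the same sign."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    shared = set(top_k(a, k)) & set(top_k(b, k))
    return float(sum(np.sign(a[i]) == np.sign(b[i]) for i in shared) / k)

def signed_rank_agreement(a, b, k):
    """Fraction of top-k positions with the same feature and the same sign."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    ta, tb = top_k(a, k), top_k(b, k)
    same_position = ta == tb
    # Signs are compared on explanation a's top-k features; the result only
    # counts where the positions already match, so the same feature is compared.
    same_sign = np.sign(a[ta]) == np.sign(b[ta])
    return float(np.mean(same_position & same_sign))

# Made-up attribution vectors for one prediction (not data from the paper).
expl_a = [0.40, -0.30, 0.20, 0.05, -0.01]
expl_b = [0.35, 0.25, -0.02, 0.10, 0.30]

print(feature_agreement(expl_a, expl_b, k=3))      # shares features 0 and 1
print(rank_agreement(expl_a, expl_b, k=3))         # only position 0 matches
print(sign_agreement(expl_a, expl_b, k=3))         # feature 1 flips sign
print(signed_rank_agreement(expl_a, expl_b, k=3))
```

Each function returns a value in [0, 1], where 1 means the two explanations fully agree on that aspect of the top-k features. Dividing by k rather than by the number of shared features means that a missing feature counts as a disagreement.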

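Experimental Results

Research Questions

RQ1: How often do state-of-the-art post hoc explanation methods disagree in their explanations of the same prediction?
RQ2: Which aspects do practitioners regard as disagreement (top-k features, ranking, sign, and relative feature importance)?
RQ3: Can disagreement between explanations be formalized and quantified within a general framework?
RQ4: How do practitioners resolve disagreements in practice, and what strategies do they report?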

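Key Findings

84% of the interviewed data scientists reported encountering disagreement between explanations in their workflows.
86% of participants in the online study relied on arbitrary heuristics or did not know how to resolve disagreements.
Grad and SmoothGrad tend to agree, as do Grad*Input and IntGrad, whereas the pairs Grad and IntGrad, Grad and Grad*Input, SmoothGrad and Grad*Input, and SmoothGrad and IntGrad tend to disagree, indicating a dichotomy among gradient-based methods.
Disagreement tends to persist across model classes and data modalities, and is stronger on datasets with more features (such as German Credit) and on more complex models.
Disagreement increases as k grows: rank agreement and signed rank agreement decline, highlighting sensitivity to feature ordering and sign.
Practitioners stressed that feature importance values (for example, from LIME versus SHAP) are not directly comparable, but they expect consistent insights about the top few features and their order.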
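The pairwise comparisons and the dependence on k noted in the findings above can be explored with a small sketch like the following. It assumes features are ranked by absolute attribution value; the method names and attribution vectors are placeholders invented for illustration and do not reproduce any result from the paper.

```python
from itertools import combinations

import numpy as np

def top_k(attr, k):
    """Indices of the k largest-magnitude attributions, most important first."""
    attr = np.asarray(attr, dtype=float)
    return np.argsort(-np.abs(attr), kind="stable")[:k]

def rank_agreement(a, b, k):
    """Fraction of top-k positions that hold the same feature in both."""
    return float(np.mean(top_k(a, k) == top_k(b, k)))

def pairwise_rank_agreement(explanations, k):
    """Map each unordered pair of method names to its rank agreement at k."""
    return {
        (m1, m2): rank_agreement(explanations[m1], explanations[m2], k)
        for m1, m2 in combinations(sorted(explanations), 2)
    }

# Hypothetical attributions from three methods for one prediction.
explanations = {
    "method_a": [0.50, -0.30, 0.20, 0.10],
    "method_b": [0.45, 0.25, -0.30, 0.05],
    "method_c": [0.10, 0.40, 0.30, -0.20],
}

# Sweep k to see how agreement for each pair changes as more features are compared.
for k in (1, 2, 3):
    print(k, pairwise_rank_agreement(explanations, k))
```

In this toy input, the agreement between `method_a` and `method_b` falls from 1.0 at k=1 to 0.5 at k=2 and 1/3 at k=3. In a real analysis the scores would be averaged over many predictions before comparing method pairs.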

This review was generated by AI and reviewed by human editors.