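[Paper Walkthrough] Do Explanations Reflect Decisions? A Machine-centric Strategy to Quantify the Performance of Explainability Algorithms
This paper proposes a machine-centric framework (Impact Score and Impact Coverage) for quantifying how much the critical factors identified by explainability methods (LIME, SHAP, Expected Gradients, GSInquire) actually affect a neural network's decisions. The methods are evaluated on ResNet-50 with a subset of ImageNet, under both general and adversarial conditions.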
There has been a significant surge of interest recently around the concept of explainable artificial intelligence (XAI), where the goal is to produce an interpretation for a decision made by a machine learning algorithm. Of particular interest is the interpretation of how deep neural networks make decisions, given the complexity and 'black box' nature of such networks. Given the infancy of the field, there has been very limited exploration into the assessment of the performance of explainability methods, with most evaluations centered around subjective visual interpretation of the produced interpretations. In this study, we explore a more machine-centric strategy for quantifying the performance of explainability methods on deep neural networks via the notion of decision-making impact analysis. We introduce two quantitative performance metrics: i) Impact Score, which assesses the percentage of critical factors with either strong confidence reduction impact or decision changing impact, and ii) Impact Coverage, which assesses the percentage coverage of adversarially impacted factors in the input. A comprehensive analysis using this approach was conducted on several state-of-the-art explainability methods (LIME, SHAP, Expected Gradients, GSInquire) on a ResNet-50 deep convolutional neural network using a subset of ImageNet for the task of image classification. Experimental results show that the critical regions identified by LIME within the tested images had the lowest impact on the decision-making process of the network (~38%), with progressive increase in decision-making impact for SHAP (~44%), Expected Gradients (~51%), and GSInquire (~76%). While by no means perfect, the hope is that the proposed machine-centric strategy helps push the conversation forward towards better metrics for evaluating explainability methods and improve trust in deep neural networks.
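Motivation and Objectives
- Move the evaluation of explainability methods beyond subjective visual inspection towards machine-centric, quantitative assessment.
- Define metrics (Impact Score and Impact Coverage) that measure how strongly the identified critical factors affect the network's decision and confidence.
- Systematically compare state-of-the-art explainability methods (LIME, SHAP, Expected Gradients, GSInquire) on an image classification task under general and adversarial conditions.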
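Proposed Method
- A critical factor c identified by explainability method M is deemed impactful when either condition holds: (i) removing c changes the decision, or (ii) removing c reduces the decision confidence z by at least the threshold tau (0.5).
- The Impact Score I is defined as the mean, over the inputs, of an indicator that equals 1 when the decision changes without c or the confidence drops by at least tau.
- A stricter variant, I_strict, counts only decision changes and ignores the confidence criterion.
- The adversarial Impact Coverage, I_coverage, is defined as the mean over inputs of the intersection over union (IoU) between the adversarially impacted factors and the identified critical factors.
- The four explainability methods (LIME, SHAP, Expected Gradients, GSInquire) are evaluated on ResNet-50 with a subset of ImageNet, under general conditions and under adversarial patch conditions.
- I, I_strict, and I_coverage are then used to compare the methods in the general and adversarial settings.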
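The three metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `impact_scores` and `impact_coverage` are hypothetical, the confidence drop is assumed to be measured relative to the original confidence for the originally predicted class, and critical and adversarial regions are assumed to be given as boolean masks of equal shape.

```python
import numpy as np


def impact_scores(pred_full, conf_full, pred_masked, conf_masked, tau=0.5):
    """Return (I, I_strict) over a batch of inputs.

    pred_full / conf_full: predicted class and confidence on the original input.
    pred_masked / conf_masked: predicted class, and confidence for the
    originally predicted class, after the critical factors are removed.
    Assumption: "drops by tau" means a relative drop of at least tau.
    """
    pred_full = np.asarray(pred_full)
    pred_masked = np.asarray(pred_masked)
    conf_full = np.asarray(conf_full, dtype=float)
    conf_masked = np.asarray(conf_masked, dtype=float)

    # Criterion (i): the decision changes once the critical factors are removed.
    decision_changed = pred_full != pred_masked
    # Criterion (ii): confidence falls to (1 - tau) of its original value or lower.
    confidence_dropped = conf_masked <= (1.0 - tau) * conf_full

    impact = float(np.mean(decision_changed | confidence_dropped))
    impact_strict = float(np.mean(decision_changed))
    return impact, impact_strict


def impact_coverage(critical_masks, adversarial_masks):
    """Mean IoU between critical-factor masks and adversarially impacted masks."""
    ious = []
    for critical, adversarial in zip(critical_masks, adversarial_masks):
        critical = np.asarray(critical, dtype=bool)
        adversarial = np.asarray(adversarial, dtype=bool)
        union = np.logical_or(critical, adversarial).sum()
        intersection = np.logical_and(critical, adversarial).sum()
        # Choice made for this sketch: an empty union contributes an IoU of 0.
        ious.append(intersection / union if union > 0 else 0.0)
    return float(np.mean(ious))
```

By construction I_strict can never exceed I, since every input counted by the decision-change criterion is also counted by the combined criterion.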
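Experimental Results
Research Questions
- RQ1: To what extent do the critical factors identified by different explainability methods reflect the neural network's actual decision-making process?
- RQ2: Do newer, more gradient-informed methods (e.g., GSInquire, Expected Gradients) produce explanations with higher decision impact and confidence impact than surrogate-based methods (LIME, SHAP)?
- RQ3: Under adversarial perturbation, how do explainability methods perform in terms of decision impact and coverage of adversarially impacted factors?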
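Key Findings
- GSInquire has the highest Impact Score in the general scenario (I ≈ 76.10%) and also the highest strict score, which counts decision changes only (I_strict ≈ 50.73%), outperforming the other methods.
- Expected Gradients has a stronger impact than SHAP in the general scenario, with I ≈ 51.22% and I_strict ≈ 47.80%.
- SHAP improves on LIME but remains below GSInquire and Expected Gradients in decision impact (I ≈ 44.15%, I_strict ≈ 40.24%).
- LIME has the lowest impact metrics in the general scenario (I ≈ 38.05%, I_strict ≈ 35.12%).
- Under adversarial perturbation, LIME shows the lowest I, I_strict, and I_coverage, whereas GSInquire achieves the highest I, I_strict, and I_coverage across patch scales, indicating that it is better at identifying adversarially impacted regions.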
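The general-scenario figures can also be read together. Because I counts inputs where the decision changes or the confidence drops by at least tau, while I_strict counts decision changes only, the difference I − I_strict is the share of inputs where removing the critical factors strongly reduced confidence without flipping the decision. The sketch below does that arithmetic on the values reported above; it assumes both scores were computed over the same set of inputs, and the helper name `confidence_only_share` is hypothetical.

```python
# Reported general-scenario scores in percent: (I, I_strict).
reported = {
    "LIME": (38.05, 35.12),
    "SHAP": (44.15, 40.24),
    "Expected Gradients": (51.22, 47.80),
    "GSInquire": (76.10, 50.73),
}


def confidence_only_share(scores):
    """Percentage points of I that come from the confidence criterion alone."""
    return {
        name: round(impact - impact_strict, 2)
        for name, (impact, impact_strict) in scores.items()
    }


for name, gap in confidence_only_share(reported).items():
    print(f"{name}: I - I_strict = {gap:.2f} percentage points")
```

Under that assumption, GSInquire's lead in I comes largely from confidence-only cases (about 25 percentage points), while the gap for the other three methods is only about 3 to 4 points.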
This walkthrough was generated by AI and reviewed by human editors.