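[Paper Explained] Explaining with Impact: A Machine-centric Strategy to Quantify the Performance of Explainability Algorithms
This paper proposes a machine-centric strategy for quantitatively evaluating explainability methods for deep neural networks using two metrics: Impact Score (how often the identified critical factors cause a strong confidence reduction or a decision change) and Impact Coverage (how much of the adversarially impacted factors in the input is covered). On a ResNet-50 model, GSInquire showed the highest decision-making impact (~76%), followed by Expected Gradients (~51%), SHAP (~44%), and LIME (~38%), indicating a measurable performance ordering among explainability methods.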
There has been a significant surge of interest recently around the concept of explainable artificial intelligence (XAI), where the goal is to produce an interpretation for a decision made by a machine learning algorithm. Of particular interest is the interpretation of how deep neural networks make decisions, given the complexity and "black box" nature of such networks. Given the infancy of the field, there has been very limited exploration into the assessment of the performance of explainability methods, with most evaluations centered around subjective visual interpretation of the produced interpretations. In this study, we explore a more machine-centric strategy for quantifying the performance of explainability methods on deep neural networks via the notion of decision-making impact analysis. We introduce two quantitative performance metrics: i) Impact Score, which assesses the percentage of critical factors with either strong confidence reduction impact or decision changing impact, and ii) Impact Coverage, which assesses the percentage coverage of adversarially impacted factors in the input. A comprehensive analysis using this approach was conducted on several state-of-the-art explainability methods (LIME, SHAP, Expected Gradients, GSInquire) on a ResNet-50 deep convolutional neural network using a subset of ImageNet for the task of image classification. Experimental results show that the critical regions identified by LIME within the tested images had the lowest impact on the decision-making process of the network (~38%), with progressive increase in decision-making impact for SHAP (~44%), Expected Gradients (~51%), and GSInquire (~76%). While by no means perfect, the hope is that the proposed machine-centric strategy helps push the conversation forward towards better metrics for evaluating explainability methods and improve trust in deep neural networks.
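Research Motivation and Goals
- Address the lack of objective, quantitative evaluation methods for explainability algorithms applied to deep neural networks.
- Move beyond subjective visual interpretation by introducing machine-centric metrics grounded in model behavior.
- Quantify how effectively an explainability method identifies the factors that significantly affect the model's decision.
- Assess the relevance of the identified critical regions by measuring their effect on the model's confidence and predictions.
- Establish a benchmark, based on measurable and behavior-driven criteria, for comparing state-of-the-art explainability methods.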
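Proposed Method
- Introduce the Impact Score metric, which quantifies the percentage of critical factors whose masking causes either a strong reduction in confidence or a change in decision.
- Define Impact Coverage as the percentage of adversarially impacted factors in the input that are covered by the factors the explanation method identifies.
- Apply these metrics to a ResNet-50 model, using a subset of ImageNet, to evaluate LIME, SHAP, Expected Gradients, and GSInquire.
- Use adversarial perturbations to establish the input regions that significantly affect the model's predictions.
- Measure how the model's confidence and final prediction change after the critical regions identified by each method are masked.
- Combine the two metrics to assess both the strength and the completeness of each explanation with respect to the model's decisions.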
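The two metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: this summary does not state how "strong confidence reduction" is thresholded or how coverage is computed, so the 50% relative drop (`drop_threshold`), the per-image aggregation, and the overlap ratio used for coverage are all assumptions, and the function names `impact_score` and `impact_coverage` are hypothetical.

```python
import numpy as np


def impact_score(orig_probs, masked_probs, drop_threshold=0.5):
    """Fraction of samples where masking the critical region either changes
    the predicted class or strongly reduces confidence in the original class.

    orig_probs, masked_probs: arrays of shape (n_samples, n_classes) holding
    class probabilities before and after masking.
    drop_threshold: ASSUMED relative confidence drop counted as "strong".
    """
    orig_probs = np.asarray(orig_probs, dtype=float)
    masked_probs = np.asarray(masked_probs, dtype=float)

    orig_label = orig_probs.argmax(axis=1)
    masked_label = masked_probs.argmax(axis=1)

    rows = np.arange(len(orig_probs))
    orig_conf = orig_probs[rows, orig_label]
    # Confidence the model still assigns to its ORIGINAL decision.
    masked_conf = masked_probs[rows, orig_label]

    decision_changed = masked_label != orig_label
    strong_drop = masked_conf <= (1.0 - drop_threshold) * orig_conf
    return float(np.mean(decision_changed | strong_drop))


def impact_coverage(critical_mask, adversarial_mask):
    """ASSUMED definition: share of adversarially impacted pixels that fall
    inside the region the explanation method marked as critical."""
    critical = np.asarray(critical_mask, dtype=bool)
    adversarial = np.asarray(adversarial_mask, dtype=bool)

    total = adversarial.sum()
    if total == 0:
        return 0.0  # nothing was perturbed, so there is nothing to cover
    return float((critical & adversarial).sum() / total)
```

In use, `orig_probs` and `masked_probs` would come from running the classifier on the original images and on copies with each method's critical region masked out, while `adversarial_mask` would mark where a perturbation was placed. How the masking and perturbation are actually performed is left open here because the summary does not specify it.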
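Experimental Results
Research Questions
- RQ1: How well do different explainability methods identify the factors that significantly change the model's confidence?
- RQ2: To what extent do explanation methods cover the critical input regions that change the model's prediction when perturbed?
- RQ3: Can a machine-centric evaluation strategy provide a more objective, quantifiable benchmark for comparing explainability methods?
- RQ4: How do the explanations from LIME, SHAP, Expected Gradients, and GSInquire differ in their decision-making impact?
- RQ5: What is the relationship between saliency-based explanations and actual model behavior under adversarial perturbation?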
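Key Findings
- The critical regions identified by LIME had the lowest decision-making impact, at about 38%.
- SHAP reached a decision-making impact of about 44%, a modest improvement over LIME.
- Expected Gradients reached about 51%, indicating that the regions it identifies matter more to the network's decision.
- GSInquire reached about 76%, the strongest effect on model confidence and decision outcomes among the four methods.
- Impact Coverage assesses each method's ability to capture adversarially impacted factors, complementing Impact Score in the overall assessment of explanation quality.
- The results indicate a performance ordering among explainability methods, with GSInquire ahead of Expected Gradients, SHAP, and LIME in measurable impact on the model's decisions.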
This explainer was generated by AI and reviewed by a human editor.