QUICK REVIEW

[Paper Digest] Gradient based Feature Attribution in Explainable AI: A Technical Review

Yongjie Wang, Tong Zhang|arXiv (Cornell University)|Mar 15, 2024

Explainable Artificial Intelligence (XAI) | Cited by 10

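One-sentence summary

A comprehensive survey of gradient based feature attribution methods for neural networks that proposes a four-group taxonomy and details algorithmic progress, evaluation methods, and key challenges.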

ABSTRACT

The surge in black-box AI models has prompted the need to explain the internal mechanism and justify their reliability, especially in high-stakes applications, such as healthcare and autonomous driving. Due to the lack of a rigorous definition of explainable AI (XAI), a plethora of research related to explainability, interpretability, and transparency has been developed to explain and analyze the model from various perspectives. Consequently, with an exhaustive list of papers, it becomes challenging to have a comprehensive overview of XAI research from all aspects. Considering the popularity of neural networks in AI research, we narrow our focus to a specific area of XAI research: gradient based explanations, which can be directly adopted for neural network models. In this review, we systematically explore gradient based explanation methods to date and introduce a novel taxonomy to categorize them into four distinct classes. Then, we present the essence of technique details in chronological order and underscore the evolution of algorithms. Next, we introduce both human and quantitative evaluations to measure algorithm performance. More importantly, we demonstrate the general challenges in XAI and specific challenges in gradient based explanations. We hope that this survey can help researchers understand state-of-the-art progress and their corresponding disadvantages, which could spark their interest in addressing these issues in future work.

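Research motivation and goals

Focuses the survey on gradient based explanation methods for neural networks.
Proposes a taxonomy for categorizing gradient based feature attribution methods.
Summarizes the technical details of these methods and their chronological evolution.
Reviews evaluation strategies and identifies both general and gradient-specific challenges.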

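Proposed method

Divides gradient based attribution into four groups: vanilla gradients, integrated gradients, bias-gradient based explanations, and denoising post-processing.
Describes the algorithmic details and evolution of the methods in each group.
Explains the baselines and path integral methods used in gradient based explanations.
Discusses evaluation approaches for attribution methods, covering both human and objective metrics.
Highlights the axiomatic properties that guide method design (e.g., sensitivity, completeness, implementation invariance).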
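
The two core ideas in the list above, a plain input gradient and a path integral from a baseline, can be illustrated on a toy model. The following is a minimal sketch, not the review's implementation: the function `model`, its weights `W`, the all-zero baseline, and the step count are hypothetical choices made for illustration, and the integral is approximated with a midpoint Riemann sum. The sketch also checks the completeness property numerically: the attributions should sum to the difference between the model output at the input and at the baseline.

```python
import math

W = [2.0, -1.0, 0.5]  # hypothetical weights of the toy model


def model(x):
    """Toy differentiable model f(x) = tanh(W . x), standing in for a network."""
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))


def grad(x):
    """Analytic gradient of the toy model: (1 - tanh^2(W . x)) * W."""
    z = sum(w * xi for w, xi in zip(W, x))
    scale = 1.0 - math.tanh(z) ** 2
    return [scale * w for w in W]


def integrated_gradients(x, baseline, steps=200):
    """Approximate integrated gradients along the straight line baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point)):
            total[i] += g
    # Average gradient along the path, scaled by the input-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 0.5, -1.0]
baseline = [0.0, 0.0, 0.0]

vanilla = grad(x)                       # vanilla gradient attribution
ig = integrated_gradients(x, baseline)  # integrated gradients attribution

# Completeness: attributions should sum to f(x) - f(baseline).
gap = abs(sum(ig) - (model(x) - model(baseline)))
print("vanilla gradient:", vanilla)
print("integrated gradients:", ig)
print("completeness gap:", gap)
```

On this toy model the completeness gap shrinks as `steps` grows, which is one simple way to sanity-check a path integral implementation.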

Figure 1. Taxonomy of Explainable AI according to (Guidotti et al., 2018b). In this research, we focus on gradient based explanations in feature attribution.

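Experimental results

Research questions

RQ1: What are the main gradient based feature attribution methods for neural networks?
RQ2: How can gradient based methods be categorized, and what are the core techniques within each category?
RQ3: How are gradient based explanations evaluated, and what are the standard metrics?
RQ4: Which general and gradient-specific challenges limit gradient based XAI, and where can future work improve?
RQ5: How do integrated gradients and its variants address issues such as saturation and baseline selection?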
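
For context on RQ5, the commonly used formulation of integrated gradients can be written out as follows. The notation here is an assumption of this digest rather than taken from the review: $F$ is the model output, $x$ the input, $x'$ the baseline, and $\alpha$ the position along the straight-line path between them.

```latex
\mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1}
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha ,
\qquad
\sum_i \mathrm{IG}_i(x) = F(x) - F(x') .
```

The second equation is the completeness property. Because the gradient is accumulated along the whole path rather than read at $x$ alone, a feature can receive a non-zero attribution even when $F$ is flat (saturated) around $x$ and the vanilla gradient there is close to zero. The formula also makes the baseline dependence explicit: any feature with $x_i = x'_i$ receives zero attribution, so the choice of $x'$ shapes the explanation.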

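Main findings

Introduces a novel four-group taxonomy for gradient based feature attribution: vanilla gradients, integrated gradients, bias gradients, and post-processing denoising.
Provides a detailed chronological overview of the technical details and evolution within each group.
Outlines widely used evaluation metrics, including human and objective measures for comparing explanations.
Identifies general XAI challenges and gradient-specific problems to guide future research.
Connects gradient based methods to the axiomatic properties that guide attribution quality.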
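
To make the idea of denoising a gradient explanation concrete, the sketch below averages gradients over noisy copies of the input, in the style of the widely known SmoothGrad technique. This is an illustrative assumption: the digest does not state which methods the review places in its denoising group, and the toy model, its slopes `W`, the noise scale `sigma`, and the sample count are all hypothetical.

```python
import math
import random

W = [1.0, -2.0, 0.5]  # hypothetical underlying slopes of the toy model


def grad(x):
    """Gradient of the toy model f(x) = sum_i (W[i]*x[i] + 0.01*sin(50*x[i])).

    The sine term adds a high-frequency ripple of amplitude 0.5 to each slope,
    mimicking a visually noisy raw gradient.
    """
    return [w + 0.5 * math.cos(50.0 * xi) for w, xi in zip(W, x)]


def smoothed_gradient(x, sigma=0.2, n_samples=2000, seed=0):
    """Average the gradient over Gaussian-perturbed copies of the input."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        for i, g in enumerate(grad(noisy)):
            total[i] += g
    return [t / n_samples for t in total]


x = [0.3, -0.7, 1.2]
raw = grad(x)                 # raw gradient, includes the ripple
smooth = smoothed_gradient(x) # averaged gradient, ripple largely cancels
print("raw gradient:     ", raw)
print("smoothed gradient:", smooth)
```

In this toy setting the averaged gradient lands close to the underlying slopes `W`, while the raw gradient can deviate from them by up to 0.5 per feature depending on where the ripple happens to sit.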

Figure 2. Taxonomy of gradient based feature attribution.


This digest was generated by AI and reviewed by a human editor.