QUICK REVIEW

[Paper Digest] Towards better understanding of gradient-based attribution methods for Deep Neural Networks

Marco Ancona, Enea Ceolini|arXiv (Cornell University)|Nov 16, 2017

Adversarial Robustness in Machine Learning | Cited by 300

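One-Sentence Summary

This paper analyzes four gradient-based attribution methods (Gradient*Input, epsilon-LRP, Integrated Gradients, and DeepLIFT), establishes the theoretical connections between them, proposes a unified framework, and introduces Sensitivity-n to evaluate attribution quality across datasets and architectures.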

ABSTRACT

Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical comparison has been performed in the past. In this work, we analyze four gradient-based attribution methods and formally prove conditions of equivalence and approximation between them. By reformulating two of these methods, we construct a unified framework which enables a direct comparison, as well as an easier implementation. Finally, we propose a novel evaluation metric, called Sensitivity-n, and test the gradient-based attribution methods alongside a simple perturbation-based attribution method on several datasets in the domains of image and text classification, using various network architectures.

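Motivation and Objectives

Address the need for principled, comparable explanations of DNN predictions across architectures and tasks.
Formally relate and unify gradient-based attribution methods so that they can be compared and implemented directly.
Introduce Sensitivity-n to quantify how well the attributions of a feature subset match the output change caused by removing that subset.
Compare the methods empirically on image and text datasets to draw out both theoretical and practical insights.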

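Proposed Method

Reformulate epsilon-LRP and DeepLIFT as backpropagation with a modified gradient function, yielding a unified gradient-based framework.
Prove equivalence results: with ReLU activations, epsilon-LRP is equivalent to Gradient*Input; in networks without biases whose nonlinearities pass through the origin, epsilon-LRP is equivalent to DeepLIFT (Rescale) with a zero baseline.
Show how Integrated Gradients and DeepLIFT are connected through their use of average gradients in place of local gradients, and discuss the implications for multiplicative interactions.
Define Sensitivity-n and use it to evaluate how consistently attributions track output changes when subsets of features are removed.
Provide practical guidance for implementing these methods in modern graph-based frameworks such as TensorFlow, without custom layers.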
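
The reformulation described above can be sketched on a toy network. The following is a minimal pure-Python illustration, not the authors' implementation: the 2-2-1 ReLU network, its weights, and all function names are made up here, and the epsilon-LRP rule is assumed to take the form of an output/input ratio with a small stabiliser substituted for the local ReLU derivative. Under those assumptions, a single `backprop` routine with a pluggable `local_grad` yields Gradient*Input, epsilon-LRP, and (by averaging along a path) Integrated Gradients.

```python
# Toy 2-2-1 ReLU network; all weights are made up for illustration.
W1 = [[1.0, -2.0], [0.5, 1.5]]   # hidden-by-input weights
B1 = [0.5, -1.0]                 # hidden biases
W2 = [2.0, -1.0]                 # output weights (no output bias)


def relu(z):
    return max(0.0, z)


def pre_activations(x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, B1)]


def forward(x):
    return sum(w * relu(z) for w, z in zip(W2, pre_activations(x)))


def relu_grad(z):
    """Ordinary local derivative of ReLU."""
    return 1.0 if z > 0 else 0.0


def lrp_ratio(z, eps=1e-9):
    """Assumed epsilon-LRP substitute: output/input ratio with a stabiliser."""
    return relu(z) / (z + (eps if z >= 0 else -eps))


def backprop(x, local_grad):
    """Gradient of the output w.r.t. x, with a pluggable nonlinearity gradient."""
    deltas = [w * local_grad(z) for w, z in zip(W2, pre_activations(x))]
    return [sum(d * row[i] for d, row in zip(deltas, W1)) for i in range(len(x))]


def gradient_times_input(x):
    return [g * xi for g, xi in zip(backprop(x, relu_grad), x)]


def epsilon_lrp(x):
    return [g * xi for g, xi in zip(backprop(x, lrp_ratio), x)]


def integrated_gradients(x, baseline, steps=1000):
    """Midpoint Riemann sum of the gradient along the straight path."""
    avg = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(backprop(point, relu_grad)):
            avg[i] += g / steps
    return [a * (xi - b) for a, xi, b in zip(avg, x, baseline)]


x = [1.0, 0.5]
gi = gradient_times_input(x)
lrp = epsilon_lrp(x)
ig = integrated_gradients(x, baseline=[0.0, 0.0])
print("Gradient*Input:      ", gi)
print("epsilon-LRP:         ", lrp)
print("Integrated Gradients:", ig)
print("sum(IG) vs f(x) - f(baseline):", sum(ig), forward(x) - forward([0.0, 0.0]))
```

On this input, Gradient*Input and the assumed epsilon-LRP rule agree up to the stabiliser, as expected for a ReLU-only network. Integrated Gradients gives different values because the second hidden unit switches on partway along the path, and its attributions sum to the output difference between the input and the zero baseline.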

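Experimental Results

Research Questions

RQ1: Under what conditions are gradient-based attribution methods equivalent to, or approximations of, one another?
RQ2: Does a unified framework make attribution methods easier to compare directly and simpler to implement?
RQ3: How can attribution methods be evaluated quantitatively, beyond qualitative heatmaps?
RQ4: What are the limitations of gradient-based attribution for nonlinear or multiplicative interactions, such as those in LSTMs?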
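
To make the concern behind RQ4 concrete, here is a toy illustration that is not taken from the paper: a single multiplicative interaction `f(x) = x[0] * x[1]`, with hypothetical helper names. It compares Gradient*Input with Integrated Gradients (zero baseline, an assumption) and shows that the two already disagree on how to split credit for a plain product.

```python
def product(x):
    """A single multiplicative interaction between two inputs."""
    return x[0] * x[1]


def product_grad(x):
    """Analytic gradient of the product."""
    return [x[1], x[0]]


def gradient_times_input(x):
    return [g * xi for g, xi in zip(product_grad(x), x)]


def integrated_gradients(x, steps=100):
    """Midpoint Riemann sum along the straight path from the zero baseline."""
    avg = [0.0, 0.0]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        for i, g in enumerate(product_grad([alpha * xi for xi in x])):
            avg[i] += g / steps
    return [a * xi for a, xi in zip(avg, x)]


x = [3.0, 2.0]
gi = gradient_times_input(x)
ig = integrated_gradients(x)
print("f(x):                ", product(x))
print("Gradient*Input:      ", gi, "sum:", sum(gi))
print("Integrated Gradients:", ig, "sum:", sum(ig))
```

Gradient*Input assigns the full product to each input, so its attributions sum to twice the output, while Integrated Gradients splits the product evenly and sums to the output itself.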

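Key Findings

epsilon-LRP and DeepLIFT can be reformulated as backpropagation with modified gradients, which makes a unified framework possible.
With ReLU activations, epsilon-LRP is equivalent to Gradient*Input; without biases and with nonlinearities that pass through the origin, it is equivalent to DeepLIFT with a zero baseline.
Integrated Gradients and DeepLIFT are usually closely related, and in practice DeepLIFT approximates Integrated Gradients well, although multiplicative interactions can cause the two to diverge.
Occlusion-1 remains a strong local attribution method and satisfies Sensitivity-1, whereas gradient-based methods are better at capturing global nonlinear effects.
All methods produce signed attributions, so an input can carry negative evidence; for linear models all methods are equivalent and Sensitivity-n holds for every n.
The proposed Sensitivity-n metric enables a systematic comparison between attributions and output changes over feature subsets.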
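
The Sensitivity-n idea in the findings above can be sketched as follows. This is a simplified check under stated assumptions, not the paper's evaluation protocol: "removing" a feature is assumed to mean setting it to zero, the two toy models and all names are hypothetical, and the sketch enumerates every subset of size n and reports the worst mismatch, which is only feasible for a handful of features.

```python
from itertools import combinations


def occlusion_1(f, x):
    """Attribution of feature i = output drop when only feature i is zeroed."""
    fx = f(x)
    return [fx - f(x[:i] + [0.0] + x[i + 1:]) for i in range(len(x))]


def sensitivity_gap(f, x, attributions, n):
    """Largest |sum of attributions - output drop| over all subsets of size n."""
    fx = f(x)
    worst = 0.0
    for subset in combinations(range(len(x)), n):
        masked = [0.0 if i in subset else v for i, v in enumerate(x)]
        drop = fx - f(masked)
        attr_sum = sum(attributions[i] for i in subset)
        worst = max(worst, abs(attr_sum - drop))
    return worst


# Two made-up models over three features.
W = [2.0, -1.0, 0.5]
B = 0.3


def linear(x):
    return sum(w * xi for w, xi in zip(W, x)) + B


def nonlinear(x):
    return max(0.0, x[0] + x[1] - 1.0) + x[2]


x = [1.0, 2.0, 3.0]
for name, model in (("linear", linear), ("nonlinear", nonlinear)):
    attributions = occlusion_1(model, x)
    gaps = [sensitivity_gap(model, x, attributions, n) for n in (1, 2, 3)]
    print(name, "Occlusion-1 attributions:", attributions, "gaps for n=1,2,3:", gaps)
```

A gap of zero for a given n means the attributions satisfy Sensitivity-n on this input. On the linear model the gaps vanish (up to floating-point error) for every n, while on the nonlinear model Occlusion-1 satisfies Sensitivity-1 by construction but shows a nonzero gap once two interacting features are removed together.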

This digest was generated by AI and reviewed by a human editor.