QUICK REVIEW

[Paper Review] Axiomatic Attribution for Deep Networks

Mukund Sundararajan, Ankur Taly|arXiv (Cornell University)|Mar 4, 2017

Explainable Artificial Intelligence (XAI) | References: 29 | Cited by: 2,626

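One-Sentence Summary

This paper defines two axioms for feature attribution in deep networks, Sensitivity and Implementation Invariance, and introduces Integrated Gradients, a gradient-based method that satisfies both axioms as well as Completeness. It also gives an argument for the method's uniqueness among path methods.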

ABSTRACT

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.

Motivation and Objectives

Motivate the attribution problem: assigning prediction credit to input features for deep networks.
Propose two fundamental axioms (Sensitivity and Implementation Invariance) for attribution methods.
Design a new attribution method that satisfies the axioms and is easy to implement.

Proposed Method

Define Integrated Gradients as the path integral of the gradients along the straight-line path from a baseline input x' to the actual input x.
Prove Completeness: the sum of attributions equals F(x) - F(x').
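Show that Integrated Gradients satisfies Sensitivity(a), which follows from Completeness, and Implementation Invariance, since the attributions depend only on the gradients of the function the network computes.
Argue that path methods are the only attribution methods that always satisfy Implementation Invariance, Sensitivity(b), Linearity, and Completeness, and that Integrated Gradients, which uses the straight-line path, is the unique symmetry-preserving path method.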
Discuss how to approximate the integral with a finite number of gradient evaluations (steps m).
Provide guidance on choosing baselines and practical computation considerations.
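
The approximation step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the toy model F (a sigmoid over a linear score), its weights w, and the helper name integrated_gradients are all hypothetical, and the midpoint rule is one choice of Riemann sum among several.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG_i = (x_i - x'_i) * integral over a in [0, 1] of
    dF/dx_i evaluated at x' + a * (x - x'), using a midpoint Riemann sum."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1] (an assumed choice).
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        # One gradient evaluation per point on the straight-line path.
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Hypothetical toy model: F(x) = sigmoid(w . x), with an analytic gradient.
w = np.array([2.0, -1.0, 0.5])

def F(x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def grad_F(x):
    s = F(x)
    return s * (1.0 - s) * w

x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros_like(x)  # all-zeros baseline, assumed for illustration

attributions = integrated_gradients(grad_F, x, baseline, steps=100)
# Completeness: the attributions should sum to F(x) - F(x').
completeness_gap = abs(attributions.sum() - (F(x) - F(baseline)))
```

Checking completeness_gap is a convenient way to decide whether the number of steps is large enough: if the gap is not small relative to F(x) - F(x'), increase steps.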

Experimental Results

Research Questions

RQ1: What attribution properties should an explanation method satisfy for deep networks?
RQ2: Can we design an attribution method that is both implementation invariant and sensitive to input changes?
RQ3: Is there a canonical gradient-based attribution method that satisfies core axioms and is practical to compute?

Key Findings

Integrated Gradients provide attributions that sum to the difference F(x) - F(x′) (Completeness).
The method satisfies Sensitivity and Implementation Invariance, addressing weaknesses of prior approaches.
Path methods are the only attribution methods that satisfy all of the stated axioms; Integrated Gradients is the canonical member, using the straight-line path.
The baseline is a crucial component for meaningful attributions, and practical approximation uses a finite number of gradient evaluations along the path.
Integrated Gradients can be computed efficiently, typically with 20 to 300 gradient evaluations, and applies across image, text, and chemistry models.
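
The Sensitivity finding can be made concrete with a one-variable ReLU function, f(x) = 1 - ReLU(1 - x), following the example the paper uses to motivate the axiom. The sketch below is illustrative only: the helper name integrated_gradients_1d and the midpoint rule are assumptions, not part of the paper. With baseline 0 and input 2, the output changes from 0 to 1, yet the plain gradient at the input is 0 because f is flat beyond x = 1; Integrated Gradients attributes the full change to x.

```python
import numpy as np

def f(x):
    # One-variable, one-ReLU function: rises linearly until x = 1, then stays flat.
    return 1.0 - np.maximum(0.0, 1.0 - x)

def grad_f(x):
    # Derivative of f: 1 where the ReLU is active (x < 1), 0 elsewhere.
    return np.where(1.0 - x > 0.0, 1.0, 0.0)

def integrated_gradients_1d(grad_fn, x, baseline, steps=100):
    # Midpoint Riemann sum along the straight line from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas * (x - baseline)
    return (x - baseline) * np.mean(grad_fn(path))

x, baseline = 2.0, 0.0
plain_gradient = float(grad_f(x))  # gradient at the input only
ig = float(integrated_gradients_1d(grad_f, x, baseline))
output_change = float(f(x) - f(baseline))

print(plain_gradient, ig, output_change)
```

The plain gradient reports no contribution from x even though changing x from the baseline changes the output, which is the Sensitivity(a) violation; the Integrated Gradients attribution matches the output change, as Completeness requires.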

This review was generated by AI and reviewed by human editors.