QUICK REVIEW

[Paper Review] Axiomatic Attribution for Deep Networks

Mukund Sundararajan, Ankur Taly, Qiqi Yan | arXiv (Cornell University) | Mar 4, 2017
Explainable Artificial Intelligence (XAI) | References: 29 | Cited by: 2,626
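One-Sentence Summary

This paper defines two axioms for feature attribution in deep networks, Sensitivity and Implementation Invariance, and introduces Integrated Gradients, a gradient-based method that satisfies both axioms as well as Completeness. It also gives an argument for the method's uniqueness among path methods.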

ABSTRACT

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.

Motivation and Objectives

  • Motivate the attribution problem: assigning prediction credit to input features for deep networks.
  • Propose two fundamental axioms (Sensitivity and Implementation Invariance) for attribution methods.
  • Design a new attribution method that satisfies the axioms and is easy to implement.

Proposed Method

  • Define Integrated Gradients as the path integral of gradients along the straightline path from a baseline input to the input.
  • Prove Completeness: the sum of attributions equals F(x) - F(x').
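  • Show that Integrated Gradients satisfies Implementation Invariance, because it is defined purely through gradients of the function the network computes, and Sensitivity(a), which follows from Completeness.
  • Argue that path methods are the only attribution methods that always satisfy Implementation Invariance, Sensitivity(b), Linearity, and Completeness, and that Integrated Gradients is the unique path method that is symmetry-preserving.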
  • Discuss how to approximate the integral with a finite number of gradient evaluations (steps m).
  • Provide guidance on choosing baselines and practical computation considerations.
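
The definition and its finite-step approximation can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it approximates the path integral with an m-step Riemann sum over points k/m along the straightline path, then checks Completeness numerically. The toy model `F`, its analytic gradient `grad_F`, the weights `w`, and the all-zeros baseline are assumptions chosen for the example; in a real setting the gradient would come from a framework's automatic differentiation.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, m=50):
    """Approximate Integrated Gradients with an m-step Riemann sum.

    grad_fn is a hypothetical callable returning dF/dx at a point,
    with the same shape as x.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Interpolation coefficients k/m for k = 1..m along the straight line.
    alphas = np.arange(1, m + 1) / m
    # Gradient of F at each interpolated point baseline + alpha * (x - baseline).
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    # Average the gradients and scale by the input difference.
    return (x - baseline) * grads.mean(axis=0)


# Toy model (an assumption for illustration): F(x) = sigmoid(w . x).
w = np.array([1.5, -2.0, 0.5])


def F(x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))


def grad_F(x):
    s = F(x)
    return s * (1.0 - s) * w  # analytic gradient of sigmoid(w . x)


x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(grad_F, x, baseline, m=300)
# Completeness: attributions should sum to approximately F(x) - F(baseline).
completeness_gap = abs(attributions.sum() - (F(x) - F(baseline)))
print(attributions, completeness_gap)
```

The size of `completeness_gap` is a convenient diagnostic: if it is not small relative to F(x) - F(x'), increasing `m` is the usual remedy.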

Experimental Results

Research Questions

  • RQ1: What attribution properties should an explanation method satisfy for deep networks?
  • RQ2: Can we design an attribution method that is both implementation invariant and sensitive to input changes?
  • RQ3: Is there a canonical gradient-based attribution method that satisfies core axioms and is practical to compute?

Key Findings

  • Integrated Gradients provide attributions that sum to the difference F(x) - F(x′) (Completeness).
  • The method satisfies Sensitivity and Implementation Invariance, addressing weaknesses of prior approaches.
  • Path methods are the only attribution methods that always satisfy Implementation Invariance, Sensitivity(b), Linearity, and Completeness; Integrated Gradients, which uses the straightline path, is the unique symmetry-preserving path method.
  • The baseline is a crucial component for meaningful attributions, and practical approximation uses a finite number of gradient evaluations along the path.
  • In practice, 20 to 300 gradient evaluations are enough to approximate the integral to within 5%, and the method applies across image, text, and chemistry models.
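
As a worked check of the Sensitivity and Completeness findings above, consider the one-variable ReLU network f(x) = 1 - ReLU(1 - x) with baseline x' = 0 and input x = 2, the example the paper uses to show that plain gradients violate Sensitivity. The function is flat beyond x = 1, so the gradient at the input is zero even though the output changes from 0 to 1. Integrating the gradient along the path recovers the full change:

```latex
\begin{align*}
f(x) &= 1 - \mathrm{ReLU}(1 - x), \qquad f(0) = 0, \quad f(2) = 1, \\
f'(x) &= \begin{cases} 1 & x < 1 \\ 0 & x > 1 \end{cases}
  \quad\Rightarrow\quad f'(2) = 0 \quad \text{(the plain gradient assigns no credit)}, \\
\mathrm{IG}(x) &= (x - x') \int_0^1 f'\bigl(x' + \alpha (x - x')\bigr)\, d\alpha
  = 2 \int_0^1 f'(2\alpha)\, d\alpha
  = 2 \int_0^{1/2} 1 \, d\alpha
  = 1 = f(2) - f(0).
\end{align*}
```

The attribution is nonzero, as Sensitivity(a) requires, and equals F(x) - F(x'), as Completeness requires.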


This review was generated by AI and reviewed by human editors.