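[Paper Walkthrough] Axiomatic Attribution for Deep Networks
This paper defines two axioms for feature attribution in deep networks, Sensitivity and Implementation Invariance, and introduces Integrated Gradients, a gradient-based method that satisfies these axioms as well as Completeness. It also gives an argument for the method's uniqueness among path methods.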
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
Motivation and Objectives
- Motivate the attribution problem: assigning prediction credit to input features for deep networks.
- Propose two fundamental axioms (Sensitivity and Implementation Invariance) for attribution methods.
- Design a new attribution method that satisfies the axioms and is easy to implement.
Proposed Method
- Define Integrated Gradients as the path integral of gradients along the straight-line path from a baseline input x' to the input x.
- Prove Completeness: the sum of attributions equals F(x) - F(x').
- Show that Integrated Gradients satisfies Implementation Invariance and, as a consequence of Completeness, Sensitivity(a).
- Argue that path methods are the only attribution methods that always satisfy Implementation Invariance, Sensitivity(b), Linearity, and Completeness, and that Integrated Gradients, which uses the straight-line path, is the canonical member of this class.
- Discuss how to approximate the integral with a finite number of gradient evaluations (steps m).
- Provide guidance on choosing baselines and practical computation considerations.
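The definitions in the list above can be written out as follows. This is a restatement for reference, assuming F is the network's scalar output for the class of interest, x is the input, x' is the baseline, n is the number of input features, and m is the number of steps in the approximation.

```latex
% Integrated Gradients for feature i, along the straight-line path from x' to x
\mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1}
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha

% Completeness: attributions sum to the change in the output
\sum_{i=1}^{n} \mathrm{IG}_i(x) = F(x) - F(x')

% Riemann-sum approximation with m gradient evaluations
\mathrm{IG}_i^{\mathrm{approx}}(x) = (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m}
  \frac{\partial F\bigl(x' + \tfrac{k}{m}\,(x - x')\bigr)}{\partial x_i}
```

Completeness gives a practical sanity check: if the approximate attributions do not sum to roughly F(x) - F(x'), the number of steps m is probably too small.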
Experimental Results
Research Questions
- RQ1: What attribution properties should an explanation method satisfy for deep networks?
- RQ2: Can we design an attribution method that is both implementation invariant and sensitive to input changes?
- RQ3: Is there a canonical gradient-based attribution method that satisfies core axioms and is practical to compute?
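To make the Sensitivity question concrete, the sketch below reproduces the kind of one-variable example the paper uses against plain gradients: f(x) = 1 - ReLU(1 - x) with baseline 0 and input 2. The function flattens for x > 1, so the gradient at the input is zero even though the output changed from 0 to 1. The helper names (`f`, `grad_f`, `integrated_gradients_1d`) and the midpoint-rule integration are illustrative assumptions, not code from the paper.

```python
import numpy as np

def f(x):
    # Toy network: 1 - ReLU(1 - x); saturates at 1 for x >= 1.
    return 1.0 - np.maximum(0.0, 1.0 - x)

def grad_f(x):
    # Derivative of f: 1 where x < 1, 0 where x > 1 (0 chosen at the kink).
    return np.where(x < 1.0, 1.0, 0.0)

def integrated_gradients_1d(x, baseline, steps=1000):
    # Midpoint-rule approximation of the path integral (an assumed choice).
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas * (x - baseline)
    return (x - baseline) * grad_f(path).mean()

x, baseline = 2.0, 0.0
plain_gradient = float(grad_f(np.array(x)))        # 0.0: input gets no credit
ig = float(integrated_gradients_1d(x, baseline))   # 1.0: equals f(2) - f(0)

print("gradient at input:", plain_gradient)
print("integrated gradients:", ig)
print("f(x) - f(baseline):", float(f(x) - f(baseline)))
```

Because the integral accumulates gradients along the whole path rather than sampling only the endpoint, the saturated region does not erase the attribution.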
Key Findings
- Integrated Gradients provides attributions that sum to the difference F(x) - F(x') (Completeness).
- The method satisfies Sensitivity and Implementation Invariance, addressing weaknesses of prior approaches.
- Path methods are the only class of attribution methods shown to satisfy the stated axioms together, and Integrated Gradients is the canonical instance, obtained by integrating along the straight-line path.
- The baseline is a crucial component for meaningful attributions, and practical approximation uses a finite number of gradient evaluations along the path.
- Integrated Gradients can be computed efficiently, typically with 20 to 300 gradient evaluations, and applies across image, text, and chemistry models.
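The approximation and the Completeness check described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a hypothetical logistic unit with hand-picked weights `W` and bias `B`, its gradient is written analytically instead of using an autodiff framework, the baseline is all zeros, and the integral uses a midpoint rule rather than the paper's right-endpoint Riemann sum.

```python
import numpy as np

# Hypothetical toy model (assumption): F(x) = sigmoid(W . x + B)
W = np.array([1.5, -2.0, 0.5])
B = 0.1

def model(x):
    # Accepts a single input of shape (n,) or a batch of shape (steps, n).
    return 1.0 / (1.0 + np.exp(-(x @ W + B)))

def model_grad(x):
    # Analytic gradient of the sigmoid unit with respect to the input.
    p = np.asarray(model(x))
    return (p * (1.0 - p))[..., None] * W

def integrated_gradients(x, baseline, steps=50):
    # Sample the straight-line path at interval midpoints (assumed variant).
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, n)
    avg_grad = model_grad(path).mean(axis=0)             # average over the path
    return (x - baseline) * avg_grad

def completeness_error(x, baseline, steps):
    # Relative gap between the attribution sum and F(x) - F(baseline).
    attributions = integrated_gradients(x, baseline, steps)
    target = model(x) - model(baseline)
    return abs(attributions.sum() - target) / abs(target)

x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros_like(x)

for m in (5, 20, 300):
    print(f"steps={m:3d}  relative completeness error={completeness_error(x, baseline, m):.2e}")
```

If the relative error stays large for a real model, increasing `steps` is the first thing to try; a smooth toy function like this one converges much faster than a deep network typically would.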
This walkthrough was generated by AI and reviewed by human editors.