QUICK REVIEW

[Paper Review] Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Huiqi Deng, Na Zou|arXiv (Cornell University)|Mar 2, 2023
Multi-Criteria Decision Making | Cited by 18
One-Sentence Summary

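This paper unifies fourteen input attribution methods within a Taylor interaction framework. It shows that each attribution is a weighted allocation of independent effects and interaction effects, and it proposes principles for evaluating faithfulness.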

ABSTRACT

Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output. However, existing attribution methods are often built upon different heuristics. There remains a lack of a unified theoretical understanding of why these methods are effective and how they are related. To this end, for the first time, we formulate core mechanisms of fourteen attribution methods, which were designed on different heuristics, into the same mathematical system, i.e., the system of Taylor interactions. Specifically, we prove that attribution scores estimated by fourteen attribution methods can all be reformulated as the weighted sum of two types of effects, i.e., independent effects of each individual input variable and interaction effects between input variables. The essential difference among the fourteen attribution methods mainly lies in the weights of allocating different effects. Based on the above findings, we propose three principles for a fair allocation of effects to evaluate the faithfulness of the fourteen attribution methods.
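The "weighted sum of effects" idea can be made concrete on a toy model. The sketch below assumes a hypothetical two-variable polynomial and a baseline b = 0, so the Taylor expansion around the baseline is exact and every monomial is one Taylor term. It uses Gradient × Input, a_i = x_i * df/dx_i, as the example method. By direct calculation (derived here, not quoted from the paper's mapping table), this method gives each Taylor term a weight equal to kappa_i, the degree of x_i in that term. The names `POLY`, `gradient_times_input`, and `allocate` are illustrative.

```python
# Hypothetical two-variable polynomial, {degree vector kappa: coefficient}.
# Baseline b = 0, so each monomial is one Taylor term around the baseline.
POLY = {
    (2, 0): 3.0,   # independent effect of x0 (degree 2)
    (0, 1): -1.0,  # independent effect of x1
    (1, 1): 2.0,   # interaction x0 * x1
    (2, 1): 0.5,   # interaction x0**2 * x1
}


def term_value(kappa, coeff, x):
    """Value of one Taylor term c * prod_i x_i**kappa_i at x."""
    value = coeff
    for xi, k in zip(x, kappa):
        value *= xi ** k
    return value


def gradient_times_input(x):
    """a_i = x_i * df/dx_i, using the analytic derivative of each monomial."""
    attributions = []
    for i in range(len(x)):
        grad_i = 0.0
        for kappa, coeff in POLY.items():
            if kappa[i] == 0:
                continue  # this term does not depend on x_i
            lowered = tuple(k - 1 if j == i else k for j, k in enumerate(kappa))
            grad_i += kappa[i] * term_value(lowered, coeff, x)
        attributions.append(x[i] * grad_i)
    return attributions


def allocate(x, weight):
    """Generic allocation: a_i = sum over Taylor terms of weight(i, kappa) * term."""
    return [
        sum(weight(i, kappa) * term_value(kappa, c, x) for kappa, c in POLY.items())
        for i in range(len(x))
    ]


if __name__ == "__main__":
    x = [1.5, -2.0]
    direct = gradient_times_input(x)
    # Weight = degree of x_i in each term (a derived assumption, see lead-in).
    weighted = allocate(x, lambda i, kappa: kappa[i])
    assert all(abs(a - w) < 1e-9 for a, w in zip(direct, weighted))
    total_effect = sum(term_value(k, c, x) for k, c in POLY.items())  # f(x) - f(0)
    print(direct, sum(direct), total_effect)
```

For x = (1.5, -2.0), the attributions are [3.0, -6.25] and sum to -3.25, while f(x) - f(b) = 0.5. Weights of kappa_i can over-count both high-degree independent terms and interaction terms. This illustrates why the choice of allocation weights is what separates methods.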

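Motivation and Goals

  • Motivate the need for a unified theoretical understanding of attribution methods for deep neural networks (DNNs).
  • Propose a system based on Taylor interactions that decomposes the network output into independent effects and interaction effects.
  • Prove that fourteen existing attribution methods can be reformulated within this unified framework.
  • Propose fair-allocation principles to evaluate how faithfully each method allocates effects to input variables.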

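Proposed Method

  • Express the DNN output as a Taylor expansion around a baseline b: f(x) = f(b) + sum of independent effects phi(kappa) + sum of interaction effects I(kappa). Each Taylor term, indexed by its degree vector kappa, is an independent effect if it involves a single input variable and an interaction effect if it involves several.
  • Define the generic independent effect psi(i) as the sum of all independent-effect terms of variable i. Define the generic interaction effect J(S) as the sum of all interaction terms that involve exactly the variables in S.
  • Prove that the Harsanyi dividend H(S) equals the generic interaction effect J(S).
  • Prove that every attribution a_i can be written as a weighted sum of independent and interaction effects: a_i = sum_j w_{i,j} psi(j) + sum_{S, |S|>1} w_{i,S} J(S).
  • Provide a unified mapping of the fourteen attribution methods onto this framework, with an explicit expression for each method's allocation weights.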
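The decomposition above can be checked numerically on a toy model. The sketch below assumes a hypothetical three-variable polynomial and a baseline b = 0, so the Taylor expansion around the baseline is exact and finite. Real DNNs are not polynomials, so this only illustrates the bookkeeping. The code does three things:

- It groups Taylor terms into psi(i) and J(S).
- It computes the Harsanyi dividend H(S) by inclusion-exclusion over masked inputs, where variables outside a coalition are set to the baseline.
- It checks that H(S) = J(S).

The names `POLY`, `taylor_effects`, and `harsanyi_dividend` are illustrative and do not come from the paper.

```python
from itertools import combinations

# Hypothetical toy model f(x) = sum_kappa c_kappa * prod_i x_i**kappa_i,
# stored as {degree vector kappa: coefficient}. With baseline b = 0 the
# Taylor expansion of a polynomial around b is the polynomial itself.
POLY = {
    (0, 0, 0): 1.0,   # constant term f(b)
    (2, 0, 0): 3.0,   # independent effect of x0 (degree 2)
    (0, 1, 0): -1.0,  # independent effect of x1
    (1, 1, 0): 2.0,   # interaction between x0 and x1
    (1, 0, 1): 0.5,   # interaction between x0 and x2
    (1, 1, 1): 4.0,   # three-way interaction
}


def term_value(kappa, coeff, x):
    """Value of one Taylor term c * prod_i x_i**kappa_i at x."""
    value = coeff
    for xi, k in zip(x, kappa):
        value *= xi ** k
    return value


def f(x):
    return sum(term_value(kappa, c, x) for kappa, c in POLY.items())


def taylor_effects(x):
    """Group Taylor terms by the exact set of variables they involve.

    psi[i]: sum of terms involving only x_i (generic independent effect).
    J[S]:   sum of terms involving exactly the variables in S, |S| > 1
            (generic interaction effect).
    """
    psi, J = {}, {}
    for kappa, coeff in POLY.items():
        S = frozenset(i for i, k in enumerate(kappa) if k > 0)
        if len(S) == 1:
            (i,) = S
            psi[i] = psi.get(i, 0.0) + term_value(kappa, coeff, x)
        elif len(S) > 1:
            J[S] = J.get(S, 0.0) + term_value(kappa, coeff, x)
    return psi, J


def harsanyi_dividend(S, x, b):
    """H(S) = sum over T subset of S of (-1)**(|S|-|T|) * v(T), where
    v(T) = f(x_T, b_rest) - f(b): variables outside T are set to the baseline."""
    members = sorted(S)
    total = 0.0
    for r in range(len(members) + 1):
        for T in combinations(members, r):
            masked = [x[i] if i in T else b[i] for i in range(len(x))]
            total += (-1) ** (len(members) - r) * (f(masked) - f(b))
    return total


if __name__ == "__main__":
    x, b = [1.5, -2.0, 0.7], [0.0, 0.0, 0.0]
    psi, J = taylor_effects(x)
    for r in (2, 3):
        for S in combinations(range(3), r):
            S = frozenset(S)
            assert abs(harsanyi_dividend(S, x, b) - J.get(S, 0.0)) < 1e-9
    # The effects account for the whole output change f(x) - f(b).
    assert abs(f(x) - f(b) - sum(psi.values()) - sum(J.values())) < 1e-9
    print("H(S) == J(S) for every coalition with |S| > 1")
```

Coalitions with no matching Taylor term, such as {x1, x2} here, get H(S) = 0, which matches an empty J(S).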

Results

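Research Questions

  • RQ1: Can the fourteen attribution methods be theoretically unified within a single Taylor interaction framework?
  • RQ2: How do independent and interaction effects contribute to input attributions, and how do different methods allocate these effects?
  • RQ3: What principles ensure that independent and interaction effects are faithfully allocated to input variables?
  • RQ4: How can existing attribution methods be expressed as redistributions of Taylor independent and interaction effects?
  • RQ5: What is the relationship between the Taylor framework and game-theoretic measures such as the Harsanyi dividend?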

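Key Findings

  • All fourteen attribution methods can be reformulated as allocations of Taylor independent effects and Taylor interaction effects.
  • The generic interaction effect J(S) is equivalent to the Harsanyi dividend H(S).
  • Three faithfulness principles are proposed for evaluating whether independent and interaction effects are fairly allocated to input variables.
  • Several classical methods (e.g., the Shapley value, Integrated Gradients, and DeepLIFT Rescale) satisfy the faithfulness principles.
  • A comprehensive mapping table shows how each method fits the unified paradigm by specifying its allocation weights for independent and interaction components.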
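The finding that the Shapley value allocates effects faithfully can be illustrated on a toy model. The sketch below assumes a hypothetical polynomial and a baseline b = 0. It relies on the standard identity that the Shapley value of x_i is the sum of H(S)/|S| over all coalitions S containing i. Combined with H(S) = J(S), this means:

- Each independent effect psi(i) goes entirely to x_i.
- Each interaction effect J(S) is split equally among the variables in S.
- Nothing goes to variables outside S.

The code computes the exact Shapley value by enumerating all orderings and checks it against this equal-split allocation. The names are illustrative. The properties checked are in the spirit of the paper's three principles, not a restatement of them.

```python
from itertools import permutations
from math import factorial

# Hypothetical toy polynomial {degree vector kappa: coefficient}, baseline b = 0.
POLY = {
    (0, 0, 0): 1.0,   # constant term f(b), never attributed
    (2, 0, 0): 3.0,   # independent effect of x0
    (0, 1, 0): -1.0,  # independent effect of x1
    (1, 1, 0): 2.0,   # interaction between x0 and x1
    (1, 0, 1): 0.5,   # interaction between x0 and x2
    (1, 1, 1): 4.0,   # three-way interaction
}


def term_value(kappa, coeff, x):
    value = coeff
    for xi, k in zip(x, kappa):
        value *= xi ** k
    return value


def f(x):
    return sum(term_value(kappa, c, x) for kappa, c in POLY.items())


def v(T, x, b):
    """Coalition value: keep x_i for i in T, set the rest to the baseline."""
    masked = [x[i] if i in T else b[i] for i in range(len(x))]
    return f(masked) - f(b)


def shapley_by_permutations(x, b):
    """Exact Shapley value: average marginal contribution over all n! orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        coalition = set()
        for i in order:
            before = v(coalition, x, b)
            coalition.add(i)
            phi[i] += v(coalition, x, b) - before
    return [p / factorial(n) for p in phi]


def shapley_by_allocation(x):
    """Split each Taylor term equally among the variables it involves:
    psi(i) goes entirely to i, J(S) gives 1/|S| to each i in S, 0 to others."""
    phi = [0.0] * len(x)
    for kappa, coeff in POLY.items():
        involved = [i for i, k in enumerate(kappa) if k > 0]
        if not involved:
            continue  # skip the constant term f(b)
        share = term_value(kappa, coeff, x) / len(involved)
        for i in involved:
            phi[i] += share
    return phi


if __name__ == "__main__":
    x, b = [1.5, -2.0, 0.7], [0.0, 0.0, 0.0]
    exact = shapley_by_permutations(x, b)
    allocated = shapley_by_allocation(x)
    assert all(abs(e - a) < 1e-9 for e, a in zip(exact, allocated))
    # Every effect is allocated exactly once in total.
    assert abs(sum(exact) - (f(x) - f(b))) < 1e-9
    print(exact)
```

Enumerating all n! orderings is only feasible for a handful of variables. It is used here because it computes the Shapley value from its definition, independently of the Taylor decomposition it is being checked against.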


This summary was generated by AI and reviewed by human editors.