QUICK REVIEW

[Paper Review] Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations

Tessa Han, Suraj Srinivas|arXiv (Cornell University)|Jun 2, 2022

Explainable Artificial Intelligence (XAI) | Cited by 28

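One-Sentence Summary

This paper unifies eight popular post hoc explanation methods under a local function approximation framework, proves a no free lunch theorem for explanations, and proposes a principled guide for choosing among methods based on faithfulness to the black-box model.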

ABSTRACT

A critical problem in the field of post hoc explainability is the lack of a common foundational goal among methods. For example, some methods are motivated by function approximation, some by game theoretic notions, and some by obtaining clean visualizations. This fragmentation of goals causes not only an inconsistent conceptual understanding of explanations but also the practical challenge of not knowing which method to use when. In this work, we begin to address these challenges by unifying eight popular post hoc explanation methods (LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradients x Input, SmoothGrad, and Integrated Gradients). We show that these methods all perform local function approximation of the black-box model, differing only in the neighbourhood and loss function used to perform the approximation. This unification enables us to (1) state a no free lunch theorem for explanation methods, demonstrating that no method can perform optimally across all neighbourhoods, and (2) provide a guiding principle to choose among methods based on faithfulness to the black-box model. We empirically validate these theoretical results using various real-world datasets, model classes, and prediction tasks. By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.

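Motivation and Objectives

Motivate the need for a common foundation among post hoc explanation methods.
Formalize a local function approximation framework that encompasses multiple methods.
Establish a no free lunch theorem for explanations and derive a guiding principle for choosing among methods.
Empirically validate the theoretical claims on real-world datasets and models.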

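Proposed Method

Define local function approximation (LFA) as a framework that unifies post hoc explanations as local surrogates of the black-box model.
Show that eight methods (LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradients x Input, SmoothGrad, Integrated Gradients) map to LFA under different neighbourhoods and loss functions.
Introduce a gradient-matching loss that connects gradient-based methods to LFA, and prove equivalence with existing methods under specific noise models.
Prove a no free lunch theorem for explanations, showing that no single method is optimal across all neighbourhoods.
Propose a model-recovery guiding principle for choosing among methods based on faithfulness when the black-box model belongs to the interpretable model class.
Provide guidance for designing new explanation methods by configuring the four LFA components (G, Z, l, ⊕).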
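
The four components listed above can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the authors' implementation: it reads G as linear models with an intercept, Z as the noise distribution that defines the neighbourhood, l as a squared-error loss of the form [f(x0 ⊕ ξ) − g(ξ)]², and ⊕ as the operator that combines the input with the noise. The function name `lfa_linear`, the toy `black_box`, and all parameter values are hypothetical.

```python
import numpy as np

def lfa_linear(f, x0, sample_noise, combine, n_samples=2000, seed=0):
    """Fit g(xi) = w . xi + b by least squares against f(x0 (+) xi).

    f            : black-box model, maps an (n, d) array to an (n,) array
    sample_noise : draws noise xi from the neighbourhood distribution Z
    combine      : the operator (+) that perturbs x0 with xi
    """
    rng = np.random.default_rng(seed)
    xi = sample_noise(rng, (n_samples, x0.size))         # xi ~ Z
    targets = f(combine(x0, xi))                         # f(x0 (+) xi)
    design = np.hstack([xi, np.ones((n_samples, 1))])    # G: linear model + intercept
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)  # l: squared error
    return coef[:-1], coef[-1]

# Toy black box that is itself linear, so exact recovery is possible.
a = np.array([2.0, -1.0, 0.5])
c = 0.3

def black_box(X):
    return X @ a + c

x0 = np.array([1.0, 2.0, -3.0])

# Additive continuous noise: x0 + xi with Gaussian xi.
w_add, b_add = lfa_linear(
    black_box, x0,
    sample_noise=lambda rng, shape: rng.normal(0.0, 1.0, size=shape),
    combine=lambda x, xi: x + xi,
)

# Multiplicative continuous noise: x0 * xi with xi uniform on [0, 1].
w_mul, b_mul = lfa_linear(
    black_box, x0,
    sample_noise=lambda rng, shape: rng.uniform(0.0, 1.0, size=shape),
    combine=lambda x, xi: x * xi,
)

print("additive weights:      ", w_add)   # close to a
print("multiplicative weights:", w_mul)   # close to a * x0
```

Under these assumptions the additive setting returns the black box's own weights, while the multiplicative setting returns the weights scaled elementwise by the input. Swapping `sample_noise`, `combine`, or the least-squares step changes which surrogate is obtained, which is the sense in which the components can be reconfigured.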

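Experimental Results

Research Questions

RQ1: Do the eight popular explanation methods share a common local function approximation objective?
RQ2: Under what conditions can an explanation method recover the black-box model, and is there a no free lunch result for explanations?
RQ3: How should researchers choose among explanations based on faithfulness to the model and on the chosen neighbourhood?
RQ4: Can the LFA framework guide the design of new, context-dependent explanations?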

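Key Findings

All eight methods perform local function approximation, differing only in the neighbourhood and loss function used.
A no free lunch theorem holds for explanations: no single method is optimal across all neighbourhoods.
A guiding principle based on model recovery is proposed: when the black-box model belongs to the interpretable model class, an explanation is faithful if it recovers that model.
Empirical results indicate that methods using additive continuous noise are consistent with recovering the true model in continuous domains, whereas multiplicative-noise methods may recover a gradient-scaled form.
The framework explains when a method coincides with existing methods and how new explanations can be designed by changing the LFA components.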
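
The no free lunch finding can be illustrated with a one-dimensional toy experiment. This is a sketch for intuition only, not the paper's proof or its experiments: the black box `np.sin`, the Gaussian neighbourhoods, the two widths, and the helper names `fit_linear` and `neighbourhood_mse` are all assumptions chosen for the example. A linear surrogate is fitted in a narrow and in a wide neighbourhood of the same point, and each fit is then scored in both neighbourhoods.

```python
import numpy as np

def fit_linear(f, x0, sigma, n=4000, seed=0):
    """Least-squares fit of g(xi) = slope * xi + intercept to f(x0 + xi)."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, sigma, size=n)                # Gaussian neighbourhood
    design = np.column_stack([xi, np.ones(n)])
    coef, *_ = np.linalg.lstsq(design, f(x0 + xi), rcond=None)
    return coef                                        # [slope, intercept]

def neighbourhood_mse(f, coef, x0, sigma, n=4000, seed=1):
    """Mean squared error of the surrogate on fresh samples from a neighbourhood."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, sigma, size=n)
    pred = coef[0] * xi + coef[1]
    return float(np.mean((f(x0 + xi) - pred) ** 2))

f = np.sin            # hypothetical non-linear black box
x0 = 0.0
narrow, wide = 0.1, 2.0

g_narrow = fit_linear(f, x0, narrow)
g_wide = fit_linear(f, x0, wide)

err = {
    (fit_name, nbhd_name): neighbourhood_mse(f, coef, x0, sigma)
    for fit_name, coef in [("narrow fit", g_narrow), ("wide fit", g_wide)]
    for nbhd_name, sigma in [("narrow nbhd", narrow), ("wide nbhd", wide)]
}

for key, value in err.items():
    print(key, value)
```

In this setup each surrogate has the lower error on the neighbourhood it was fitted to and the higher error on the other one, so neither is best everywhere. That matches the qualitative message of the theorem and suggests why the neighbourhood has to be chosen with the intended use of the explanation in mind.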


This review was generated by AI and reviewed by a human editor.