QUICK REVIEW

[Paper Review] Explaining by Removing: A Unified Framework for Model Explanation

Ian Covert, Scott Lundberg, Su-In Lee | arXiv (Cornell University) | Nov 21, 2020
Explainable Artificial Intelligence (XAI) | References: 95 | Cited by: 123
ONE-SENTENCE SUMMARY

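This paper presents removal-based explanations as a unified framework for model explanation: it unifies 26 existing methods along three design choices and connects them to cognitive psychology, cooperative game theory, and information theory.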

ABSTRACT

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.

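MOTIVATION AND OBJECTIVES

  • Motivate the need to relate and compare the wide variety of model explanation methods.
  • Present removal-based explanations as a general framework for explaining machine learning models.
  • Characterize each method by three independent design choices: feature removal, model behavior, and summary technique.
  • Use insights from psychology, game theory, and information theory to unify existing methods and analyze the connections among them.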

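PROPOSED METHOD

  • Define a removal-based explanation as a function that quantifies the impact of removing groups of features from a model.
  • Characterize each method by three choices: how features are removed, which model behavior is explained, and how each feature's influence is summarized.
  • Survey 26 existing methods and show how each fits into this three-dimensional framework.
  • Show that removing features by marginalizing them out (with the conditional or marginal distribution) gives explanations an information-theoretic foundation.
  • Connect removal-based explanations to cooperative game theory, and discuss Shapley-based attribution as a unifying theme.
  • Explore the framework empirically by combining the design choices of existing methods to create new ones.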
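The three design choices can be made concrete with a small sketch. This is a minimal illustration, not the authors' implementation: the linear `model`, the two-row `background` set, and all function names are assumptions introduced here. It removes features by averaging predictions over background rows (marginal-distribution removal), explains a single prediction, and summarizes each feature by the drop in output when that feature alone is removed.

```python
import numpy as np

# Hypothetical toy model (an assumption for illustration): a fixed linear function.
weights = np.array([2.0, -1.0, 0.0])

def model(X):
    return X @ weights

# Hypothetical background data used to fill in removed features.
background = np.array([[0.0, 0.0, 0.0],
                       [2.0, 2.0, 2.0]])

def subset_function(x, keep):
    """Choice 1 (feature removal): marginalize removed features over background rows.

    keep is a boolean mask; True means the feature keeps its value from x.
    """
    samples = background.copy()
    samples[:, keep] = x[keep]      # kept features are fixed to x
    return model(samples).mean()    # removed features are averaged out

def prediction_game(x):
    """Choice 2 (model behavior): explain the model's prediction for one input."""
    return lambda keep: subset_function(x, keep)

def remove_individual(game, d):
    """Choice 3 (summary): score each feature by removing it alone."""
    full = np.ones(d, dtype=bool)
    scores = np.zeros(d)
    for i in range(d):
        keep = full.copy()
        keep[i] = False
        scores[i] = game(full) - game(keep)
    return scores

x = np.array([3.0, 0.0, 5.0])
game = prediction_game(x)
scores = remove_individual(game, d=3)
print(scores)  # [4. 1. 0.]
```

For a linear model with marginal removal, each score reduces to the weight times the feature's deviation from the background mean, which is why the third feature (weight 0) scores zero. Swapping any one of the three functions while keeping the other two yields a different removal-based method.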

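EXPERIMENTAL RESULTS

Research Questions

  • RQ1: How can diverse model explanation methods be unified under a single removal-based framework?
  • RQ2: What are the fundamental design choices that distinguish removal-based explanations?
  • RQ3: Under what conditions do removal-based explanations have an information-theoretic interpretation?
  • RQ4: How are existing methods related through insights from cognitive psychology and cooperative game theory?
  • RQ5: What new methods arise from mixing design choices within the framework?

Key Findings

  • The framework unifies 26 removal-based explanation methods, including SHAP, LIME, Meaningful Perturbations, and permutation tests.
  • Removal by marginalization (with the conditional or marginal distribution) gives removal-based explanations an information-theoretic foundation.
  • The framework has deep connections to cooperative game theory, where Shapley values often serve as a principled summary of feature influence.
  • The framework links explanations to subtractive counterfactual reasoning in cognitive psychology, Mill's method of difference, and related ideas.
  • Experiments show that combining the framework's design choices yields more than 60 new explanation methods and reveals relationships among methods.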
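To illustrate how the choice of summary changes an explanation, the sketch below computes exact Shapley values for a small cooperative game and compares them with leave-one-out scores on the same game. The toy model, the zero baseline used to remove features, and the function names are assumptions made for this example; they are not taken from the paper.

```python
from itertools import combinations
from math import factorial

import numpy as np

x = np.array([2.0, 3.0, 1.0])

def toy_model(z):
    # Hypothetical model with an interaction between features 0 and 1.
    return z[0] * z[1] + z[2]

def game(S):
    """Value of a feature subset S: removed features are set to a zero baseline."""
    mask = np.array([j in S for j in range(len(x))])
    return toy_model(np.where(mask, x, 0.0))

def shapley_values(game, d):
    """Exact Shapley values; enumerates all subsets, so only feasible for small d."""
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                S = frozenset(S)
                phi[i] += weight * (game(S | {i}) - game(S))
    return phi

full = frozenset(range(3))
phi = shapley_values(game, d=3)
loo = np.array([game(full) - game(full - {i}) for i in range(3)])

print(phi)  # [3. 3. 1.]
print(loo)  # [6. 6. 1.]
```

The Shapley values sum to `game(full) - game(frozenset())`, which is 7 here, and split the interaction term evenly between features 0 and 1. The leave-one-out scores credit the full interaction to both features, so they sum to 13. Both summaries use the same removal strategy and the same model behavior; only the third design choice differs.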

This review was generated by AI and reviewed by human editors.