QUICK REVIEW

[Paper Review] Explaining by Removing: A Unified Framework for Model Explanation

Ian Covert, Scott Lundberg | arXiv (Cornell University) | Nov 21, 2020
Explainable Artificial Intelligence (XAI) | 95 references | 123 citations
TL;DR

This paper introduces removal-based explanations, a framework that unifies 26 model explanation methods along three design choices and connects them to cognitive psychology, cooperative game theory, and information theory.

ABSTRACT

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.

Motivation & Objective

  • Motivate the need to relate and compare diverse model explanation methods.
  • Introduce removal-based explanations as a general framework for interpreting ML models.
  • Characterize methods by three independent design choices (feature removal, model behavior, summary).
  • Unify and analyze connections among existing methods using insights from psychology, game theory, and information theory.

Proposed method

  • Define removal-based explanations as functions that quantify the impact of removing feature groups from a model.
  • Characterize methods by three choices: how features are removed, what model behavior is explained, and how influence is summarized.
  • Survey 26 existing methods and show how they fit into the three-dimensional framework.
  • Demonstrate that marginalization (conditional or marginal) yields information-theoretic interpretations of explanations.
  • Relate removal-based explanations to cooperative game theory and discuss Shapley-based attribution as a unifying theme.
  • Provide empirical exploration by combining existing methods within the framework to create new approaches.
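
The three design choices listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`make_set_function`, `remove_individual`), the toy linear model, and the all-zero background data are assumptions made for illustration. Here features are removed by substituting background values (a marginal-style removal), the model behavior is the prediction for a single input, and the summary is the drop in that prediction when each feature is removed on its own.

```python
import numpy as np

def make_set_function(model, x, background):
    """Return v(S): the mean model output when only the features in S keep
    their values from x. Features outside S are "removed" by substituting
    values from the background rows (an assumed marginal-style removal)."""
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)

    def v(subset):
        keep = np.zeros(x.shape[0], dtype=bool)
        for i in subset:
            keep[i] = True
        samples = background.copy()
        samples[:, keep] = x[keep]  # held-in features take the explained input's values
        return float(np.mean(model(samples)))

    return v

def remove_individual(v, num_features):
    """Summary step: change in v when each feature alone is removed
    from the full feature set."""
    full = set(range(num_features))
    return np.array([v(full) - v(full - {i}) for i in range(num_features)])

# Hypothetical toy model: a fixed linear function of three features.
def toy_model(X):
    return X @ np.array([2.0, -1.0, 0.5])

background = np.zeros((4, 3))   # assumed reference data
x = np.array([1.0, 2.0, 4.0])   # the input being explained

v = make_set_function(toy_model, x, background)
attributions = remove_individual(v, 3)
print(attributions)
```

Swapping any one of the three pieces (a different removal strategy inside `v`, a loss instead of the prediction, or a different summary) gives a different method within the same template, which is the sense in which the choices are independent.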

Experimental results

Research questions

  • RQ1: How can diverse model explanation methods be unified under a single removal-based framework?
  • RQ2: What are the fundamental design choices that differentiate removal-based explanations?
  • RQ3: When do removal-based explanations admit information-theoretic interpretations?
  • RQ4: How are existing methods related through cognitive psychology and cooperative game theory insights?
  • RQ5: What new methods emerge by mixing choices within the framework?

Key findings

  • The framework unifies 26 removal-based explanation methods, including SHAP, LIME, Meaningful Perturbations, and permutation tests.
  • Marginalization-based removals (conditional or marginal) yield an information-theoretic basis for removal-based explanations.
  • There are deep connections to cooperative game theory, with the Shapley value often providing a principled summary of feature influence.
  • The approach links explanations to concepts from cognitive psychology and philosophy, such as subtractive counterfactual reasoning and Mill’s method of difference.
  • Experiments show how combining framework choices yields 60+ new explanation approaches and reveals relationships among methods.
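
To make the Shapley-value finding concrete, the following is a small sketch of an exact Shapley computation over a set function, where v(S) stands for the model behavior when only the features in S are kept. The function name `shapley_values` and the toy game are hypothetical, chosen for illustration; exact enumeration is only feasible for a handful of features, and practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial

def shapley_values(v, num_features):
    """Exact Shapley values of a set function v over features
    0..num_features-1, by enumerating every subset of the other features."""
    players = range(num_features)
    values = []
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Standard Shapley weight: |S|! (d - |S| - 1)! / d!
            weight = (factorial(size) * factorial(num_features - size - 1)
                      / factorial(num_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        values.append(total)
    return values

# Hypothetical toy game: features 0 and 1 contribute 1.0 only when both
# are present, and feature 2 contributes 3.0 on its own.
def toy_game(s):
    return (1.0 if {0, 1} <= s else 0.0) + (3.0 if 2 in s else 0.0)

phi = shapley_values(toy_game, 3)
print(phi)
```

In this toy game the interaction between features 0 and 1 is split evenly between them, and the attributions sum to v(all features) minus v(no features), the efficiency property that makes the Shapley value attractive as a summary.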


This review was created by AI and reviewed by human editors.