QUICK REVIEW

[Paper Review] A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su‐In Lee|arXiv (Cornell University)|May 22, 2017

Explainable Artificial Intelligence (XAI) | References: 8 | Cited by: 7,621

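One-Sentence Summary

This paper introduces SHAP, a unified additive feature attribution framework whose values are the unique solution satisfying local accuracy, missingness, and consistency; it unifies six existing methods and supports both model-agnostic and model-specific approximations. It provides theoretical guarantees and practical estimation methods (Kernel SHAP, Deep SHAP), and its experiments show closer agreement with human intuition.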

ABSTRACT

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

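Research Motivation and Goals

Motivate the need for interpretable explanations of complex models without giving up their high accuracy.
Introduce a unified additive feature attribution framework that covers existing methods.
Establish a unique solution within this class that satisfies desirable properties, and connect it to Shapley values from game theory.
Develop practical methods for estimating SHAP values and demonstrate improvements over prior methods.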

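Proposed Method

Define an additive feature attribution explanation as a linear model over binary simplified inputs, with coefficients phi_i.
Show that six existing methods share this explanation-model form: LIME, DeepLIFT, Layer-Wise Relevance Propagation, and three Shapley-value-based methods (Shapley regression values, Shapley sampling values, and Quantitative Input Influence).
Use cooperative game theory to prove that exactly one solution satisfies local accuracy, missingness, and consistency: the Shapley values.
Define SHAP values as the Shapley values of a conditional expectation function of the original model.
Propose model-agnostic approximations (Kernel SHAP, Shapley sampling) and model-specific approximations (Linear SHAP, Low-Order SHAP, Max SHAP, Deep SHAP).
Provide algorithms for computing SHAP values and discuss their connections to existing methods.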
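
To make the Shapley-value definition above concrete, here is a minimal sketch that computes exact Shapley values by enumerating every feature subset. It rests on assumptions that are not from the paper: a "missing" feature is replaced by a single reference value (a simple stand-in for the conditional expectation), and `shapley_values`, `toy_model`, `x`, and `baseline` are hypothetical names chosen for illustration. Enumeration is exponential in the number of features, so this is only practical for very small inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x.

    Assumption: a feature that is "missing" from a coalition takes its
    value from `baseline` instead of being integrated out.
    """
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` present, the rest at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

For this toy model the attributions come out close to [2.0, 3.0, 3.0]: the interaction term is split evenly between the two features that form it, and the attributions plus the baseline output add up to the model output at `x`, which is the local accuracy property.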

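Experimental Results

Research Questions

RQ1: Can additive feature attribution methods be unified under a single theoretical framework?
RQ2: Which properties must an explanation satisfy to be reliable and consistent with human intuition?
RQ3: How can SHAP values be estimated efficiently in model-agnostic and model-specific settings?
RQ4: For tasks involving images, text, and deep learning models, do SHAP-based explanations agree with human intuition better than prior methods?
RQ5: How can existing methods be improved or extended within the SHAP framework?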

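Key Findings

For a given input mapping, exactly one additive explanation model satisfies local accuracy, missingness, and consistency.
SHAP values unify six prior methods and give feature attribution a principled foundation in Shapley values.
Kernel SHAP provides a model-agnostic, regression-based estimator that is more sample-efficient than earlier Shapley-value-based methods.
Model-specific variants (Linear SHAP, Deep SHAP, Max SHAP) make attribution faster or more accurate for particular architectures.
Human-subject studies show that, in the tested scenarios, SHAP explanations match human intuition more closely than LIME or DeepLIFT.
Experiments on MNIST show that SHAP and its derivatives give explanations that better reflect class differences and input importance.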
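
The regression-based idea behind Kernel SHAP can be sketched as a weighted least-squares fit over feature coalitions. The sketch below is not the paper's implementation or the `shap` library API. It assumes full enumeration of coalitions instead of sampling, a single reference `baseline` for missing features, and a large finite weight (`big_weight`) standing in for the infinite kernel weight on the empty and full coalitions. The names `solve`, `kernel_shap`, and `toy_model` are hypothetical.

```python
from itertools import product
from math import comb


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def kernel_shap(f, x, baseline, big_weight=1e7):
    """Fit phi_0 + sum_i phi_i z_i to f over all coalitions z (needs n >= 2)."""
    n = len(x)
    rows, targets, weights = [], [], []
    for z in product([0, 1], repeat=n):
        k = sum(z)
        if k == 0 or k == n:
            # Assumption: a large finite weight approximates the infinite
            # kernel weight that enforces the two boundary constraints.
            w = big_weight
        else:
            # Shapley kernel: (n - 1) / (C(n, k) * k * (n - k))
            w = (n - 1) / (comb(n, k) * k * (n - k))
        rows.append([1.0] + [float(v) for v in z])
        targets.append(f([x[i] if z[i] else baseline[i] for i in range(n)]))
        weights.append(w)
    d = n + 1
    # Normal equations of weighted least squares: (X^T W X) coef = X^T W y
    a = [[sum(w * r[p] * r[q] for r, w in zip(rows, weights))
          for q in range(d)] for p in range(d)]
    b = [sum(w * r[p] * t for r, t, w in zip(rows, targets, weights))
         for p in range(d)]
    coef = solve(a, b)
    return coef[0], coef[1:]


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


kernel_phi0, kernel_phi = kernel_shap(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(kernel_phi0, kernel_phi)
```

With every coalition enumerated, the fitted coefficients should land very close to the exact Shapley values for this toy model (about 2, 3, and 3, with an intercept near 0). In practice coalitions are sampled rather than enumerated, which is where the sample-efficiency claim applies.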

This review was generated by AI and reviewed by a human editor.