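[Paper Summary] "Why Should I Trust You?": Explaining the Predictions of Any Classifier
This paper introduces two methods. LIME is a model-agnostic method that explains any individual prediction with an interpretable surrogate model that is faithful to the classifier locally. SP-LIME selects a representative set of explanations so that users can assess the model globally. The paper demonstrates both the fidelity of the explanations and their benefits for trust on text and image classifiers, using simulated and human-subject experiments.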
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
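Research Motivation and Goals
- Argue that explanations are necessary for building trust in predictions and in models deployed in the real world.
- Propose LIME, which explains any classifier by learning an interpretable model that is locally faithful around the prediction.
- Introduce SP-LIME, which selects a diverse, representative set of explanations to build trust in the model as a whole.
- Demonstrate the usefulness of explanations on trust-related tasks through simulated and human-subject studies.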
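The paper frames SP-LIME as a coverage problem. Picture an explanation matrix W with one row per explained instance and one column per interpretable feature, where each entry is the absolute weight |w_ij|. For text, the global importance of feature j is I_j = sqrt(Σ_i W_ij), and the coverage of a set of rows V is the total importance of the features that appear in at least one chosen row. Coverage is submodular, so a greedy pick gives a good approximation. The numpy sketch below is a minimal version that assumes W has already been filled with absolute LIME weights. The function names are hypothetical.

```python
import numpy as np


def global_importance(W):
    """I_j = sqrt(sum_i |W_ij|), the text-domain importance used by SP-LIME."""
    return np.sqrt(np.abs(W).sum(axis=0))


def coverage(V, W, I):
    """Total importance of features with a nonzero weight in any picked row."""
    if not V:
        return 0.0
    covered = (np.abs(W[list(V)]) > 0).any(axis=0)
    return float(I[covered].sum())


def submodular_pick(W, B):
    """Greedily add the row (explanation) that most increases coverage, B times."""
    I = global_importance(W)
    V = []
    for _ in range(min(B, W.shape[0])):
        remaining = [i for i in range(W.shape[0]) if i not in V]
        best = max(remaining, key=lambda i: coverage(V + [i], W, I))
        V.append(best)
    return V
```

With W = [[2, 2, 0], [2, 2, 0], [0, 0, 1]] and a budget of B = 2, the greedy pick returns rows [0, 2]. Row 1 is skipped because it covers exactly the same features as row 0. This is the non-redundancy that SP-LIME is designed to enforce.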
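Proposed Method
- Represent each input in an interpretable form: a binary vector x′ indicating the presence or absence of words for text, or of super-pixels for images.
- Define an explanation as a model g from a class G of simple, interpretable models that locally approximates the black box f around the instance x.
- Obtain the explanation by minimizing a locality-aware loss L(f, g, πx) plus a complexity penalty Ω(g): ξ(x) = argmin_{g∈G} L(f, g, πx) + Ω(g).
- Fit the local surrogate to the outputs of f on perturbed samples drawn around x′. Each sample is weighted by a proximity kernel, for example πx(z) = exp(−D(x, z)²/σ²) for a distance D.
- For text and images, use sparse linear explanations g(z′) = w·z′ with a locally weighted square loss. A K-LASSO step selects K features along the LASSO path and then fits their weights by least squares.
- Give a practical procedure (Algorithm 1) and discuss the trade-off between the surrogate's complexity and its interpretability.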
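The numpy sketch below shows a minimal version of the LIME loop for the text case. It rests on several stated assumptions:

- The black box `f` accepts the binary vector z′ directly. In other words, mapping z′ back to the original input is folded into `f`.
- D is cosine distance.
- The kernel width σ = 0.25 is an arbitrary choice.
- Greedy forward selection stands in for the LASSO regularization path used by K-LASSO.

All helper names are hypothetical and are not the API of any published LIME package.

```python
import numpy as np


def sample_perturbations(d, num_samples, rng):
    """Draw binary z' around x' = (1, ..., 1): pick how many features to keep
    uniformly from 1..d, then keep that many features chosen at random."""
    Z = np.zeros((num_samples, d))
    Z[0] = 1.0  # include the unperturbed instance itself
    for i in range(1, num_samples):
        k = rng.integers(1, d + 1)
        Z[i, rng.choice(d, size=k, replace=False)] = 1.0
    return Z


def kernel_weights(Z, sigma):
    """Exponential kernel pi_x(z) = exp(-D(x, z)^2 / sigma^2), D = cosine distance."""
    x = np.ones(Z.shape[1])
    cos_sim = Z @ x / (np.linalg.norm(Z, axis=1) * np.linalg.norm(x))
    dist = 1.0 - cos_sim
    return np.exp(-(dist ** 2) / sigma ** 2)


def weighted_lstsq(X, y, w):
    """Weighted least squares with an intercept; returns (intercept, coefficients)."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[0], coef[1:]


def weighted_sse(X, y, w):
    """Locality-weighted square loss of the best linear fit on columns X."""
    b, coefs = weighted_lstsq(X, y, w)
    resid = y - b - X @ coefs
    return float(np.sum(w * resid ** 2))


def forward_select(Z, y, w, K):
    """Greedily add the feature that most reduces the weighted square loss
    (a stand-in for picking K features along the LASSO path)."""
    selected = []
    for _ in range(min(K, Z.shape[1])):
        candidates = [j for j in range(Z.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: weighted_sse(Z[:, selected + [j]], y, w))
        selected.append(best)
    return selected


def explain(f, d, K, num_samples=1000, sigma=0.25, seed=0):
    """Return {feature index: weight} for a K-sparse local linear surrogate of f."""
    rng = np.random.default_rng(seed)
    Z = sample_perturbations(d, num_samples, rng)
    y = np.array([f(z) for z in Z])  # black-box outputs on the perturbations
    w = kernel_weights(Z, sigma)
    feats = forward_select(Z, y, w, K)
    _, coefs = weighted_lstsq(Z[:, feats], y, w)  # refit weights on chosen features
    return dict(zip(feats, coefs))
```

As a usage example, consider a toy black box `lambda z: 0.1 + 0.6 * z[1] + 0.3 * z[3]` with d = 6 and K = 2. The sketch should select features 1 and 3 with weights close to 0.6 and 0.3, because the sparse linear family can represent this function exactly. Forward selection was chosen only because it needs nothing beyond numpy. The paper's K-LASSO uses the LASSO regularization path to choose the K features.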
Experimental Results
Research Questions
- RQ1: Can explanations faithfully reflect a model's behavior for individual predictions?
- RQ2: Do explanations help users trust predictions and choose between models?
- RQ3: Can a model-wide understanding be built from a small, non-redundant set of explanations?
- RQ4: Is a model-agnostic explainer capable of explaining diverse models (text, image, and neural network classifiers)?
Key Findings
- LIME explanations achieve high fidelity to the underlying model in the local neighborhood (e.g., recall >90% for truly important features on two interpretable classifiers).
- Explanations help users decide whether to trust individual predictions, make better decisions about which model to use, and identify untrustworthy models.
- SP-LIME (submodular pick) selects a diverse, representative set of explanations that improves tasks like model comparison and trust-based selections.
- Qualitative examples show intuitive, human-understandable attributions (e.g., words or super-pixels contributing to a class).
- Simulated and human experiments demonstrate that explanations support tasks such as predicting which classifier generalizes better and guiding feature engineering.
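The fidelity figure above is a recall score. In the paper's simulated setup, interpretable classifiers (sparse logistic regression and decision trees) rely on a known set of "gold" features. Recall is the fraction of those gold features that an explanation recovers. The sketch below is a minimal version of that arithmetic. It assumes the gold set and the explanation's features are both available for each instance, and the function names are hypothetical.

```python
from statistics import mean


def feature_recall(gold, explained):
    """Fraction of the gold (truly used) features that appear in the explanation."""
    gold = set(gold)
    return len(gold & set(explained)) / len(gold)


def mean_recall(cases):
    """Average recall over (gold_features, explained_features) pairs."""
    return mean(feature_recall(g, e) for g, e in cases)
```

Here is a worked example. Suppose the gold set is {"good", "great", "bad", "awful"} and the explanation lists ["good", "great", "bad", "movie"]. The recall is 3/4 = 0.75.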
This summary was generated by AI and reviewed by a human editor.