QUICK REVIEW

[Paper Review] Consistent feature attribution for tree ensembles

Scott Lundberg, Su‐In Lee|arXiv (Cornell University)|Jun 19, 2017

Bayesian Modeling and Causal Inference|References: 5|Cited by: 149

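One-Sentence Summary

This paper shows that current feature attribution methods for tree ensembles are inconsistent, and introduces Tree SHAP, a fast exact algorithm for computing Shapley-based attributions that is integrated into XGBoost to give fast, consistent explanations and improved supervised clustering.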

ABSTRACT

Note that a newer expanded version of this paper is now available at: arXiv:1802.03888.

It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications.
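
For readers who want the quantities behind "additive", "locally accurate", and "based on conditional expectations" written out, the following is a sketch of the standard Shapley-value form of SHAP values. The symbols are assumptions of this note and not quoted from the abstract: N is the set of all M input features, S is a subset of features whose values are known, and f_x(S) is the expected model output given only those values.

```latex
% Local accuracy: the attributions sum to the model output for input x.
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i, \qquad \phi_0 = E[f(x)]

% SHAP value of feature i: a weighted average of its marginal contribution
% over all subsets S of the other features.
\phi_i = \sum_{S \subseteq N \setminus \{i\}}
         \frac{|S|!\,(M - |S| - 1)!}{M!}
         \bigl[ f_x(S \cup \{i\}) - f_x(S) \bigr],
\qquad f_x(S) = E[f(x) \mid x_S]
```

Consistency, in this notation, means that if a model changes so that the bracketed marginal contribution of feature i never decreases for any S, then the value of \phi_i does not decrease either.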

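Motivation and Objectives

Show that existing feature attribution methods for tree ensembles can be inconsistent and counterintuitive.
Advocate and adopt SHAP values as the unique additive feature attribution method that is both consistent and locally accurate.
Develop a fast, exact algorithm (Tree SHAP) for computing SHAP values for tree ensembles.
Integrate Tree SHAP into XGBoost and evaluate its effect on explanations of predictions.
Illustrate the practical benefit of SHAP values through supervised clustering experiments.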

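Proposed Method

Connect tree ensemble feature attribution to the class of additive feature attribution methods, which justifies SHAP values as the unique consistent method.
Derive an exact algorithm for SHAP values of tree ensembles that reduces the cost from exponential to O(TLD^2) time, where T is the number of trees, L the maximum number of leaves per tree, and D the maximum tree depth.
Develop the Tree SHAP algorithms: a direct O(TL2^M) baseline, where M is the number of input features, and a faster, practical O(TLD^2) method.
Integrate Tree SHAP into XGBoost and demonstrate faster explanations for large models.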
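
To make the direct exponential-time baseline concrete, here is a minimal sketch that enumerates every feature subset for one small tree. It is not the paper's or XGBoost's code: the dictionary tree layout, the names `cond_expectation` and `brute_force_shap`, and the toy tree are all assumptions made for illustration. Conditional expectations are estimated by weighting the two branches of any split on an unknown feature by an assumed training-sample count ("cover").

```python
from itertools import combinations
from math import factorial

# Hypothetical tree layout (an assumption of this sketch):
#   leaf:  {"value": v, "cover": n}
#   split: {"feature": i, "threshold": t, "left": node, "right": node, "cover": n}
# "cover" is the assumed number of training samples that reached the node.

def cond_expectation(node, x, known):
    """Estimate E[f(x) | x_S] for one tree, where S = `known`."""
    if "value" in node:
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        # Feature value is known: follow the branch that x takes.
        child = left if x[node["feature"]] <= node["threshold"] else right
        return cond_expectation(child, x, known)
    # Feature value is unknown: average both branches, weighted by cover.
    total = left["cover"] + right["cover"]
    return (left["cover"] * cond_expectation(left, x, known)
            + right["cover"] * cond_expectation(right, x, known)) / total

def brute_force_shap(tree, x, num_features):
    """Exact SHAP values by enumerating all 2^M subsets (exponential in M)."""
    m = num_features
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                known = set(subset)
                gain = (cond_expectation(tree, x, known | {i})
                        - cond_expectation(tree, x, known))
                phi[i] += weight * gain
    base = cond_expectation(tree, x, set())  # phi_0 = E[f(x)]
    return base, phi

# Toy tree: outputs 80 only when both binary features are 1.
TOY_TREE = {
    "feature": 0, "threshold": 0.5, "cover": 100,
    "left": {"value": 0.0, "cover": 50},
    "right": {
        "feature": 1, "threshold": 0.5, "cover": 50,
        "left": {"value": 0.0, "cover": 25},
        "right": {"value": 80.0, "cover": 25},
    },
}

base, phi = brute_force_shap(TOY_TREE, [1, 1], num_features=2)
print(base, phi)  # 20.0 [30.0, 30.0]
```

The base value plus the two attributions equals the tree's output of 80 for this input, which is the local accuracy property. For an ensemble, the SHAP values of the individual trees can be summed, because SHAP values are additive over a sum of functions. The faster polynomial-time method avoids this enumeration and is not reproduced here.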

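Experimental Results

Research Questions

RQ1: Are current feature attribution methods for tree ensembles inconsistent in the importance they assign when the model's reliance on a feature changes?
RQ2: Can SHAP values provide a unique, consistent, and locally accurate attribution for tree ensembles?
RQ3: How can SHAP values be computed efficiently for trees and tree ensembles?
RQ4: What practical effect do SHAP-based attributions have on model explanation and on downstream tasks such as clustering?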

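Key Findings

Current path-based feature attribution methods are inconsistent and can assign lower importance to a feature that has a larger effect on the output.
SHAP values are the only additive feature attribution method based on conditional expectations that satisfies local accuracy, missingness, and consistency.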
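Tree SHAP reduces SHAP computation from exponential to polynomial time, which makes explaining large models feasible: O(TLD^2) for unbalanced trees and O(TL log^2 L) for balanced trees, where the depth D is on the order of log L.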
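Integrating Tree SHAP into XGBoost enables fast, scalable explanations of models with thousands of trees and hundreds of inputs.
In a gene-expression study of Alzheimer's disease, SHAP-based explanations outperformed traditional path-based attributions in supervised clustering.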
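
The inconsistency finding can be illustrated with a small, hedged sketch. Two hypothetical trees over binary features "fever" and "cough" are compared: Model B is Model A plus an extra 10 whenever cough is present, so B relies strictly more on cough. The trees, their sample counts ("cover"), and all function names are assumptions made for illustration, and the path-based rule shown is a simple Saabas-style attribution that credits each split on the decision path with the change in expected output.

```python
from itertools import combinations
from math import factorial

FEVER, COUGH = 0, 1  # hypothetical binary features (1 = yes, 0 = no)

def leaf(value, cover):
    return {"value": value, "cover": cover}

def split(feature, no, yes):
    # "cover" is the assumed number of training samples reaching the node.
    return {"feature": feature, "no": no, "yes": yes,
            "cover": no["cover"] + yes["cover"]}

def expected(node, x, known):
    """Cover-weighted estimate of E[f(x) | features in `known`]."""
    if "value" in node:
        return node["value"]
    if node["feature"] in known:
        child = node["yes"] if x[node["feature"]] else node["no"]
        return expected(child, x, known)
    no, yes = node["no"], node["yes"]
    return (no["cover"] * expected(no, x, known)
            + yes["cover"] * expected(yes, x, known)) / node["cover"]

def path_attribution(tree, x, num_features):
    """Credit each split on the decision path with the change in expectation."""
    phi = [0.0] * num_features
    node = tree
    while "value" not in node:
        child = node["yes"] if x[node["feature"]] else node["no"]
        phi[node["feature"]] += expected(child, x, set()) - expected(node, x, set())
        node = child
    return phi

def shap_values(tree, x, num_features):
    """Exact SHAP values by enumerating all feature subsets."""
    m = num_features
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                known = set(subset)
                phi[i] += weight * (expected(tree, x, known | {i})
                                    - expected(tree, x, known))
    return phi

# Model A: output 80 only when fever and cough are both present.
MODEL_A = split(FEVER, leaf(0.0, 50),
                split(COUGH, leaf(0.0, 25), leaf(80.0, 25)))
# Model B: same as A, plus 10 whenever cough is present.
MODEL_B = split(COUGH, leaf(0.0, 50),
                split(FEVER, leaf(10.0, 25), leaf(90.0, 25)))

x = [1, 1]  # fever = yes, cough = yes
print(path_attribution(MODEL_A, x, 2))  # [20.0, 40.0]
print(path_attribution(MODEL_B, x, 2))  # [40.0, 25.0]
print(shap_values(MODEL_A, x, 2))       # [30.0, 30.0]
print(shap_values(MODEL_B, x, 2))       # [30.0, 35.0]
```

Under these assumptions the path-based credit for cough drops from 40 to 25 even though Model B depends more on cough, while its SHAP value rises from 30 to 35. That is the kind of behavior the consistency property rules out.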

This review was generated by AI and checked by a human editor.