QUICK REVIEW

[Paper Review] Consistent Individualized Feature Attribution for Tree Ensembles

Scott Lundberg, Gabriel Erion, Su-In Lee | arXiv (Cornell University) | Feb 12, 2018

Forest ecology and management | References: 22 | Cited by: 552

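One-Sentence Summary

This paper identifies inconsistencies in common feature attribution methods for tree models and introduces SHAP values and SHAP interaction values, computed with the fast Tree SHAP algorithm, to give exact, consistent, individualized explanations of tree ensembles.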

ABSTRACT

Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases. This is a fundamental problem that casts doubt on any comparison between features. To address it we turn to recent applications of game theory and develop fast exact tree solutions for SHAP (SHapley Additive exPlanation) values, which are the unique consistent and locally accurate attribution values. We then extend SHAP values to interaction effects and define SHAP interaction values. We propose a rich visualization of individualized feature attributions that improves over classic attribution summaries and partial dependence plots, and a unique "supervised" clustering (clustering based on feature attributions). We demonstrate better agreement with human intuition through a user study, exponential improvements in run time, improved clustering performance, and better identification of influential features. An implementation of our algorithm has also been merged into XGBoost and LightGBM, see http://github.com/slundberg/shap for details.
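To make the inconsistency claim concrete, the following minimal sketch compares a path-order attribution (in the spirit of the heuristic tree attributions the abstract criticizes) with exact Shapley values on two toy models. Everything here is an illustrative assumption rather than the authors' code: the feature names, the payoffs 80 and 10, independent uniform binary inputs, and the split order assigned to each tree.

```python
from itertools import permutations, product

FEATURES = ("fever", "cough")  # hypothetical feature names


def model_a(x):
    # Toy model A: risk is 80 only when both symptoms are present.
    return 80.0 if x["fever"] and x["cough"] else 0.0


def model_b(x):
    # Toy model B: same as A, plus 10 whenever cough is present,
    # so B depends on cough strictly more than A does.
    return model_a(x) + (10.0 if x["cough"] else 0.0)


def cond_expectation(model, x, known):
    """E[f(X) | X_known = x_known], assuming independent uniform binary features."""
    free = [f for f in FEATURES if f not in known]
    total = 0.0
    for values in product([0, 1], repeat=len(free)):
        z = {f: x[f] for f in known}
        z.update(zip(free, values))
        total += model(z)
    return total / (2 ** len(free))


def order_attribution(model, x, order):
    """Credit each feature with the change in expectation when it is revealed in `order`."""
    phi, known = {}, []
    prev = cond_expectation(model, x, known)
    for f in order:
        known.append(f)
        cur = cond_expectation(model, x, known)
        phi[f] = cur - prev
        prev = cur
    return phi


def shapley(model, x):
    """Exact Shapley values: the order attribution averaged over all feature orderings."""
    orders = list(permutations(FEATURES))
    phi = {f: 0.0 for f in FEATURES}
    for order in orders:
        for f, v in order_attribution(model, x, order).items():
            phi[f] += v / len(orders)
    return phi


x = {"fever": 1, "cough": 1}
# Assumed split orders: tree A splits on fever first, tree B on cough first.
path_a = order_attribution(model_a, x, ("fever", "cough"))
path_b = order_attribution(model_b, x, ("cough", "fever"))
shap_a = shapley(model_a, x)
shap_b = shapley(model_b, x)
print("path-order:", path_a, path_b)
print("shapley:   ", shap_a, shap_b)
```

Under these assumptions, cough's path-order credit falls from 40 to 25 when moving from model A to model B, even though model B depends on cough more, while its Shapley value rises from 30 to 35.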

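Motivation and Objectives

Motivate and formally define the need for consistent, individualized feature attribution in tree ensembles.
Propose SHAP values as the unique consistent and locally accurate method among additive feature attributions.
Develop a fast, exact Tree SHAP algorithm to compute SHAP values for large tree ensembles.
Extend SHAP to SHAP interaction values to capture pairwise feature interactions.
Demonstrate practical benefits through visualization, clustering, and applications to real data.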
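The two properties named in these objectives, and the Shapley formula they single out, can be written out as a short reference. This is a sketch in standard Shapley notation: f_x(S) is the model's expected output when only the features in S are known, M is the number of features, and N (an assumed symbol) is the set of all M features.

```latex
% Local accuracy: the attributions sum to the model output for this input.
\[
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(f, x),
\qquad \phi_0 = f_x(\emptyset) = E[f(x)]
\]

% Consistency: if feature i never contributes less in model f' than in model f,
% its attribution must not decrease.
\[
f'_x(S \cup \{i\}) - f'_x(S) \;\ge\; f_x(S \cup \{i\}) - f_x(S)
\quad \forall\, S \subseteq N \setminus \{i\}
\;\;\Longrightarrow\;\;
\phi_i(f', x) \ge \phi_i(f, x)
\]

% Shapley values: the weighted average marginal contribution of feature i.
\[
\phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}}
\frac{|S|!\,(M - |S| - 1)!}{M!}
\left[ f_x(S \cup \{i\}) - f_x(S) \right]
\]
```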

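Proposed Method

Define f_x(S) = E[f(x) | x_S] and derive SHAP values as the unique consistent, locally accurate attribution.
Develop Tree SHAP, which computes SHAP values in O(TLD^2) time, a polynomial-time alternative to the naive O(TL2^M) approach (T trees, at most L leaves per tree, maximum depth D, M features).
Extend SHAP to SHAP interaction values, using the Shapley interaction index to quantify pairwise feature interactions.
Introduce visualization tools: SHAP dependence plots and SHAP summary plots, along with supervised clustering based on SHAP attributions.
Provide an implementation integrated with XGBoost and LightGBM for practical use.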
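As a rough illustration of the naive exponential baseline that Tree SHAP replaces, the sketch below estimates f_x(S) on a single hand-built tree by following the decision path for features in S and averaging the two children, weighted by training cover, for features outside S. It then enumerates every subset to get exact SHAP values. The tree, its covers, the helper names, and the input x are all hypothetical, and this is not the polynomial-time Tree SHAP algorithm itself.

```python
from itertools import combinations
from math import factorial


def leaf(value, cover):
    # cover = assumed number of training samples that reached this leaf
    return {"value": value, "cover": cover}


def node(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold,
            "left": left, "right": right,
            "cover": left["cover"] + right["cover"]}


def expected_value(tree, x, S):
    """Estimate f_x(S) = E[f(x) | x_S] for one tree using cover weights."""
    if "value" in tree:
        return tree["value"]
    f = tree["feature"]
    if f in S:
        # Feature is known: follow the decision path for x.
        child = tree["left"] if x[f] <= tree["threshold"] else tree["right"]
        return expected_value(child, x, S)
    # Feature is unknown: average both branches by training cover.
    l, r = tree["left"], tree["right"]
    return (l["cover"] * expected_value(l, x, S)
            + r["cover"] * expected_value(r, x, S)) / tree["cover"]


def brute_force_shap(tree, x, M):
    """Exact SHAP values by enumerating all subsets (exponential in M)."""
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            w = factorial(size) * factorial(M - size - 1) / factorial(M)
            for combo in combinations(others, size):
                S = set(combo)
                phi[i] += w * (expected_value(tree, x, S | {i})
                               - expected_value(tree, x, S))
    return phi


# Hypothetical depth-2 tree over features 0, 1, 2; feature 3 is never used.
tree = node(0, 0.5,
            node(1, 0.5, leaf(1.0, 30), leaf(3.0, 10)),
            node(2, 0.5, leaf(4.0, 20), leaf(10.0, 40)))
M = 4
x = [1.0, 0.0, 1.0, 0.7]

base = expected_value(tree, x, set())              # f_x(empty set)
prediction = expected_value(tree, x, set(range(M)))  # f(x)
phi = brute_force_shap(tree, x, M)
print(base, prediction, phi)
```

For an ensemble, the same per-tree values would simply be summed across trees. The subset enumeration is what makes this approach exponential in M, which is the cost the O(TLD^2) algorithm avoids.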

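Experimental Results

Research Questions

RQ1: Do SHAP values provide the unique consistent and locally accurate individualized feature attribution for tree ensembles?
RQ2: How can SHAP values be computed efficiently for large tree ensembles?
RQ3: What role do SHAP interaction values play in revealing feature interactions in tree models?
RQ4: Do SHAP-based visualizations and supervised clustering improve interpretability and actionable insight compared with existing methods?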

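Key Findings

SHAP values are the unique consistent and locally accurate individualized attribution for tree ensembles, given the missingness property and the use of conditional expectations to represent missing features.
Tree SHAP computes exact SHAP values in O(TLD^2) time, making explanations scalable to large models.
SHAP interaction values provide a principled, symmetric measure of feature interaction.
SHAP-based visualizations (summary plots and dependence plots) and supervised clustering improve agreement with human intuition and clustering performance.
Empirical demonstrations show faster run times than prior methods, better identification of influential features, and clearer insight into interactions.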
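The symmetry of SHAP interaction values, and the way they split a pairwise effect between the two features involved, can be checked by brute force on a tiny model. The sketch below uses the Shapley interaction index with a hypothetical three-feature function. As a simplifying assumption, f_x(S) is approximated by replacing features outside S with a reference value of 0.0 rather than by a true conditional expectation.

```python
from itertools import combinations
from math import factorial

M = 3
x = (1.0, 1.0, 1.0)


def model(z):
    # Hypothetical model: main effect on feature 0, interactions (0,1) and (1,2).
    return 2.0 * z[0] + 3.0 * z[1] * z[2] + 1.0 * z[0] * z[1]


def value(S):
    """Stand-in for f_x(S): keep x on features in S, use reference 0.0 elsewhere."""
    return model([x[k] if k in S else 0.0 for k in range(M)])


def subsets(items):
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def shap_value(i):
    rest = [k for k in range(M) if k != i]
    total = 0.0
    for S in subsets(rest):
        w = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
        total += w * (value(S | {i}) - value(S))
    return total


def interaction(i, j):
    """Off-diagonal SHAP interaction value for i != j."""
    rest = [k for k in range(M) if k not in (i, j)]
    total = 0.0
    for S in subsets(rest):
        w = (factorial(len(S)) * factorial(M - len(S) - 2)
             / (2 * factorial(M - 1)))
        nabla = (value(S | {i, j}) - value(S | {i})
                 - value(S | {j}) + value(S))
        total += w * nabla
    return total


phi = [shap_value(i) for i in range(M)]
Phi = [[0.0] * M for _ in range(M)]
for i in range(M):
    for j in range(M):
        if i != j:
            Phi[i][j] = interaction(i, j)
    # Diagonal (main effect): SHAP value minus all off-diagonal interactions.
    Phi[i][i] = phi[i] - sum(Phi[i][j] for j in range(M) if j != i)

for row in Phi:
    print(row)
```

In this toy setting the x0*x1 term of size 1 is split as 0.5 in each of Phi[0][1] and Phi[1][0], the x1*x2 term of size 3 as 1.5 each, and each row of the matrix sums to that feature's SHAP value.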

This review was generated by AI and reviewed by human editors.