QUICK REVIEW

[Paper Review] Explainable AI for Trees: From Local Explanations to Global Understanding

Scott Lundberg, Gabriel Erion | arXiv (Cornell University) | May 11, 2019

Explainable Artificial Intelligence (XAI) | References: 68 | Cited by: 262

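One-Sentence Summary

This paper introduces TreeExplainer, an exact polynomial-time algorithm for computing Shapley-based local explanations of tree models, extends those explanations to capture feature interactions, and shows how to build global understanding from many local explanations. It also demonstrates applications to medical datasets on mortality, kidney disease, and hospital procedure duration, as well as model monitoring and subgroup discovery.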

ABSTRACT

Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.

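Motivation and Objectives

Improve the interpretability of tree-based models (random forests, gradient boosted trees) by providing exact local explanations with game-theoretic guarantees.
Extend local explanations to measure feature interactions directly.
Develop tools that infer global model structure by aggregating many local explanations, and demonstrate practical medical applications.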

Proposed Method

Develop TreeExplainer to compute SHAP values exactly in polynomial time for tree ensembles.
Introduce SHAP interaction values to capture local feature interactions.
Propose five methods to combine local explanations into global model understanding while retaining local faithfulness.
Evaluate TreeExplainer on 21 local explanation metrics across three datasets and models.
Provide high-performance implementations integrated with major tree-based ML packages.
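
To make the first step above concrete, the following is a minimal sketch of what an exact Shapley-based local explanation is, computed by brute force over all feature subsets. It is not the TreeExplainer algorithm: this enumeration is exponential in the number of features, which is exactly the cost the paper's polynomial-time method avoids. The model `toy_tree`, the feature names (age, bp, bmi), the `BACKGROUND` samples, and the choice to "remove" a feature by averaging over background samples are all hypothetical assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def toy_tree(x):
    """Hypothetical depth-2 decision tree over (age, bp, bmi)."""
    age, bp, bmi = x
    if age < 50:
        return 1.0 if bp < 130 else 3.0
    return 4.0 if bmi < 30 else 8.0


# Hypothetical background samples used to marginalize out "missing" features.
BACKGROUND = [(40, 120, 25), (45, 140, 32), (60, 125, 28), (70, 150, 35)]


def value(model, x, subset, background):
    """Mean model output when features in `subset` are fixed to x's values
    and the remaining features are taken from each background sample."""
    total = 0.0
    for b in background:
        mixed = tuple(x[k] if k in subset else b[k] for k in range(len(x)))
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Brute-force Shapley values: weighted marginal contribution of each
    feature over every subset of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = value(model, x, s | {i}, background) - value(model, x, s, background)
                phi[i] += weight * gain
    return phi


x = (65, 145, 33)
phi = shapley_values(toy_tree, x, BACKGROUND)
base = value(toy_tree, x, set(), BACKGROUND)  # expected output with no features known

# Local accuracy: the attributions plus the base value reproduce the prediction.
print("prediction:", toy_tree(x))
print("base + sum(phi):", base + sum(phi))
```

The final two printed numbers should agree up to floating-point error, which is the local accuracy property: the per-feature attributions add up to the difference between the model's prediction and its expected output.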

Experimental Results

Research Questions

RQ1: Can exact Shapley-based local explanations be computed efficiently for tree models?
RQ2: How can local explanations be extended to quantify feature interactions at a local level?
RQ3: How can many local explanations be aggregated to reveal global model structure and behavior?
RQ4: Do SHAP-based explanations align with human intuition in medical decision contexts?
RQ5: How can local explanations be used to monitor deployed models and detect data drift or issues over time?

Key Findings

TreeExplainer computes SHAP values exactly in polynomial time, with local accuracy and consistency guarantees.
SHAP interaction values enable decomposition of effects into main and interaction components at a local level.
Aggregating many local explanations yields richer, more faithful global model representations than traditional global feature importance.
TreeExplainer outperforms other local explanation methods across 21 evaluation metrics on the chronic kidney disease (CKD), mortality, and hospital procedure duration datasets.
Local explanation embeddings support supervised clustering and interpretable dimensionality reduction for population subgroups.
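
The decomposition into main and interaction effects, and the idea of aggregating local explanations into a global view, can be sketched as follows. This is a brute-force illustration of SHAP interaction values (pairwise Shapley interaction terms, with the main effect defined as the Shapley value minus the feature's off-diagonal interactions), not the paper's fast tree-specific algorithm. The model `toy_tree`, the `DATA` samples (used both as background and as the rows being explained), and the mean-absolute-value aggregation in `global_strength` are hypothetical choices for illustration.

```python
from itertools import combinations
from math import factorial


def toy_tree(x):
    """Hypothetical tree: age gates whether bp or bmi drives the output."""
    age, bp, bmi = x
    if age < 50:
        return 1.0 if bp < 130 else 3.0
    return 4.0 if bmi < 30 else 8.0


# Hypothetical samples: both the background set and the rows to explain.
DATA = [(40, 120, 25), (45, 140, 32), (60, 125, 28), (70, 150, 35)]


def value(x, subset):
    """Mean output with features in `subset` fixed to x, the rest from DATA."""
    n = len(x)
    outs = [toy_tree(tuple(x[k] if k in subset else b[k] for k in range(n))) for b in DATA]
    return sum(outs) / len(outs)


def shapley(x, i):
    """Brute-force Shapley value of feature i for sample x."""
    n = len(x)
    others = [k for k in range(n) if k != i]
    total = 0.0
    for size in range(n):
        w = factorial(size) * factorial(n - size - 1) / factorial(n)
        for sub in combinations(others, size):
            s = set(sub)
            total += w * (value(x, s | {i}) - value(x, s))
    return total


def interaction(x, i, j):
    """Off-diagonal SHAP interaction value for the pair (i, j), i != j."""
    n = len(x)
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(n - 1):
        w = factorial(size) * factorial(n - size - 2) / (2 * factorial(n - 1))
        for sub in combinations(others, size):
            s = set(sub)
            total += w * (value(x, s | {i, j}) - value(x, s | {i})
                          - value(x, s | {j}) + value(x, s))
    return total


def interaction_matrix(x):
    """Matrix whose off-diagonals are interactions and diagonals main effects."""
    n = len(x)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                m[i][j] = interaction(x, i, j)
        # Main effect: Shapley value minus this feature's pairwise interactions.
        m[i][i] = shapley(x, i) - sum(m[i][j] for j in range(n) if j != i)
    return m


# One local explanation per sample, then a simple global summary:
# the mean absolute main/interaction effect across all samples.
matrices = [interaction_matrix(x) for x in DATA]
n_features = len(DATA[0])
global_strength = [
    [sum(abs(m[i][j]) for m in matrices) / len(matrices) for j in range(n_features)]
    for i in range(n_features)
]
for row in global_strength:
    print([round(v, 3) for v in row])
```

Because the toy tree never uses bp and bmi on the same path, their pairwise interaction is zero for every sample, while age interacts with both. Each local matrix still sums to the prediction minus the expected output, so the global summary is built entirely from locally faithful pieces.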

This review was generated by AI and checked by a human editor.