QUICK REVIEW

[Paper Review] Consistent feature attribution for tree ensembles

Scott Lundberg, Su‐In Lee|arXiv (Cornell University)|Jun 19, 2017

Bayesian Modeling and Causal Inference|References: 5|Citations: 149

One-line summary

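This paper shows that current feature attribution methods for tree ensembles are inconsistent, introduces Tree SHAP, a fast and exact algorithm for computing Shapley-based attributions, and integrates it into XGBoost, enabling fast, consistent explanations and improved supervised clustering.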

ABSTRACT

Note that a newer expanded version of this paper is now available at: arXiv:1802.03888

It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications.

Motivation and objectives

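Show that existing feature attribution methods for tree ensembles are inconsistent and can produce counterintuitive results.
Motivate and adopt SHAP values as the unique consistent attribution method.
Develop a fast, exact algorithm for computing SHAP values for tree ensembles.
Integrate Tree SHAP into XGBoost and evaluate its effect on prediction explanations.
Demonstrate the practical benefit of SHAP values through supervised clustering experiments.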

Proposed method

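Connect feature attribution for tree ensembles to additive feature attribution methods and justify SHAP as the unique consistent approach.
Derive an exact SHAP value algorithm for tree ensembles that reduces the complexity from exponential time to O(TLD^2) time.
For practical use, develop the Tree SHAP algorithms: an intuitive O(TL2^M) baseline and a faster O(TLD^2) method, where T is the number of trees, L the maximum number of leaves in any tree, D the maximum depth, and M the number of features.
Integrate Tree SHAP into XGBoost and demonstrate faster explanations for large models.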
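
The exponential-time baseline can be sketched in a few lines. The code below is a minimal illustration, not the paper's or XGBoost's implementation: the dictionary-based tree layout, the `cover` field (assumed to be the number of training samples reaching a node), and all function names are hypothetical choices made for this example. It estimates E[f(x) | x_S] by following the split when the split feature is in S and averaging both children by cover otherwise, then applies the Shapley weights |S|!(M-|S|-1)!/M! over all feature subsets.

```python
from itertools import combinations
from math import factorial

# Hypothetical tree layout (an assumption for this sketch): internal nodes hold
# a feature index, a threshold, two children and a cover; leaves hold a value.
def leaf(value, cover):
    return {"value": value, "cover": cover}

def split(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold,
            "left": left, "right": right,
            "cover": left["cover"] + right["cover"]}

def exp_value(node, x, known):
    """Estimate E[f(x) | x_S] for S = known on a single tree."""
    if "value" in node:
        return node["value"]
    if node["feature"] in known:
        # Feature is known: follow the decision path taken by x.
        go_left = x[node["feature"]] <= node["threshold"]
        return exp_value(node["left"] if go_left else node["right"], x, known)
    # Feature is unknown: average both branches, weighted by cover.
    l, r = node["left"], node["right"]
    return (exp_value(l, x, known) * l["cover"]
            + exp_value(r, x, known) * r["cover"]) / node["cover"]

def brute_force_shap(tree, x, num_features):
    """Exact SHAP values by enumerating all feature subsets (exponential in M)."""
    M = num_features
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for S in combinations(others, size):
                known = set(S)
                phi[i] += weight * (exp_value(tree, x, known | {i})
                                    - exp_value(tree, x, known))
    return phi

def ensemble_shap(trees, x, num_features):
    """SHAP values of a sum of trees are the sum of the per-tree SHAP values."""
    totals = [0.0] * num_features
    for tree in trees:
        for i, v in enumerate(brute_force_shap(tree, x, num_features)):
            totals[i] += v
    return totals

# Toy tree on two binary features (0 and 1), 25 samples per deepest leaf.
toy = split(0, 0.5,
            leaf(0.0, 50),
            split(1, 0.5, leaf(0.0, 25), leaf(80.0, 25)))
x = [1, 1]
phi = brute_force_shap(toy, x, 2)
base = exp_value(toy, x, set())       # expected output with no features known
prediction = exp_value(toy, x, {0, 1})  # output with all features known
```

On this toy tree the attributions sum to `prediction - base`, which is the local accuracy property. The faster O(TLD^2) method computes the same quantities without enumerating subsets; it is not reproduced here.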

Experimental results

Research questions

RQ1: Are current feature attribution methods for tree ensembles inconsistent with respect to feature importance when model reliance changes?
RQ2: Can SHAP values provide a unique, consistent, locally accurate attribution for tree ensembles?
RQ3: How can SHAP values be computed efficiently for trees and tree ensembles?
RQ4: What is the practical impact of SHAP-based attributions on model explanations and downstream tasks (e.g., clustering)?

Key findings

Current path-based feature attribution methods are inconsistent and can assign lower importance to a feature that has a larger impact on the output.
SHAP values are the only additive feature attribution method based on conditional expectations that satisfies local accuracy, missingness, and consistency.
Tree SHAP reduces SHAP computation from exponential to polynomial time, enabling explanations for large models (O(TL^2) for unbalanced trees, O(TL log^2 L) for balanced trees).
Integrating Tree SHAP into XGBoost enables fast, scalable explanations for models with thousands of trees and hundreds of inputs.
SHAP-based explanations improve supervised clustering performance compared to traditional path-based attributions in a gene-expression Alzheimer’s study.
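
The inconsistency of path-based attribution can be reproduced on a toy example. The sketch below is an illustration under stated assumptions, not the paper's code: it uses two hypothetical models over binary fever and cough features with uniformly distributed, independent data, where model B adds 10 to the output whenever cough is present, and the tree for B is assumed to split on cough first. `path_attribution` follows a Saabas-style scheme that credits each split with the change in expected output along the decision path, and `shapley` enumerates feature subsets.

```python
from itertools import combinations, product
from math import factorial

FEATURES = ("fever", "cough")

def model_a(fever, cough):
    return 80.0 if fever and cough else 0.0

def model_b(fever, cough):
    # Same as model A, plus 10 whenever cough is present (relies more on cough).
    return model_a(fever, cough) + (10.0 if cough else 0.0)

def expected(model, x, known):
    """E[f(x) | x_S], averaging over unknown binary features (assumed uniform)."""
    unknown = [f for f in FEATURES if f not in known]
    total = 0.0
    for values in product((0, 1), repeat=len(unknown)):
        point = dict(x)
        point.update(zip(unknown, values))
        total += model(**point)
    return total / 2 ** len(unknown)

def path_attribution(model, x, split_order):
    """Saabas-style: credit each split with the change in expectation on the path."""
    attribution, known = {}, set()
    for feature in split_order:
        before = expected(model, x, known)
        known.add(feature)
        attribution[feature] = expected(model, x, known) - before
    return attribution

def shapley(model, x):
    """Exact Shapley values over conditional expectations (subset enumeration)."""
    M = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        phi[f] = 0.0
        for size in range(M):
            w = factorial(size) * factorial(M - size - 1) / factorial(M)
            for S in combinations(others, size):
                phi[f] += w * (expected(model, x, set(S) | {f})
                               - expected(model, x, set(S)))
    return phi

x = {"fever": 1, "cough": 1}
path_a = path_attribution(model_a, x, ("fever", "cough"))  # tree A: fever first
path_b = path_attribution(model_b, x, ("cough", "fever"))  # tree B: cough first
shap_a = shapley(model_a, x)
shap_b = shapley(model_b, x)
print("path cough A -> B:", path_a["cough"], "->", path_b["cough"])
print("SHAP cough A -> B:", shap_a["cough"], "->", shap_b["cough"])
```

Under these assumptions the path-based credit for cough drops from 40 to 25 when moving from model A to model B, even though B depends more on cough, while the Shapley value for cough rises from 30 to 35.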


This review was generated by AI and checked by a human editor.