QUICK REVIEW

[Paper Review] shapiq: Shapley Interactions for Machine Learning

Maximilian Muschalik, Hubert Baniecki|arXiv (Cornell University)|Oct 2, 2024

Software Engineering Research | Cited by 5

One-Line Summary

Introduces shapiq, an open-source Python library that unifies and benchmarks methods for computing Shapley Values (SVs) and any-order Shapley Interactions (SIs), with an application-agnostic interface and a benchmark suite spanning multiple domains.

ABSTRACT

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research. The source code and documentation are available at https://github.com/mmschlk/shapiq.
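
To make the abstract's point about exponential cost concrete, here is a minimal sketch of exact Shapley Value computation by brute-force enumeration. It does not use shapiq; `shapley_values` and `toy_game` are hypothetical names invented for this illustration, and the three-player game (additive payoffs plus a bonus when players 0 and 1 cooperate) is an assumption chosen so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n):
    """Exact Shapley values of an n-player game by enumerating all coalitions."""
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude player i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Weighted marginal contribution of player i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_game(coalition):
    """Hypothetical game: additive payoffs plus a bonus if players 0 and 1 cooperate."""
    payoff = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(payoff[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0  # joint effect that a single-player attribution must split
    return total


phi = shapley_values(toy_game, 3)
print(phi)  # approximately [3.0, 4.0, 3.0]
```

The loop visits every coalition for every player, so the number of game evaluations grows as 2^n. The bonus of 4 is split evenly between players 0 and 1, which hides the fact that it is a joint effect; this is the limitation that SIs address.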

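Motivation and Objectives

Provide an application-agnostic framework for computing SVs and any-order SIs in ML.
Unify state-of-the-art SI approximation algorithms under a single interface.
Provide an explanation API for explaining and visualizing interactions in model predictions.
Support researchers with a benchmark suite containing pre-computed ground-truth SI values across domains.
Promote the study and visualization of feature interactions beyond standard feature attributions.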

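Proposed Method

Implements an approximation interface for SI algorithms across multiple interaction indices and orders.
Includes ExactComputer for exact computation of 18 interaction indices and the Möbius Interaction (MI) representation.
Provides a CoalitionSampler interface that improves approximation with sampling techniques such as the border and pairing tricks.
Provides a suite of 11 benchmark games across real-world ML domains, including pre-computed SI ground truths for evaluation.
Exposes an Explainer API that generates and visualizes any-order feature interactions of predictions.
Integrates TreeSHAP-IQ to provide efficient explanations of tree-based models and support for common ML libraries.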
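
The exact computation of the Möbius representation mentioned in this list can be illustrated with a small standalone sketch. It does not call shapiq or its ExactComputer; `powerset`, `moebius_transform`, `shapley_from_moebius`, and `toy_game` are hypothetical names for this example only. It uses the standard definition m(S) = sum over T ⊆ S of (-1)^(|S|-|T|) v(T) and the known identity that a player's Shapley Value is the sum of m(S)/|S| over all coalitions S containing that player.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of items, as tuples, from the empty set up to the full set."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def moebius_transform(value, n):
    """Moebius coefficient m(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    coeffs = {}
    for subset in powerset(range(n)):
        s = frozenset(subset)
        coeffs[s] = sum(
            (-1) ** (len(s) - len(t)) * value(frozenset(t)) for t in powerset(s)
        )
    return coeffs


def shapley_from_moebius(coeffs, n):
    """Shapley value of player i: each m(S) with i in S is shared equally among S."""
    return [sum(m / len(s) for s, m in coeffs.items() if i in s) for i in range(n)]


def toy_game(coalition):
    """Hypothetical game: additive payoffs plus a bonus if players 0 and 1 cooperate."""
    payoff = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(payoff[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


coeffs = moebius_transform(toy_game, 3)
sv = shapley_from_moebius(coeffs, 3)
print(coeffs[frozenset({0, 1})])  # 4.0, the pure joint effect of players 0 and 1
print(sv)                         # [3.0, 4.0, 3.0]
```

In this toy game the only non-zero coefficient above order one sits on the pair {0, 1}, so the Möbius representation isolates the joint effect that the Shapley Values alone spread across the two players.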

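Experimental Results

Research Questions

RQ1: How can any-order Shapley Interactions be computed efficiently for ML models?
RQ2: Can a unified API support multiple SI indices and approximation methods across diverse model types and data domains?
RQ3: How closely can SI approximation methods approximate ground-truth SI values under practical resource constraints?
RQ4: How can researchers benchmark and visualize higher-order interactions across real-world datasets and models?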
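
The kind of comparison behind RQ3 can be sketched as follows: estimate values under a sampling budget, then measure the error against a known ground truth. The estimator here is plain permutation sampling, a generic Monte Carlo method used only for illustration; it is not claimed to be one of shapiq's approximators, and it estimates first-order Shapley Values only. `permutation_estimate`, `toy_game`, and `EXACT` are hypothetical names, and the ground truth was worked out by hand for this toy game.

```python
import random


def toy_game(coalition):
    """Hypothetical game: additive payoffs plus a bonus if players 0 and 1 cooperate."""
    payoff = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(payoff[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


# Ground-truth Shapley values for toy_game, computed by hand
EXACT = [3.0, 4.0, 3.0]


def permutation_estimate(value, n, n_permutations, seed=0):
    """Monte Carlo Shapley estimate: average marginal contributions over random orders."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimate = [0.0] * n
    order = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = value(coalition)
        for player in order:
            # Marginal contribution of the player joining the players before it
            coalition.add(player)
            current = value(coalition)
            estimate[player] += current - previous
            previous = current
    return [e / n_permutations for e in estimate]


estimate = permutation_estimate(toy_game, 3, n_permutations=2000)
mse = sum((e - x) ** 2 for e, x in zip(estimate, EXACT)) / len(EXACT)
print(estimate, mse)
```

Each permutation costs n + 1 game evaluations, so the budget is controlled through `n_permutations`. Repeating the run for several budgets and plotting the mean squared error against the number of evaluations gives the usual approximation-quality curve.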

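Key Findings

shapiq provides an open-source library that unifies state-of-the-art SI algorithms with exact computation for small games.
It includes 18 interaction indices and game-theoretic concepts, including Möbius Interactions as the highest-order representation.
The package includes a benchmark suite with 11 benchmark games across multiple real-world ML domains and pre-computed SI ground truths across 2,042 configurations.
It enables explanation and visualization of any-order feature interactions for models such as vision transformers, language models, XGBoost, and LightGBM (via TreeSHAP-IQ).
shapiq extends Shapley-based explanations beyond feature attributions and supports research on game theory in ML and on SI applications.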

This review was generated by AI and checked by a human editor.