QUICK REVIEW

[Paper Review] Explaining by Removing: A Unified Framework for Model Explanation

Ian Covert, Scott Lundberg|arXiv (Cornell University)|Nov 21, 2020

Explainable Artificial Intelligence (XAI)|References: 95|Citations: 123

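One-sentence summary

This paper introduces removal-based explanations as a unified framework for model interpretation, unifies 26 existing methods through three design choices, and connects them to psychology, game theory, and information theory.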

ABSTRACT

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.

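Motivation and objectives

Motivate the need to relate and compare the diverse range of model explanation methods.
Introduce removal-based explanations as a general framework for interpreting ML models.
Characterize methods by three independent design choices: feature removal, model behavior, and summarization.
Use insights from psychology, game theory, and information theory to unify and analyze the connections among existing methods.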

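Proposed method

Define removal-based explanations as functions that quantify the impact of removing groups of features from a model.
Characterize each method by three choices: how features are removed, which model behavior is explained, and how each feature's influence is summarized.
Survey 26 existing methods and show how each fits into the three-dimensional framework.
Show that explanations admit an information-theoretic interpretation when features are removed by marginalizing them out (using the conditional or the marginal distribution).
Relate removal-based explanations to cooperative game theory and discuss Shapley-based attributions as a unifying theme.
Provide an empirical exploration by combining choices from existing methods within the framework to create new approaches.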
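
The three choices can be made concrete with a small sketch. The code below is an illustration written for this review, not the authors' implementation: it removes features by averaging over a background dataset (marginal distribution), explains a single prediction, and summarizes influence by removing one feature at a time. The function names, the toy linear model, and the background data are all hypothetical.

```python
import numpy as np

def make_subset_function(model, background):
    """Choice 1 (feature removal): fill removed features with values from
    a background dataset and average the model output over those rows."""
    def subset_fn(x, keep_mask):
        # Copy the background rows, then overwrite the retained features
        # with the values of the instance being explained.
        samples = background.copy()
        samples[:, keep_mask] = x[keep_mask]
        return model(samples).mean()
    return subset_fn

def prediction_game(subset_fn, x):
    """Choice 2 (model behavior): explain the prediction for one instance."""
    return lambda keep_mask: subset_fn(x, keep_mask)

def remove_individual(game, n_features):
    """Choice 3 (summary): score each feature by how much the explained
    quantity drops when that feature alone is removed."""
    full = np.ones(n_features, dtype=bool)
    scores = np.zeros(n_features)
    for i in range(n_features):
        mask = full.copy()
        mask[i] = False
        scores[i] = game(full) - game(mask)
    return scores

# Hypothetical toy linear model, chosen so the result can be checked by hand.
weights = np.array([2.0, -1.0, 0.5])
model = lambda X: X @ weights
background = np.array([[0.0, 0.0, 0.0],
                       [2.0, 2.0, 2.0]])  # every feature has mean 1.0
x = np.array([3.0, 1.0, 5.0])

game = prediction_game(make_subset_function(model, background), x)
scores = remove_individual(game, n_features=3)
print(scores)  # for a linear model: weights * (x - background mean)
```

Swapping any one of the three functions for an alternative (for example, a different removal strategy or a different summary) gives a different member of the same family, which is the sense in which the framework treats the choices as independent.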

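Experimental results

Research questions

RQ1: How can diverse model explanation methods be unified within a single removal-based framework?
RQ2: What are the fundamental design choices that distinguish removal-based explanations?
RQ3: When do removal-based explanations admit an information-theoretic interpretation?
RQ4: How are existing methods related through insights from cognitive psychology and cooperative game theory?
RQ5: What new methods emerge from combining choices within the framework?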
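
To give a sense of what RQ3 is asking, here is one special case worked out as a sketch. It rests on assumptions made for this illustration: removed features are marginalized out with their conditional distribution, the model f is the Bayes classifier f(x) = p(y | x), the explained behavior is the expected cross-entropy loss, and the notation (F, v, S) is chosen here rather than taken from the review.

```latex
\begin{align}
F(x_S) &= \mathbb{E}\left[ f(X) \mid X_S = x_S \right] = p(y \mid x_S), \\
v(S) &= -\mathbb{E}\left[ \ell\big(F(X_S), Y\big) \right] = -H(Y \mid X_S), \\
v(S) - v(\varnothing) &= H(Y) - H(Y \mid X_S) = I(Y; X_S).
\end{align}
```

Under these assumptions, the value gained by retaining the feature subset S equals the mutual information between S and the label, which is the kind of information-theoretic reading the question refers to.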

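Key findings

The framework unifies 26 removal-based explanation methods, including SHAP, LIME, Meaningful Perturbations, and permutation tests.
Removing features by marginalization (conditional or marginal) gives removal-based explanations an information-theoretic foundation.
There are deep connections to cooperative game theory, and Shapley values often provide a principled summary of feature influence.
The approach connects to concepts from cognitive psychology: subtractive counterfactual reasoning, Mill's method of difference, and related ideas.
The experiments show that combining the framework's choices yields more than 60 new explanation methods and reveals relationships among the methods.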
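
The Shapley value mentioned above can be computed exactly for a small cooperative game by enumerating coalitions. The following is a minimal sketch for illustration only: `shapley_values` and the toy game `toy_value` are hypothetical names introduced here, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_players):
    """Exact Shapley values by enumerating every coalition.
    value_fn maps a frozenset of player indices to a number."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n_players):
            # Shapley weight for coalitions of this size.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical toy game: features 0 and 1 are only useful together,
# while feature 2 contributes on its own.
def toy_value(s):
    return 4.0 * (0 in s and 1 in s) + 1.0 * (2 in s)

phi = shapley_values(toy_value, 3)
print(phi)
```

In this toy game the two interacting features split their joint value equally, and the attributions sum to the value of the full coalition minus the value of the empty one, which is the efficiency property that makes Shapley values a principled summary.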

This review was generated by AI and checked by a human editor.