QUICK REVIEW

[Paper Review] Assessing the Local Interpretability of Machine Learning Models

Dylan Slack, Sorelle A. Friedler|arXiv (Cornell University)|Feb 9, 2019

Explainable Artificial Intelligence (XAI)|References: 22|Citations: 48

One-Line Summary

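This paper empirically evaluates two definitions of local interpretability, simulatability and "what if" local explainability, across decision trees, logistic regression, and neural networks, using a large-scale crowdsourced user study and a runtime operation count as a proxy metric.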

ABSTRACT

The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on two definitions of interpretability that have been introduced in the machine learning literature: simulatability (a user's ability to run a model on a given input) and "what if" local explainability (a user's ability to correctly determine a model's prediction under local changes to the input, given knowledge of the model's original prediction). Through a user study with 1,000 participants, we test whether humans perform well on tasks that mimic the definitions of simulatability and "what if" local explainability on models that are typically considered locally interpretable. To track the relative interpretability of models, we employ a simple metric, the runtime operation count on the simulatability task. We find evidence that as the number of operations increases, participant accuracy on the local interpretability tasks decreases. In addition, this evidence is consistent with the common intuition that decision trees and logistic regression models are interpretable and are more interpretable than neural networks.
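
To make the two definitions concrete, here is a minimal sketch of how the correct answer to a simulatability item and to a "what if" item could be derived from a model, and how participant answers could be scored. The toy linear classifier, its weights, and the helper names (`simulatability_answer`, `what_if_answer`, `accuracy`) are hypothetical illustrations, not the study's materials.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]


def simulatability_answer(model: Model, x: Sequence[float]) -> int:
    """Correct answer to a simulatability item: the model's output on x."""
    return model(x)


def what_if_answer(model: Model, x: Sequence[float],
                   feature: int, new_value: float) -> Tuple[int, int]:
    """Return (original prediction, which the participant is shown,
    prediction after changing one feature, which the participant must infer)."""
    x_changed = list(x)
    x_changed[feature] = new_value
    return model(x), model(x_changed)


def accuracy(answers: List[int], truths: List[int]) -> float:
    """Fraction of participant answers that match the model's true outputs."""
    return sum(a == t for a, t in zip(answers, truths)) / len(truths)


# Hypothetical two-feature linear classifier, used only for illustration.
def toy_model(x: Sequence[float]) -> int:
    return 1 if 0.8 * x[0] - 0.4 * x[1] + 0.1 > 0 else 0


print(simulatability_answer(toy_model, [1.0, 1.5]))   # 1
print(what_if_answer(toy_model, [1.0, 1.5], 0, 0.0))  # (1, 0)
print(accuracy([1, 0, 1], [1, 1, 1]))                 # 2 of 3 answers correct
```

The what-if helper returns the original prediction alongside the new one because, by definition, the participant is given the original output and only has to reason about the local change.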

Motivation and Objectives

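Define and test notions of local interpretability for ML models (simulatability and "what if" local explainability).
Evaluate whether common model types such as decision trees, logistic regression, and neural networks differ in local interpretability.
Evaluate whether a simple runtime operation count can serve as a proxy for whether users can locally interpret a model.
Provide empirical benchmarks and guidance for choosing models with human interpretability in mind.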

Proposed Method

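Define two local interpretability tasks: simulating the model on a given input, and determining the model's output under a small change to the input.
Train three model types (decision trees, logistic regression, neural networks) on synthetic datasets and represent them for participants.
Measure and count the runtime operations (arithmetic and logical) that Python prediction code performs, and use the count as a proxy for interpretability.
Run a crowdsourced study with 1,000 participants that tests simulatability and what-if explainability on these models and inputs.
Analyze completion time and accuracy as functions of operation count, and compare models using Fisher's exact test with Bonferroni-corrected p-values.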
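
One way to obtain such a count is to run ordinary Python prediction code on instrumented numbers that tally every addition, multiplication, and comparison. The sketch below does this for toy versions of the three model types. The `Num` wrapper, the model encodings, and the counting rules (bias additions counted, the sigmoid replaced by an equivalent threshold on the logit, one comparison per ReLU) are assumptions for illustration; the paper's exact counting rules may differ.

```python
class Counter:
    """Shared tally of primitive operations."""

    def __init__(self):
        self.ops = 0


class Num:
    """Float wrapper that adds 1 to a Counter on every +, *, <= and >."""

    def __init__(self, value, counter):
        self.value = value
        self.counter = counter

    @staticmethod
    def _raw(other):
        return other.value if isinstance(other, Num) else other

    def __add__(self, other):
        self.counter.ops += 1
        return Num(self.value + Num._raw(other), self.counter)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        self.counter.ops += 1
        return Num(self.value * Num._raw(other), self.counter)

    __rmul__ = __mul__  # multiplication is commutative

    def __le__(self, other):
        self.counter.ops += 1
        return self.value <= Num._raw(other)

    def __gt__(self, other):
        self.counter.ops += 1
        return self.value > Num._raw(other)


def linear(weights, bias, x):
    """bias + sum(w * xi): one multiplication and one addition per input."""
    z = bias
    for w, xi in zip(weights, x):
        z = z + w * xi
    return z


def predict_tree(node, x):
    """Internal nodes: {"feature", "threshold", "left", "right"}; leaves: {"label"}."""
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]


def predict_logreg(weights, bias, x):
    # sigmoid(z) > 0.5 iff z > 0, so a single comparison replaces the sigmoid.
    return 1 if linear(weights, bias, x) > 0 else 0


def relu(s):
    # One comparison per hidden unit; a clipped unit stays a counted Num.
    return s if s > 0 else Num(0.0, s.counter)


def predict_mlp(hidden, out_weights, out_bias, x):
    """One ReLU hidden layer given as (weights, bias) pairs, then a linear output."""
    h = [relu(linear(w, b, x)) for w, b in hidden]
    return 1 if linear(out_weights, out_bias, h) > 0 else 0


def count_ops(predict, x_raw):
    """Run predict on instrumented inputs; return (prediction, operation count)."""
    counter = Counter()
    label = predict([Num(v, counter) for v in x_raw])
    return label, counter.ops


# Hypothetical toy models on a two-feature input.
tree = {"feature": 0, "threshold": 0.5, "left": {"label": 0},
        "right": {"feature": 1, "threshold": 2.0,
                  "left": {"label": 1}, "right": {"label": 0}}}
weights, bias = [0.8, -0.4], 0.1
hidden = [([1.0, 0.0], 0.0), ([0.0, 1.0], -2.0), ([0.5, 0.5], 0.0)]
out_weights, out_bias = [1.0, 1.0, -1.0], 0.0
x = [1.0, 1.5]

print(count_ops(lambda v: predict_tree(tree, v), x))                              # (1, 2)
print(count_ops(lambda v: predict_logreg(weights, bias, v), x))                   # (1, 5)
print(count_ops(lambda v: predict_mlp(hidden, out_weights, out_bias, v), x))      # (0, 22)
```

Instrumenting the numbers, rather than hand-deriving a formula per model, keeps the count tied to the code path actually executed: the tree only pays for the comparisons on its root-to-leaf path, while even this tiny network pays for every weight.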

Experimental Results

Research Questions

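RQ1: Do humans perform well on the simulatability and what-if local explainability tasks for each model type?
RQ2: Is there a relative ordering of local interpretability among decision trees, logistic regression, and neural networks?
RQ3: Does the total operation count correlate with the time needed to complete a task and with accuracy on the interpretability tasks?
RQ4: Can the operation count serve as a proxy for local interpretability across models?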

Key Findings

Model	Simulatability correct	What If correct	p-value (Simulatability)	p-value (What If)	95% CI (Simulatability)	95% CI (What If)
Decision Tree	717 / 930	719 / 930	5.9×10^{-63}	5.16×10^{-64}	[0.73,0.81]	[0.73,0.82]
Logistic Regression	592 / 930	579 / 930	1.94×10^{-15}	2.07×10^{-12}	[0.59,0.69]	[0.57,0.67]
Neural Network	556 / 930	499 / 930	7.34×10^{-8}	0.78	[0.55,0.65]	[0.49,0.59]
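
The table gives raw counts out of 930 responses per model and task. Dividing by 930 yields the point accuracies; the short check below uses only numbers copied from the table and confirms that each point estimate lies inside its reported 95% interval. The method the paper used to compute those intervals is not stated here, so the sketch does not try to recompute them.

```python
N = 930  # responses per model and task, as in the table

# (correct responses, reported 95% CI) per (model, task), copied from the table.
TABLE = {
    ("Decision Tree", "Simulatability"): (717, (0.73, 0.81)),
    ("Decision Tree", "What If"): (719, (0.73, 0.82)),
    ("Logistic Regression", "Simulatability"): (592, (0.59, 0.69)),
    ("Logistic Regression", "What If"): (579, (0.57, 0.67)),
    ("Neural Network", "Simulatability"): (556, (0.55, 0.65)),
    ("Neural Network", "What If"): (499, (0.49, 0.59)),
}

accuracy = {}
for (model, task), (correct, (lo, hi)) in TABLE.items():
    acc = correct / N
    accuracy[(model, task)] = acc
    print(f"{model:<20} {task:<15} accuracy = {acc:.3f}  inside reported CI: {lo <= acc <= hi}")
```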

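Under these definitions of local interpretability, and for the model representations shown to participants, decision trees and logistic regression are locally interpretable, whereas neural networks are not: neural network accuracy on the What If task is not significant (p = 0.78).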
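Decision trees are more locally interpretable than both logistic regression and neural networks on the simulatability and what-if tasks.
As the operation count grows, participants take longer to complete the tasks and their accuracy on the interpretability tasks drops; the effect is most pronounced for decision trees.
Neural networks take markedly longer to simulate and may become impossible to simulate by hand at larger sizes; a high operation count limits local interpretability.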
Pairwise Fisher's exact tests provide evidence for the relative ordering DT > LR > NN in local interpretability.
The proposed proxy, the runtime operation count, tracks the trends in time and accuracy across tasks and models.
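
The pairwise comparisons behind the Fisher's exact test finding above can be outlined from the table's counts using only the standard library. This is a minimal sketch, not the authors' analysis code: it assumes each comparison is a two-sided Fisher's exact test on a 2×2 table of correct and incorrect responses for two models on one task, and that the Bonferroni correction multiplies each p-value by 6 (three model pairs times two tasks). The paper's test direction and family size may differ.

```python
from itertools import combinations
from math import comb

N = 930  # responses per model and task, from the table
CORRECT = {
    "DT": {"Simulatability": 717, "What If": 719},
    "LR": {"Simulatability": 592, "What If": 579},
    "NN": {"Simulatability": 556, "What If": 499},
}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Adds up the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    n1, n2, k = a + b, c + d, a + c
    denom = comb(n1 + n2, k)
    probs = [comb(n1, x) * comb(n2, k - x) / denom
             for x in range(max(0, k - n2), min(k, n1) + 1)]
    p_obs = comb(n1, a) * comb(n2, k - a) / denom
    # Relative tolerance so tables tied with the observed one survive rounding.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


TASKS = ["Simulatability", "What If"]
PAIRS = list(combinations(CORRECT, 2))  # (DT, LR), (DT, NN), (LR, NN)
N_TESTS = len(PAIRS) * len(TASKS)       # assumed Bonferroni family size: 6

adjusted = {}
for task in TASKS:
    for m1, m2 in PAIRS:
        a, c = CORRECT[m1][task], CORRECT[m2][task]
        # Rows: models; columns: correct vs. incorrect responses.
        p = fisher_exact_two_sided(a, N - a, c, N - c)
        adjusted[(task, m1, m2)] = min(1.0, p * N_TESTS)
        print(f"{task:<15} {m1} vs {m2}: Bonferroni-adjusted p = {adjusted[(task, m1, m2)]:.3g}")
```

Exact integer arithmetic via `math.comb` avoids the overflow that a naive factorial-based hypergeometric formula would hit at these sample sizes; `scipy.stats.fisher_exact` is the usual library alternative.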

This review was generated by AI and checked by a human editor.