QUICK REVIEW

[Paper Review] Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation

Julius Adebayo, Michael Muelly|arXiv (Cornell University)|Dec 9, 2022

Explainable Artificial Intelligence (XAI) | Citations: 25

One-Line Summary

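This paper evaluates whether post hoc explanations (feature attributions, concept activations, and training-point rankings) can detect a model's reliance on unknown spurious correlations. It concludes that they are often ineffective, particularly for non-visible spurious signals. Some methods help only when the spurious signal is visible and known in advance.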

ABSTRACT

We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test-time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test-time especially for non-visible artifacts like a background blur. Further, we find that feature attribution methods are susceptible to erroneously indicating dependence on spurious signals even when the model being explained does not rely on spurious artifacts. This finding casts doubt on the utility of these approaches, in the hands of a practitioner, for detecting a model's reliance on spurious signals.
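The semi-synthetic setup described in the abstract comes down to stamping a chosen artifact onto the training images of one class so that the artifact becomes predictive of the label. The sketch below illustrates that idea under stated assumptions: grayscale images stored as NumPy arrays in [0, 1], a solid square standing in for a visible artifact, and a whole-image 3x3 box blur as a crude stand-in for the paper's background blur (which blurs only the background). The function names `add_patch_artifact`, `add_blur_artifact`, and `make_spurious_dataset` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def add_patch_artifact(image: np.ndarray, size: int = 4, value: float = 1.0) -> np.ndarray:
    """Visible artifact: copy of `image` with a solid square in the top-left corner."""
    out = image.copy()
    out[:size, :size] = value
    return out


def add_blur_artifact(image: np.ndarray) -> np.ndarray:
    """Hard-to-see artifact: copy of `image` smoothed with a 3x3 box filter.

    Assumption: a crude stand-in for background blur. It blurs the whole image,
    not just the background, and expects a 2-D (grayscale) array.
    """
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    # Average the nine one-pixel shifts of the padded image.
    shifted = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.mean(shifted, axis=0)


def make_spurious_dataset(images, labels, target_class, artifact_fn):
    """Apply `artifact_fn` only to images of `target_class`, so the artifact
    is perfectly correlated with that label in the training data."""
    new_images = [artifact_fn(x) if y == target_class else x.copy()
                  for x, y in zip(images, labels)]
    return np.stack(new_images), np.asarray(labels)


# Usage with placeholder data: six random 8x8 "images" in two classes.
rng = np.random.default_rng(0)
images = rng.random((6, 8, 8))
labels = [0, 1, 0, 1, 0, 1]
x_patch, y = make_spurious_dataset(images, labels, 1, add_patch_artifact)
x_blur, _ = make_spurious_dataset(images, labels, 1, add_blur_artifact)
```

Every function returns a copy, so the clean images remain available for training a "normal" comparison model alongside the one trained on the artifact-injected data.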

Motivation and Objectives

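Motivate the need to detect spurious signals in DNNs deployed in high-stakes domains.
Develop an end-to-end empirical methodology that injects known spurious signals and evaluates the resulting explanations.
Quantify the reliability of three explanation paradigms under both known and unknown spurious signals.
Conduct a human-in-the-loop study to assess how useful the explanations are to practitioners.
Provide guidance on when and how explanations can be trusted to detect visible spurious signals.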

Proposed Method

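Build semi-synthetic datasets containing pre-specified spurious artifacts, inducing models that rely on those spurious signals.
Define a Spurious Score that quantifies a model's reliance on a spurious signal.
Develop three reliability measures: the Known Spurious Signal Detection Measure (K-SSD), the Cause-for-Concern Measure (CCM), and the False Alarm Measure (FAM).
Evaluate three types of post hoc explanation across multiple tasks and architectures: feature attribution, concept activation (TCAV), and training-point ranking (influence functions).
Conduct a blind user study with 200 participants to assess, in practice, whether spurious signals can be detected.
Use SSIM to compare feature attributions, and the Kolmogorov-Smirnov (KS) test to compare concept activations and training-point rankings.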
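The two comparison tools named in the method list can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: `global_ssim` computes SSIM with a single window spanning the whole map (standard SSIM, e.g. `skimage.metrics.structural_similarity`, averages over local windows), and `ks_statistic` computes the two-sample Kolmogorov-Smirnov statistic (`scipy.stats.ks_2samp` additionally returns a p-value). Both function names are hypothetical, `data_range=1.0` assumes maps scaled to [0, 1], and the pairing shown (attributions from a clean model vs. a model with a corner artifact) is an illustrative assumption; the exact comparisons behind K-SSD, CCM, and FAM are defined in the paper.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM computed with one window spanning the whole map (a simplification).

    `data_range` is the assumed span of values in the maps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("maps must have the same shape")
    # Stabilising constants from the original SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDF of each sample evaluated at every pooled observation.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Usage with placeholder data (not results from the paper).
rng = np.random.default_rng(0)
attr_normal = rng.random((8, 8))   # hypothetical attribution map, clean model
attr_spurious = attr_normal.copy()
attr_spurious[:4, :4] = 1.0        # hypothetical attribution mass on a corner artifact
print(global_ssim(attr_normal, attr_spurious))
print(ks_statistic(rng.normal(0.5, 0.1, 50), rng.normal(0.7, 0.1, 50)))
```

An SSIM near 1 means two attribution maps look alike, and a KS statistic near 0 means two score distributions (for example, TCAV scores) are hard to tell apart.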

Experimental Results

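Research Questions

RQ1: Can post hoc explanations detect a model's reliance on spurious training signals that are unknown at test time?
RQ2: Are feature attribution, concept activation, and training-point ranking effective at detecting visible spurious signals?
RQ3: How do these explanations perform when the spurious signal differs in visibility (visible vs. non-visible)?
RQ4: Without being told about potential spurious signals, can practitioners use explanations to reliably identify defective models?
RQ5: What guidance can practitioners be given for using explanations effectively to detect spurious correlations?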

Key Findings

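Post hoc explanations can identify reliance on visible spurious signals when the signal is known in advance; their effectiveness drops for non-visible signals such as background blur.
Feature attribution methods carry a high false-alarm risk: they can suggest that a model relies on a spurious signal even when it does not.
Concept activation (TCAV) can detect visible spurious signals that are known in advance, but struggles with non-visible signals and has limited reliability when the signal is unknown.
Training-point ranking (influence functions) raises the share of spurious inputs among the top-ranked training points for spurious models only when the spurious signal is known. Without prior knowledge, it can create an illusory sense of confidence.
In the blind user study, no explanation method let participants reliably detect defective models without prior knowledge of the spurious signal; with prior knowledge, TCAV showed some effectiveness for visible signals.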

This review was generated by AI and checked by a human editor.