QUICK REVIEW

[Paper Review] Generating Counterfactual Explanations with Natural Language

Lisa Anne Hendricks, Ronghang Hu | arXiv (Cornell University) | Jun 26, 2018

Explainable Artificial Intelligence (XAI) | Citations: 53

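One-Line Summary

This paper proposes a method for generating counterfactual textual explanations for image classifications. The method identifies evidence that is discriminative for a counter class, checks whether that evidence is present in the image, and negates it to produce a fluent counterfactual sentence. It is evaluated on Caltech-UCSD Birds with automatic metrics for phrase errors and for the impact of counterfactual text.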

ABSTRACT

Natural language explanations of deep neural network decisions provide an intuitive way for an AI agent to articulate a reasoning process. Current textual explanations learn to discuss class discriminative features in an image. However, it is also helpful to understand which attributes might change a classification decision if present in an image (e.g., "This is not a Scarlet Tanager because it does not have black wings.") We call such textual explanations counterfactual explanations, and propose an intuitive method to generate counterfactual explanations by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image. To demonstrate our method we consider a fine-grained image classification task in which we take as input an image and a counterfactual class and output text which explains why the image does not belong to a counterfactual class. We then analyze our generated counterfactual explanations both qualitatively and quantitatively using proposed automatic metrics.

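Research Motivation and Goals

To encourage and enable explanations of why an image does not belong to a counter class, rather than only explanations of why it belongs to its predicted class
To generate informative counterfactual statements from semantic evidence that is not visible in the image
To develop an end-to-end pipeline that predicts counterfactual evidence, verifies that it is absent from the image, and generates fluent counterfactual text
To evaluate the quality and discriminativeness of counterfactual explanations on a fine-grained dataset using the proposed metrics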

Proposed Method

Predict candidate counterfactual evidence by extracting noun phrases from explanations of the counter class
Check whether the counterfactual evidence is present in the image using two evidence checkers (Counterfactual: Classifier and Counterfactual: Phrase-Critic)
Negate the selected counterfactual phrases and compose a cohesive sentence contrasting the image with the counter class (e.g., "This is not a X because...")
Build the final counterfactual sentence with a rule-based negation system and append it to the base explanation
Where needed, ground phrases with a retrieval-grounding model to obtain the localization that informs the phrase-critic's scores
Evaluate on the Caltech-UCSD Birds dataset using phrase error and accuracy with counterfactual text
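The steps above can be sketched as a small text pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: `extract_candidate_phrases` is a hypothetical string-splitting stand-in for a real noun-phrase chunker, and `presence_score` is a hypothetical callable standing in for either evidence checker (CF: Classifier or CF: Phrase-Critic), assumed to return the probability that a phrase is visible in the image. The threshold and the cap on the number of phrases are illustrative choices, not values from the paper.

```python
from typing import Callable, List, Optional

# Hypothetical evidence-checker interface: phrase -> estimated probability that
# the phrase is visible in the image (stands in for CF: Classifier or
# CF: Phrase-Critic; neither model is implemented here).
PresenceScore = Callable[[str], float]


def extract_candidate_phrases(counter_explanation: str) -> List[str]:
    """Crude stand-in for noun-phrase extraction: take the text after
    'has'/'have' and split it on commas and ' and '."""
    text = counter_explanation.lower().strip().rstrip(".")
    for verb in (" has ", " have "):
        if verb in text:
            text = text.split(verb, 1)[1]
            break
    phrases = [p.strip() for chunk in text.split(",") for p in chunk.split(" and ")]
    return [p for p in phrases if p]


def select_counterfactual_evidence(
    phrases: List[str],
    presence_score: PresenceScore,
    threshold: float = 0.5,
    max_phrases: int = 2,
) -> List[str]:
    """Keep phrases the checker judges absent (score below threshold),
    most confidently absent first, capped at max_phrases."""
    absent = [p for p in phrases if presence_score(p) < threshold]
    absent.sort(key=presence_score)
    return absent[:max_phrases]


def negate(phrases: List[str]) -> str:
    """Rule-based negation: ['X', 'Y'] -> 'does not have X or Y'."""
    if not phrases:
        raise ValueError("need at least one phrase to negate")
    if len(phrases) == 1:
        joined = phrases[0]
    else:
        joined = ", ".join(phrases[:-1]) + " or " + phrases[-1]
    return f"does not have {joined}"


def counterfactual_explanation(
    counter_class: str,
    counter_explanation: str,
    presence_score: PresenceScore,
    **select_kwargs,
) -> Optional[str]:
    """Predict candidate evidence, keep what the checker finds absent,
    negate it, and wrap it in a 'This is not a X because ...' sentence."""
    phrases = extract_candidate_phrases(counter_explanation)
    evidence = select_counterfactual_evidence(phrases, presence_score, **select_kwargs)
    if not evidence:
        return None  # nothing verifiably absent, so no counterfactual sentence
    return f"This is not a {counter_class} because it {negate(evidence)}."


if __name__ == "__main__":
    scores = {"a red body": 0.9, "black wings": 0.1}  # made-up checker outputs
    print(counterfactual_explanation(
        "Scarlet Tanager",
        "This bird has a red body and black wings.",
        lambda p: scores.get(p, 0.5),
    ))
    # → This is not a Scarlet Tanager because it does not have black wings.
```

With made-up checker scores of 0.9 for "a red body" and 0.1 for "black wings", only the absent attribute survives selection, which reproduces the abstract's Scarlet Tanager example. Returning `None` when nothing is judged absent mirrors the idea that a counterfactual should only cite evidence verified to be missing.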

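Experimental Results

Research Questions

RQ1 Can counterfactual explanations improve interpretability by pointing out attributes missing from the image that would change the classification decision?
RQ2 How accurately can the model predict and verify counterfactual evidence that is absent from the image?
RQ3 Does adding counterfactual text reduce a classifier's ability to predict the correct class from the explanations, indicating that the text is discriminative?
RQ4 Which evidence checker (Classifier vs. Phrase-Critic) better supports robust counterfactual text generation?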

Key Findings

Model	Phrase Error	Accuracy w/CF Text
Baseline	16.26	39.54
CF: Classifier	8.99	38.16
CF: Phrase-Critic	7.37	36.62

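Both counterfactual models (CF: Classifier and CF: Phrase-Critic) outperform the baseline by reducing phrase errors in the generated explanations.
When counterfactual text is added, sentence-level accuracy drops for all models, indicating that the text influences class-discriminative decisions.
The Phrase-Critic model achieves the lowest phrase error (7.37, vs. 8.99 for CF: Classifier and 16.26 for the baseline), which suggests better phrase grounding and improved localization of counterfactual attributes.
Grounding-based approaches make counterfactual evidence selection more effective by drawing on external data (e.g., Visual Genome) and phrase-level localization.
The baseline has the highest accuracy with counterfactual text (39.54), but its phrase error (16.26) is roughly twice that of either counterfactual model, so the proposed approaches produce far fewer incorrect counterfactual statements.
Qualitative examples show counterfactual explanations such as "This is not a Bobolink because it does not have a yellow nape," which clarify the differences between similar birds.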
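The size of the differences in the results table can be made concrete with a few lines of arithmetic on the reported values: the relative reduction in phrase error for each counterfactual model and its change in accuracy with counterfactual text, both measured against the baseline. The numbers are copied from the table; no other data is assumed.

```python
from typing import Dict, Tuple

# (phrase error, accuracy w/CF text) copied from the results table
RESULTS: Dict[str, Tuple[float, float]] = {
    "Baseline": (16.26, 39.54),
    "CF: Classifier": (8.99, 38.16),
    "CF: Phrase-Critic": (7.37, 36.62),
}


def changes_vs_baseline(
    results: Dict[str, Tuple[float, float]], baseline: str = "Baseline"
) -> Dict[str, Tuple[float, float]]:
    """Return {model: (relative phrase-error reduction in %, accuracy delta in points)}."""
    base_err, base_acc = results[baseline]
    return {
        name: (100 * (base_err - err) / base_err, acc - base_acc)
        for name, (err, acc) in results.items()
        if name != baseline
    }


for name, (err_drop, acc_delta) in changes_vs_baseline(RESULTS).items():
    print(f"{name}: phrase error {err_drop:.1f}% lower, accuracy w/CF text {acc_delta:+.2f} points")
# CF: Classifier: phrase error 44.7% lower, accuracy w/CF text -1.38 points
# CF: Phrase-Critic: phrase error 54.7% lower, accuracy w/CF text -2.92 points
```

Both counterfactual models cut the baseline's phrase error roughly in half, while their accuracy with counterfactual text is about 1.4 to 2.9 points below the baseline's.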


This review was generated by AI and checked by a human editor.