QUICK REVIEW

[Paper Review] Explanations can be manipulated and geometry is to blame

Ann-Kathrin Dombrowski, Maximilian Alber|arXiv (Cornell University)|Jun 19, 2019

Explainable Artificial Intelligence (XAI)|References: 27|Citations: 145

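One-Line Summary

This paper argues that the explanations produced by common attribution methods can be manipulated, and it traces this vulnerability to geometric properties of the model and its input space.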

ABSTRACT

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.

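Motivation and Objectives

Motivate and analyze the susceptibility of explanation methods to manipulation.
Examine how geometric properties of the model and the input space contribute to the vulnerability of explanations.
Survey several standard explanation methods and discuss their limitations in the context of manipulation.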

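Proposed Method

Describes gradient-based attribution methods, including Gradient, Gradient × Input, and Integrated Gradients.
Examines backpropagation-based explanations, including Guided Backpropagation and Layer-wise Relevance Propagation.
Highlights how the geometry of the input space and the model's decision boundary shapes the behavior of explanations.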
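
To make the three gradient-based methods named above concrete, here is a minimal sketch on a toy scalar model. Everything in it is an assumption for illustration, not taken from the paper: the model g(x) = softplus(w · x + b), the weights, the all-zeros baseline, and the midpoint Riemann sum used to approximate the Integrated Gradients path integral.

```python
import math

# Hypothetical toy model (not from the paper): g(x) = softplus(w . x + b).
W = [1.5, -2.0, 0.5]
B = 0.1


def softplus(z):
    return math.log1p(math.exp(z))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x):
    return softplus(sum(w * xi for w, xi in zip(W, x)) + B)


def gradient(x):
    """Gradient attribution: dg/dx_i, computed analytically for the toy model."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return [sigmoid(z) * w for w in W]


def gradient_x_input(x):
    """Gradient x Input attribution: x_i * dg/dx_i."""
    return [xi * gi for xi, gi in zip(x, gradient(x))]


def integrated_gradients(x, baseline=None, steps=200):
    """Integrated Gradients along the straight path from baseline to x.

    The path integral is approximated with a midpoint Riemann sum.
    """
    if baseline is None:
        baseline = [0.0] * len(x)  # assumed baseline: all zeros
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, gi in enumerate(gradient(point)):
            avg_grad[i] += gi / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


x = [1.0, -0.5, 1.0]
grad_map = gradient(x)
gxi_map = gradient_x_input(x)
ig_map = integrated_gradients(x)

print("Gradient:            ", grad_map)
print("Gradient x Input:    ", gxi_map)
print("Integrated Gradients:", ig_map)
```

A useful sanity check for Integrated Gradients is its completeness property: the attributions should sum to approximately model(x) minus model(baseline). For real networks the gradients would come from automatic differentiation rather than a hand-written formula.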

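Experimental Results

Research Questions

RQ1: Can common explanation methods be manipulated or deceived by adversarial inputs?
RQ2: What role does the geometry of the model and the input space play in the reliability of explanations?
RQ3: Do common attribution techniques have inherent vulnerabilities that make manipulation possible?
RQ4: How do different attribution methods compare in their susceptibility to manipulation?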

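Key Findings

Explanations produced by attribution methods can be sensitive to manipulation.
Geometry plays a central role in the vulnerability of explanations, regardless of the method used.
Several standard attribution methods (e.g., Gradient, Gradient × Input, Integrated Gradients, Guided Backpropagation (GBP), Layer-wise Relevance Propagation (LRP)) are discussed in the context of their weaknesses.
The paper analyzes how pixel perturbations affect the resulting explanations.
The study links the mathematical properties of backpropagation and relevance propagation to weaknesses in explainability.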
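
The geometric intuition behind these findings can be illustrated with a toy example. This sketch is not the paper's procedure, which manipulates explanations of image classifiers. It assumes that the relevant geometric property is the curvature of the constant-output level sets, and it uses a hypothetical two-dimensional function g(x) = x1² + x2² whose level sets are circles. Moving along a strongly curved level set leaves the output unchanged while the normalized gradient explanation rotates completely.

```python
import math


def g(x):
    """Hypothetical toy output function with circular level sets."""
    return x[0] ** 2 + x[1] ** 2


def gradient_explanation(x):
    """Gradient of g, normalized to unit length so only its direction matters."""
    grad = [2.0 * x[0], 2.0 * x[1]]
    norm = math.hypot(grad[0], grad[1])
    return [grad[0] / norm, grad[1] / norm]


def rotate_on_level_set(x, angle):
    """Move x along its level set of g (a circle centered at the origin)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


radius = 0.01  # small radius means a strongly curved level set
x = [radius, 0.0]
x_adv = rotate_on_level_set(x, math.pi / 2)

# Size of the input change, change in output, and similarity of explanations.
perturbation = math.hypot(x_adv[0] - x[0], x_adv[1] - x[1])
output_change = abs(g(x_adv) - g(x))
cosine = sum(
    a * b for a, b in zip(gradient_explanation(x), gradient_explanation(x_adv))
)

print("perturbation norm:", perturbation)
print("output change:    ", output_change)
print("cosine similarity of explanations:", cosine)
```

In this toy setting the explanation switches entirely from the first feature to the second under a perturbation of norm about 0.014, with essentially no change in output. On a flatter level set (a larger radius), the same rotation of the explanation would require a proportionally larger perturbation.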

This review was generated by AI and checked by a human editor.