QUICK REVIEW

[Paper Review] GraphFramEx: Towards Systematic Evaluation of Explainability Methods for Graph Neural Networks

Kenza Amara, Rex Ying|arXiv (Cornell University)|Jun 20, 2022

Explainable Artificial Intelligence (XAI) | Cited by 25

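One-line summary

GraphFramEx proposes a systematic framework for evaluating GNN explainability according to user needs, introducing a fidelity-based characterization of explanations and a top-k masking protocol for evaluation on real and synthetic data.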

ABSTRACT

As one of the most popular machine learning models today, graph neural networks (GNNs) have attracted intense interest recently, and so does their explainability. Users are increasingly interested in a better understanding of GNN models and their outcomes. Unfortunately, today's evaluation frameworks for GNN explainability often rely on few inadequate synthetic datasets, leading to conclusions of limited scope due to a lack of complexity in the problem instances. As GNN models are deployed to more mission-critical applications, we are in dire need for a common evaluation protocol of explainability methods of GNNs. In this paper, we propose, to our best knowledge, the first systematic evaluation framework for GNN explainability, considering explainability on three different "user needs". We propose a unique metric that combines the fidelity measures and classifies explanations based on their quality of being sufficient or necessary. We scope ourselves to node classification tasks and compare the most representative techniques in the field of input-level explainability for GNNs. For the inadequate but widely used synthetic benchmarks, surprisingly shallow techniques such as personalized PageRank have the best performance for a minimum computation time. But when the graph structure is more complex and nodes have meaningful features, gradient-based methods are the best according to our evaluation criteria. However, none dominates the others on all evaluation dimensions and there is always a trade-off. We further apply our evaluation protocol in a case study for frauds explanation on eBay transaction graphs to reflect the production environment.

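Motivation and objectives

Argues for a common evaluation protocol for GNN explainability, given that existing benchmarks are heterogeneous and limited.
Defines the types of explanation masks (hard vs. soft).
Introduces ground-truth-free evaluation based on fidelity measures that distinguish necessary explanations from sufficient ones.
Proposes a characterization score that jointly evaluates necessary and sufficient explanations.
Applies the framework to real and synthetic datasets and demonstrates its practical applicability through an eBay fraud transaction case study.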

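Proposed method

Evaluates post-hoc explanations, both model-agnostic and model-aware, on node classification tasks.
Defines edge masks and node-feature masks that yield an explanatory subgraph through element-wise masking (A_S = M_E ⊙ A, X_S = M_NF ⊙ X).
Classifies explanations as necessary, sufficient, or both, based on two fidelity measures (Fid+ and Fid−).
Combines the fidelity measures into a single characterization score (charact), defined as the weighted harmonic mean of Fid+ and (1 − Fid−).
Adopts a top-k masking strategy that fixes the explanation size (k edges), enabling a fair comparison across methods.
Presents decision-tree-style guidance (GraphFramEx) for selecting an explainer based on user goals and model accuracy.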
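The masking and scoring steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `top_k_hard_mask` and `characterization`, the toy graph, and the equal default weights are all hypothetical. Each nonzero entry of A is treated as one directed edge, and Fid+ and Fid− are taken as given inputs rather than computed from a model.

```python
import numpy as np


def top_k_hard_mask(edge_scores: np.ndarray, k: int) -> np.ndarray:
    """Turn a soft edge mask into a hard 0/1 mask keeping the k highest scores."""
    flat = edge_scores.ravel()
    k = min(k, flat.size)
    mask = np.zeros(flat.shape, dtype=float)
    if k > 0:
        top_idx = np.argsort(flat)[-k:]  # indices of the k largest scores
        mask[top_idx] = 1.0
    return mask.reshape(edge_scores.shape)


def characterization(fid_plus: float, fid_minus: float,
                     w_plus: float = 0.5, w_minus: float = 0.5) -> float:
    """Weighted harmonic mean of Fid+ and (1 - Fid-).

    The equal default weights are an assumption made for this sketch.
    """
    one_minus_fid_minus = 1.0 - fid_minus
    if fid_plus <= 0.0 or one_minus_fid_minus <= 0.0:
        return 0.0  # a harmonic mean is zero as soon as one term is zero
    return (w_plus + w_minus) / (w_plus / fid_plus + w_minus / one_minus_fid_minus)


# Toy directed graph with 4 nodes and 5 edges (hypothetical data).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)

# Soft edge mask M_E: one importance score per existing edge.
M_E_soft = np.array([[0.0, 0.9, 0.2, 0.0],
                     [0.0, 0.0, 0.7, 0.0],
                     [0.0, 0.0, 0.0, 0.4],
                     [0.1, 0.0, 0.0, 0.0]])

# Top-k masking: fix the explanation size to k = 2 edges.
M_E_hard = top_k_hard_mask(M_E_soft, k=2)
A_S = M_E_hard * A  # A_S = M_E ⊙ A

# Node-feature masking: X_S = M_NF ⊙ X, here hiding the last feature column.
X = np.arange(12, dtype=float).reshape(4, 3)
M_NF = np.ones_like(X)
M_NF[:, 2] = 0.0
X_S = M_NF * X

score = characterization(fid_plus=0.8, fid_minus=0.2)
```

In this toy run the explanatory subgraph keeps only the two highest-scoring edges, (0, 1) and (1, 2). The harmonic mean rewards explanations for which Fid+ and (1 − Fid−) are both high, so an explanation that does well on only one of the two measures is penalized more than it would be under an arithmetic mean.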

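Experimental results

Research questions

RQ1: How do existing GNN explainability methods compare under a unified, ground-truth-free evaluation framework?
RQ2: How do the explanation focus (phenomenon vs. model) and the mask type (hard vs. soft) affect the evaluation results?
RQ3: Can a single characterization score meaningfully balance necessary and sufficient explanations across datasets?
RQ4: How do explanation methods perform in a realistic production setting (the eBay fraud graph) compared with synthetic benchmarks?
RQ5: What is the trade-off between explanation quality and computation time across methods?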

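Key findings

No single explanation method dominates on every evaluation dimension; there is always a trade-off.
Shallow techniques such as PageRank perform best on Type-1 synthetic benchmarks while requiring minimal computation time, whereas gradient-based methods excel on data with more complex graph structure and meaningful node features.
Saliency has the strongest overall characterization on real datasets and is particularly effective for necessary explanations, whereas Occlusion, Grad-CAM, and PageRank are better suited to sufficient explanations.
Most methods provide sufficient explanations, but few provide strongly necessary ones; Saliency, Distance, and Occlusion stand out in this respect.
GNNExplainer excels at explaining fraudulent nodes in the production eBay graph; in that setting, perturbation-based methods often outperform the others.
Method rankings disagree between synthetic benchmarks and real data, highlighting the limitations of Type-1 synthetic datasets.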

This review was generated by AI and checked by a human editor.