QUICK REVIEW

[Paper Review] On Completeness-aware Concept-Based Explanations in Deep Neural Networks

Chih‐Kuan Yeh, Been Kim|arXiv (Cornell University)|Oct 17, 2019

Explainable Artificial Intelligence (XAI) | References: 37 | Cited by: 50

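One-sentence summary

This paper defines a completeness score for concept-based explanations of deep neural networks, proposes a completeness-aware concept discovery method with an interpretability regularizer, and introduces ConceptSHAP to quantify per-concept attribution. The method is validated on synthetic data, images (AwA), and text (IMDB).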

ABSTRACT

Human explanations of high-level decisions are often expressed in terms of key concepts the decisions are based on. In this paper, we study such concept-based explainability for Deep Neural Networks (DNNs). First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining a model's prediction behavior based on the assumption that complete concept scores are sufficient statistics of the model prediction. Next, we propose a concept discovery method that aims to infer a complete set of concepts that are additionally encouraged to be interpretable, which addresses the limitations of existing methods on concept explanations. To define an importance score for each discovered concept, we adapt game-theoretic notions to aggregate over sets and propose ConceptSHAP. Via proposed metrics and user studies, on a synthetic dataset with apriori-known concept explanations, as well as on real-world image and language datasets, we validate the effectiveness of our method in finding concepts that are both complete in explaining the decisions and interpretable. (The code is released at https://github.com/chihkuanyeh/concept_exp)

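Research motivation and goals

Define a formal "completeness" score for concept-based explanations of DNNs.
Develop an unsupervised discovery method that yields a concept set that is both complete and interpretable.
Propose ConceptSHAP to quantify concept attribution within the completeness framework.
Regularize the discovery process so that the concepts are coherent and semantically meaningful.
Demonstrate effectiveness on synthetic data and on real-world image and language datasets.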

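Proposed method

Represent an input x as patches x_t, whose latent-space activations are projected onto concept vectors c_1,...,c_m.
Define the concept product v_c(x_t) as the thresholded inner product with each c_j, and normalize it to form the concept score v_c(x).
Assume that complete concepts provide sufficient statistics for the prediction; learn a mapping g from v_c(x) back to the activation space and evaluate how well the prediction is recovered.
Propose a regularizer R(c) that promotes locality/coherence within each concept's neighborhood and diversity across concepts, to improve interpretability.
Use SGD to maximize the joint objective log P[h_y(g(v_c(x)))] + R(c), learning the concepts and the mapping g together.
Define ConceptSHAP as the Shapley-value attribution of each concept to the completeness score, including a per-class variant for the multi-class setting.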
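The concept-score step above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' released code: the function name `concept_scores`, the threshold value, and the toy activations are all assumptions, and the normalization is assumed to be an L2 norm over each patch's thresholded products.

```python
import math


def concept_scores(patch_activations, concepts, beta):
    """Thresholded, L2-normalized inner products between patches and concepts.

    patch_activations: list of T activation vectors (one per patch x_t).
    concepts: list of m concept vectors c_1..c_m in the same space.
    beta: threshold; inner products below beta are clipped to 0 (assumed form).
    Returns a T x m list of concept scores.
    """
    scores = []
    for phi in patch_activations:
        # Inner product of this patch's activation with every concept vector.
        products = [sum(p * c for p, c in zip(phi, concept)) for concept in concepts]
        # Keep only products that reach the threshold.
        products = [v if v >= beta else 0.0 for v in products]
        # Normalize per patch; an all-zero row is left as zeros.
        norm = math.sqrt(sum(v * v for v in products))
        scores.append([v / norm for v in products] if norm > 0 else products)
    return scores


# Hypothetical toy data: three patches, two axis-aligned concept vectors.
PATCHES = [[1.0, 0.0], [0.6, 0.8], [0.1, 0.1]]
CONCEPTS = [[1.0, 0.0], [0.0, 1.0]]
SCORES = concept_scores(PATCHES, CONCEPTS, beta=0.5)
```

In this toy case the third patch falls below the threshold for both concepts, so its score row is all zeros. In the method described above, the resulting scores would then be fed to the learned mapping g.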

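Experimental results

Research questions

RQ1: How can we quantify how sufficient (complete) a set of concepts is for explaining a DNN's decisions?
RQ2: Can we automatically discover a complete and interpretable set of concepts that jointly explain the model's predictions?
RQ3: How can the importance of each concept to the overall completeness score (and to each class) be attributed in a principled way?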

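Key findings

The proposed completeness score eta_f(c_1,...,c_m) measures how well the concept scores recover the predictions, relative to the full model.
On the synthetic dataset, the completeness-aware discovery method outperforms the baselines (ACE, ACE-SP, PCA, k-means) at retrieving the correct concepts and achieves higher automatic alignment.
On Animals with Attributes (AwA) and the synthetic data, the method achieves the highest completeness scores.
ConceptSHAP attributes the completeness score to each concept and satisfies the Shapley axioms (efficiency, symmetry, dummy, additivity).
A per-class ConceptSHAP variant identifies the concepts that contribute most to a specific class, supporting class-specific interpretability.
Human and automatic evaluations both indicate that the discovered concepts are coherent, interpretable, and semantically meaningful on image and language tasks (AwA and IMDB).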
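To make the attribution step concrete, the sketch below computes exact Shapley values over a toy completeness function. It is an illustration under stated assumptions, not the paper's implementation: the completeness score is assumed to take a normalized-accuracy form (0 at chance level, 1 when the concepts match the full model), the accuracy table and concept names are invented, and in practice the score for each concept subset would come from refitting the mapping g rather than from a lookup table.

```python
from itertools import combinations
from math import factorial


def completeness_score(acc_concepts, acc_model, acc_random):
    """Assumed form: accuracy from concepts, normalized between chance and full model."""
    return (acc_concepts - acc_random) / (acc_model - acc_random)


def concept_shap(concepts, eta):
    """Exact Shapley value of each concept under the set function eta."""
    m = len(concepts)
    attributions = {}
    for ci in concepts:
        others = [c for c in concepts if c != ci]
        total = 0.0
        for k in range(m):  # k = size of a coalition S that excludes ci
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain in completeness from adding ci to S.
                total += weight * (eta(s | {ci}) - eta(s))
        attributions[ci] = total
    return attributions


# Hypothetical accuracies for a binary task (chance = 0.5, full model = 1.0).
ACC_MODEL, ACC_RANDOM = 1.0, 0.5
ACC_BY_SUBSET = {
    frozenset(): 0.5,
    frozenset({"stripes"}): 0.75,
    frozenset({"water"}): 0.65,
    frozenset({"noise"}): 0.5,
    frozenset({"stripes", "water"}): 0.95,
    frozenset({"stripes", "noise"}): 0.75,
    frozenset({"water", "noise"}): 0.65,
    frozenset({"stripes", "water", "noise"}): 0.95,
}


def toy_eta(subset):
    """Completeness of a concept subset, looked up from the toy table."""
    return completeness_score(ACC_BY_SUBSET[frozenset(subset)], ACC_MODEL, ACC_RANDOM)


CONCEPT_NAMES = ["stripes", "water", "noise"]
ATTRIBUTIONS = concept_shap(CONCEPT_NAMES, toy_eta)
```

The toy table is built so that two of the axioms listed above are visible: "noise" never changes the score and so receives zero attribution (dummy), and the attributions sum to the completeness of the full concept set (efficiency). Exact enumeration visits every subset, so it is only practical for a small number of concepts.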


This review was generated by AI and reviewed by a human editor.