QUICK REVIEW

[Paper Review] Hierarchical interpretations for neural network predictions

Chandan Singh, William J. Murdoch|arXiv (Cornell University)|Jun 14, 2018

Explainable Artificial Intelligence (XAI) | References: 44 | Cited by: 69

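ONE-SENTENCE SUMMARY

Agglomerative contextual decomposition (ACD) gives hierarchical, group-based explanations of DNN predictions, yielding phrase-level and patch-level insight and robust visualizations on NLP and computer vision tasks.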

ABSTRACT

Deep neural networks (DNNs) have achieved impressive predictive performance due to their ability to learn complex, non-linear relationships between variables. However, the inability to effectively visualize these relationships has led to DNNs being characterized as black boxes and consequently limited their applications. To ameliorate this problem, we introduce the use of hierarchical interpretations to explain DNN predictions through our proposed method, agglomerative contextual decomposition (ACD). Given a prediction from a trained DNN, ACD produces a hierarchical clustering of the input features, along with the contribution of each cluster to the final prediction. This hierarchy is optimized to identify clusters of features that the DNN learned are predictive. Using examples from Stanford Sentiment Treebank and ImageNet, we show that ACD is effective at diagnosing incorrect predictions and identifying dataset bias. Through human experiments, we demonstrate that ACD enables users both to identify the more accurate of two DNNs and to better trust a DNN's outputs. We also find that ACD's hierarchy is largely robust to adversarial perturbations, implying that it captures fundamental aspects of the input and ignores spurious noise.

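MOTIVATION AND OBJECTIVES

Establish why explanations of DNN predictions need to go beyond single-feature importance.
Develop a general method for extracting interactions between groups of features in arbitrary DNN architectures.
Build a hierarchical visualization framework that shows the interactions behind a prediction at multiple levels of granularity.
Demonstrate ACD's usefulness for diagnosing incorrect predictions, detecting dataset bias, and assessing trust and adversarial robustness.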

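PROPOSED METHOD

Generalize Contextual Decomposition (CD) to arbitrary DNNs by decomposing the logits g(x), layer by layer, into beta(x), the contribution of a chosen feature group, and gamma(x), the remainder, so that g(x) = beta(x) + gamma(x) (Equations 1–6).
Partition the bias in convolutional layers and adapt the decomposition rules for ReLU and max-pooling to obtain layer-wise CD components (Equations 5–11).
Define an agglomerative clustering that uses CD scores as its joining metric to build the hierarchical interpretation (Algorithm 1).
At each iteration, add the highest-scoring groups (those in the top k%) to the hierarchy, then generate candidate groups by extending the selected groups to adjacent features (for text) or adjacent patches (for images).
Terminate the hierarchy according to an application-specific criterion (for example, stop once every word has been selected for sentiment analysis, or after a predefined number of iterations for images).
Apart from requiring a group-level importance score function (here, CD) to drive the clustering, the procedure is model-agnostic.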
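The steps above can be illustrated with a small sketch. It is not the authors' implementation: the tiny two-layer network, its weights, and the names `linear_cd`, `relu_cd`, `cd_score`, `agglomerate`, `TOY_X`, and `TOY_LAYERS` are all hypothetical. The bias split and ReLU rule follow the summary's description as best understood, and the merge loop is a simplified greedy variant (merge the adjacent pair with the largest absolute CD score) rather than the top-k% queue of Algorithm 1.

```python
# Toy sketch of CD scoring plus a simplified agglomeration (assumptions, not the paper's code).

def linear_cd(beta, gamma, W, b):
    """Propagate (beta, gamma) through y = Wx + b, splitting the bias in
    proportion to |W beta| and |W gamma| (even split if both are zero)."""
    out_b, out_g = [], []
    for row, bias in zip(W, b):
        wb = sum(w * v for w, v in zip(row, beta))
        wg = sum(w * v for w, v in zip(row, gamma))
        denom = abs(wb) + abs(wg)
        share = abs(wb) / denom if denom > 0 else 0.5
        out_b.append(wb + share * bias)
        out_g.append(wg + (1.0 - share) * bias)
    return out_b, out_g

def relu_cd(beta, gamma):
    """beta keeps ReLU(beta); gamma absorbs the rest so the sum is ReLU(beta + gamma)."""
    relu = lambda v: max(v, 0.0)
    new_b = [relu(p) for p in beta]
    new_g = [relu(p + q) - relu(p) for p, q in zip(beta, gamma)]
    return new_b, new_g

def forward(x, layers):
    """Ordinary forward pass: ReLU after every layer except the last."""
    h = list(x)
    for idx, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        if idx < len(layers) - 1:
            h = [max(v, 0.0) for v in h]
    return h

def cd_score(x, group, layers):
    """Return (beta, gamma) for the first logit, where `group` is a tuple of input indices."""
    beta = [v if i in group else 0.0 for i, v in enumerate(x)]
    gamma = [0.0 if i in group else v for i, v in enumerate(x)]
    for idx, (W, b) in enumerate(layers):
        beta, gamma = linear_cd(beta, gamma, W, b)
        if idx < len(layers) - 1:
            beta, gamma = relu_cd(beta, gamma)
    return beta[0], gamma[0]

def agglomerate(x, layers):
    """Greedy bottom-up merging of adjacent groups, using |CD score| as the joining metric.
    Returns the list of (merged_group, cd_score) in merge order."""
    groups = [(i,) for i in range(len(x))]
    merges = []
    while len(groups) > 1:
        best = max(
            range(len(groups) - 1),
            key=lambda i: abs(cd_score(x, groups[i] + groups[i + 1], layers)[0]),
        )
        merged = groups[best] + groups[best + 1]
        merges.append((merged, cd_score(x, merged, layers)[0]))
        groups[best:best + 2] = [merged]
    return merges

# Hypothetical 4-feature input and 4 -> 3 -> 1 network, chosen only for illustration.
TOY_X = [1.0, -2.0, 0.5, 3.0]
TOY_LAYERS = [
    ([[0.5, -1.0, 0.0, 0.2], [0.3, 0.8, -0.5, 0.1], [-0.2, 0.0, 1.0, 0.4]], [0.1, -0.3, 0.2]),
    ([[1.0, -0.5, 0.7]], [0.05]),
]

for group, score in agglomerate(TOY_X, TOY_LAYERS):
    print(group, round(score, 4))
```

Because each rule preserves the sum beta + gamma at every layer, the two parts always add back up to the network's logit. That is a convenient sanity check when adapting the rules to other layer types.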

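EXPERIMENTAL RESULTS

Research questions

RQ1: Can hierarchical, group-based explanations reveal the non-linear feature interactions a DNN has learned?
RQ2: Does agglomerative contextual decomposition (ACD) produce intuitive, trustworthy explanations for NLP and vision models?
RQ3: Are ACD hierarchies robust to adversarial perturbations, compared with non-hierarchical explanations?
RQ4: Does ACD help diagnose incorrect predictions and dataset bias on real datasets such as SST, MNIST, and ImageNet?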

Key findings

Length	Positive phrases	Negative phrases
1	pleasurable, sexy, glorious	nowhere, grotesque, sleep
3	amazing accomplishment., great fun.	bleak and desperate, conspicuously lacks.
5	a pretty amazing accomplishment.	ultimately a pointless endeavour.
8	presents it with an unforgettable visual panache.	my reaction in a word: disappointment.

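ACD produces intuitive visualizations that reveal the meaningful phrases and image patches contributing to a prediction.
In human experiments, ACD helps users identify the more accurate of two models, and users rate ACD as more trustworthy than prior methods.
ACD hierarchies are largely robust to adversarial perturbations, suggesting that they capture fundamental aspects of the input rather than noise.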
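Qualitative examples diagnose incorrect predictions on SST and expose dataset bias in ImageNet (for example, skate features contribute to the hockey "puck" classification).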
Table 1 lists the top-scoring phrases of different lengths that ACD finds on SST, with examples of both positive and negative phrases.

This review was generated by AI and checked by a human editor.