QUICK REVIEW

[Paper Review] Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Been Kim, Martin Wattenberg|arXiv (Cornell University)|Nov 30, 2017

Explainable Artificial Intelligence (XAI) | References: 32 | Cited by: 476

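One-Sentence Summary

TCAV introduces Concept Activation Vectors (CAVs) to quantify how user-defined, high-level concepts influence a model's predictions, giving global, concept-based explanations without retraining the model. It combines directional derivatives with statistical testing to assess concept sensitivity across classes.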

ABSTRACT

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result--for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
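The directional-derivative idea in the abstract can be written compactly. The notation below is introduced here for illustration and follows the paper's conventions as best understood: $f_l(x)$ denotes the layer-$l$ activations for input $x$, $h_{l,k}$ maps those activations to the logit of class $k$, $v_C^l$ is the CAV for concept $C$ at layer $l$, and $X_k$ is the set of inputs labeled with class $k$.

```latex
% Conceptual sensitivity: directional derivative of the class-k logit
% along the CAV direction at layer l.
S_{C,k,l}(x)
  = \lim_{\epsilon \to 0}
    \frac{h_{l,k}\bigl(f_l(x) + \epsilon\, v_C^l\bigr) - h_{l,k}\bigl(f_l(x)\bigr)}{\epsilon}
  = \nabla h_{l,k}\bigl(f_l(x)\bigr) \cdot v_C^l

% TCAV score: fraction of class-k inputs with positive sensitivity.
\mathrm{TCAV}_{Q_{C,k,l}}
  = \frac{\bigl|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\bigr|}{|X_k|}
```

For the "zebra" example, a score near 1 for the concept "stripes" would mean that nudging the activations toward the stripes direction raises the zebra logit for almost every zebra image.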

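Motivation and Objectives

Provide human-friendly explanations of neural networks in terms of high-level concepts.
Let users define custom concepts by supplying examples, going beyond the labels in the training data.
Offer a plug-and-play interpretability method that requires no retraining or modification of the model.
Quantify the global importance of a concept to the model's predictions for an entire class, rather than for a single input.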

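Proposed Method

Define a concept as a set of example inputs supplied by the user.
Train a linear classifier that separates the layer activations of the concept examples from those of random negative examples; the vector normal to its decision boundary is the Concept Activation Vector (CAV).
Compute the directional derivative (conceptual sensitivity) of a class logit along the CAV direction, i.e. the dot product of the logit's gradient with respect to the layer activations and the CAV, to measure how the concept affects that logit.
Define the TCAV score as the fraction of class-k inputs whose directional derivative is positive, which yields a global interpretability measure.
Validate the statistical significance of a concept by retraining the CAV multiple times with different random negative examples and running a two-sided t-test on the resulting TCAV scores, with Bonferroni correction.
Extend TCAV to Relative TCAV, which trains CAVs that contrast related concepts against each other so they can be compared within the learned subspace.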
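The first four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `train_cav` and `tcav_score` are hypothetical, the linear classifier is a plain logistic regression fitted by gradient descent, and the activations and logit gradients are synthetic stand-ins for values that would normally be extracted from a trained network with an autodiff framework.

```python
import numpy as np

def train_cav(concept_acts, random_acts, lr=0.1, steps=500):
    """Fit a logistic-regression classifier (concept vs. random activations)
    and return its unit-norm weight vector, used here as the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient step on the weights
        b -= lr * np.mean(p - y)                # gradient step on the bias
    return w / np.linalg.norm(w)

def tcav_score(logit_grads, cav):
    """Fraction of inputs whose directional derivative along the CAV is positive.

    logit_grads[i] is the gradient of the class-k logit with respect to the
    layer activations of the i-th class-k input.
    """
    sensitivities = logit_grads @ cav
    return float(np.mean(sensitivities > 0))

# Synthetic stand-in data (an assumption for illustration only):
# concept activations are shifted along dimension 0 relative to random ones.
rng = np.random.default_rng(0)
concept_acts = rng.normal(size=(50, 8))
concept_acts[:, 0] += 3.0
random_acts = rng.normal(size=(50, 8))

cav = train_cav(concept_acts, random_acts)

# Hypothetical logit gradients for 200 class-k inputs.
logit_grads = rng.normal(size=(200, 8))
logit_grads[:, 0] += 1.0

print("TCAV score:", tcav_score(logit_grads, cav))
```

Because the score only counts the sign of each sensitivity, it is insensitive to the scale of the gradients; that keeps it comparable across layers and classes, at the cost of ignoring how large the effect is.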

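Experimental Results

Research Questions

RQ1: How can high-level, human-interpretable concepts be represented in the internal activation space of a neural network?
RQ2: Can the influence of a user-defined concept on model predictions be quantified without retraining the model?
RQ3: Does TCAV provide stable, statistically significant measures of concept importance across classes of data?
RQ4: Where in the network (in which layers) are concepts learned, and how does this relate to their influence on predictions?
RQ5: Can TCAV reveal biases or undesirable sensitivities in standard networks (for example, sensitivity to gender or race)?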

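Key Findings

CAVs align with the intended concepts, as shown by qualitative sorting of images and by activation-maximization visualizations.
TCAV scores reveal concept influence across layers; layers closer to the logits have a more direct and stronger influence on predictions.
Statistical testing filters out spurious CAVs, yielding robust concept detection.
Relative CAVs enable fine-grained comparison between related concepts.
In controlled experiments with known ground truth, TCAV closely tracks the concepts the network actually uses, in some cases outperforming saliency maps.
Applied to a medical task, diabetic retinopathy (DR) prediction, TCAV identifies diagnostically relevant concepts and highlights where the model diverges from domain experts' expectations.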
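The statistical filtering of spurious CAVs mentioned above can be sketched with SciPy. This is a hedged illustration, not the authors' code: the function name `significant_concepts` is hypothetical, the null hypothesis of a TCAV score of 0.5 (the value expected from a meaningless direction) is an assumption, and the per-run scores are synthetic numbers generated only to make the example runnable.

```python
import numpy as np
from scipy import stats

def significant_concepts(concept_scores, null_score=0.5, alpha=0.05):
    """Two-sided one-sample t-test per concept, with Bonferroni correction.

    concept_scores maps a concept name to the TCAV scores obtained from
    repeated CAV training runs, each using different random negatives.
    """
    # Bonferroni: divide the significance level by the number of hypotheses.
    threshold = alpha / len(concept_scores)
    results = {}
    for name, scores in concept_scores.items():
        p_value = float(stats.ttest_1samp(scores, popmean=null_score).pvalue)
        results[name] = {"p_value": p_value, "significant": p_value < threshold}
    return results

# Synthetic TCAV scores from 20 hypothetical training runs per concept.
rng = np.random.default_rng(0)
scores = {
    "striped": np.clip(rng.normal(0.9, 0.03, size=20), 0.0, 1.0),
    "dotted": np.clip(rng.normal(0.5, 0.10, size=20), 0.0, 1.0),
}

results = significant_concepts(scores)
for name, outcome in results.items():
    print(name, outcome)
```

A concept whose scores scatter around the null value across runs fails the test and is discarded, which is what guards against reading meaning into a CAV that merely reflects one particular draw of random negatives.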

This review was generated by AI and reviewed by human editors.