QUICK REVIEW

[Paper Review] Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Been Kim, Martin Wattenberg|arXiv (Cornell University)|Nov 30, 2017

Explainable Artificial Intelligence (XAI)|Cited by 732

One-Sentence Summary

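This paper introduces Concept Activation Vectors (CAVs) and Testing with CAVs (TCAV), which quantify how much a human-defined concept influences a neural network's predictions, and supports them with statistical validation and several applications.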

ABSTRACT

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result--for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.

Motivation and Goals

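Define human-interpretable concepts as sets of example data, which may come from outside the model's training data.
Learn Concept Activation Vectors (CAVs) that represent these concepts as directions in activation space.
Quantify a concept's influence on the prediction of a class using directional derivatives, aggregated into the TCAV score.
Provide statistical tests to check whether a CAV is meaningfully related to the model's outputs.
Demonstrate global (class-level) interpretability and apply it to real-world settings, including a medical imaging task.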

Proposed Method

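Define a user-specified concept C by collecting, at a chosen layer l, the activations of a positive example set P_C and a negative example set N.
Train a linear classifier to separate the activations f_l(x) of P_C from those of N; the vector normal to its decision boundary, pointing toward the concept examples, is the Concept Activation Vector v_C^l.
Compute the conceptual sensitivity S_{C,k,l}(x) as the directional derivative, along v_C^l, of h_{l,k}, the function that maps layer-l activations to the logit of class k: S_{C,k,l}(x) = ∇h_{l,k}(f_l(x)) · v_C^l.
Aggregate over all inputs X_k of class k to obtain the TCAV score, the fraction of class-k inputs with positive sensitivity: TCAV_{Q_{C,k,l}} = |{x ∈ X_k : S_{C,k,l}(x) > 0}| / |X_k|.
Test statistical significance by repeating CAV training against different random negative sets and running a two-sided t-test of the resulting TCAV scores against the null value 0.5, with a Bonferroni correction.
Optionally, extend to relative CAVs to compare two concepts C and D: a linear classifier separating C from D yields a vector v_{C,D} that defines a one-dimensional subspace of layer l, where the projection of f_l(x) indicates whether x is more relevant to C or to D.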
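The CAV-learning step can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: it assumes activations are already flattened into lists of floats, and it uses a hand-rolled logistic regression trained by stochastic gradient descent (the names `train_cav`, `lr`, `epochs`, and `seed` are hypothetical). Any linear classifier would serve; the returned unit vector is the normal of the learned boundary, oriented toward the concept examples.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_cav(concept_acts: Sequence[Sequence[float]],
              negative_acts: Sequence[Sequence[float]],
              lr: float = 0.1, epochs: int = 200, seed: int = 0) -> Vector:
    """Return a unit-norm CAV from layer-l activations (illustrative sketch).

    concept_acts:  f_l(x) for x in P_C (label 1).
    negative_acts: f_l(x) for x in N (label 0).
    A logistic-regression boundary is fit by SGD; its weight vector is the
    boundary normal, pointing toward the concept (label-1) side.
    """
    if not concept_acts or not negative_acts:
        raise ValueError("need both concept and negative activations")
    data = [(list(a), 1.0) for a in concept_acts]
    data += [(list(a), 0.0) for a in negative_acts]
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for a, y in data:
            z = sum(wi * ai for wi, ai in zip(w, a)) + b
            g = _sigmoid(z) - y  # derivative of the log loss w.r.t. z
            w = [wi - lr * g * ai for wi, ai in zip(w, a)]
            b -= lr * g
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("classifier did not learn a direction")
    return [wi / norm for wi in w]
```

Under the same assumptions, passing the activations of a second concept D in place of the random negatives yields a relative CAV v_{C,D} as described in the last step of the method.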
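The sensitivity and TCAV-score formulas can be illustrated with the following sketch. It assumes that `h` is any Python callable mapping a layer-l activation vector to the class-k logit (a hypothetical stand-in for h_{l,k}), and it estimates the directional derivative with a central finite difference rather than a backpropagated gradient, so that no autodiff library is needed. The function names and the step size `eps` are illustrative choices.

```python
from typing import Callable, Sequence

Logit = Callable[[Sequence[float]], float]


def directional_derivative(h: Logit, act: Sequence[float],
                           cav: Sequence[float], eps: float = 1e-4) -> float:
    """Central-difference estimate of S_{C,k,l}(x) = grad h_{l,k}(f_l(x)) . v_C^l.

    h:   hypothetical callable mapping a layer-l activation to the class-k logit.
    act: the activation f_l(x) of one input x.
    cav: the concept activation vector v_C^l (same length as act).
    """
    if len(act) != len(cav):
        raise ValueError("activation and CAV must have the same length")
    plus = [a + eps * v for a, v in zip(act, cav)]
    minus = [a - eps * v for a, v in zip(act, cav)]
    return (h(plus) - h(minus)) / (2.0 * eps)


def tcav_score(h: Logit, class_acts: Sequence[Sequence[float]],
               cav: Sequence[float], eps: float = 1e-4) -> float:
    """TCAV_{Q_{C,k,l}}: fraction of class-k inputs with S_{C,k,l}(x) > 0."""
    if not class_acts:
        raise ValueError("class_acts must be non-empty")
    positive = sum(
        1 for act in class_acts if directional_derivative(h, act, cav, eps) > 0.0
    )
    return positive / len(class_acts)
```

For a toy head such as `h(a) = a[0]**2 - a[1]` with CAV `[1.0, 0.0]`, the sensitivity reduces to 2·a[0], so the score is just the fraction of class-k activations whose first coordinate is positive. The strict `> 0` comparison mirrors the definition of the TCAV score.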
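The significance test can be sketched as follows, assuming the TCAV scores from the repeated CAV runs are already available as a list of floats. To stay within the Python standard library, this sketch approximates the t distribution with a standard normal when computing the two-sided p-value, which is reasonable only when the number of runs is fairly large; an exact test such as SciPy's `scipy.stats.ttest_1samp` would be preferable in practice. The function name `tcav_significance` and its `num_hypotheses` parameter (the m in the Bonferroni threshold α/m) are hypothetical.

```python
import math
import statistics
from typing import Sequence


def tcav_significance(scores: Sequence[float], num_hypotheses: int,
                      alpha: float = 0.05, null_mean: float = 0.5) -> dict:
    """Two-sided one-sample test of the mean TCAV score against null_mean.

    scores: TCAV scores from repeated CAV runs (each against a fresh random set).
    num_hypotheses: m in the Bonferroni-corrected threshold alpha / m.
    NOTE: the p-value uses a normal approximation to the t distribution.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two TCAV scores")
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be >= 1")
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    if sd == 0.0:
        # Every run produced the same score, so there is no spread to test.
        t = 0.0 if mean == null_mean else math.copysign(math.inf, mean - null_mean)
        p = 1.0 if mean == null_mean else 0.0
    else:
        t = (mean - null_mean) / (sd / math.sqrt(n))
        # Two-sided p-value under the (approximate) standard normal null.
        p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return {
        "mean": mean,
        "t": t,
        "p": p,
        "significant": p < alpha / num_hypotheses,
    }
```

A concept whose scores hover around 0.5 across runs is filtered out, while one whose scores sit consistently far from 0.5 passes; this is the filtering effect the method relies on.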

Experimental Results

Research Questions

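RQ1: Can human-defined concepts drawn from outside the training data be used to explain and audit a neural network's predictions at the class level?
RQ2: Do CAVs provide stable, statistically significant associations with model outputs across multiple runs?
RQ3: Where in the network are concepts learned, and how does a concept's influence change across layers?
RQ4: How do TCAV-based explanations compare with saliency-based methods in interpretability and fidelity?
RQ5: Can TCAV be applied to real-world tasks such as medical imaging to reveal, and potentially correct, model biases or errors?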

Key Findings

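CAVs align with the intended concepts and can reveal biases in widely used networks without retraining them.
TCAV scores are higher near the output layer, suggesting that concepts have a more direct influence on predictions in later layers.
The significance test reduces spurious concept associations: many CAVs pass it while some fail, which filters out irrelevant concepts.
In human-subject experiments, TCAV conveyed concept relevance better than saliency maps and agreed with the ground-truth concept usage in a controlled setting.
Applied to diabetic retinopathy (DR), TCAV highlighted concepts associated with different DR levels and helped explain model errors.
Relative CAVs enable fine-grained comparisons between closely related concepts, supporting more nuanced interpretations.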

This review was generated by AI and checked by human editors.