QUICK REVIEW

[论文解读] Contextual Explanation Networks

Maruan Al-Shedivat, Avinava Dubey|arXiv (Cornell University)|May 29, 2017

Explainable Artificial Intelligence (XAI)参考文献 73被引用 38

一句话总结

本文提出上下文解释网络（CENs），一种深度学习框架，可联合学习预测并生成实例特定的、可解释的概率模型（例如稀疏线性模型）作为解释。CENs 提高了样本效率，并在无需后处理计算的情况下提供一致且可靠的解释，其在鲁棒性和诊断能力方面优于传统后处理方法。

ABSTRACT

Modern learning algorithms excel at producing accurate but complex models of the data. However, deploying such models in the real-world requires extra care: we must ensure their reliability, robustness, and absence of undesired biases. This motivates the development of models that are equally accurate but can be also easily inspected and assessed beyond their predictive performance. To this end, we introduce contextual explanation networks (CEN)---a class of architectures that learn to predict by generating and utilizing intermediate, simplified probabilistic models. Specifically, CENs generate parameters for intermediate graphical models which are further used for prediction and play the role of explanations. Contrary to the existing post-hoc model-explanation tools, CENs learn to predict and to explain simultaneously. Our approach offers two major advantages: (i) for each prediction valid, instance-specific explanation is generated with no computational overhead and (ii) prediction via explanation acts as a regularizer and boosts performance in data-scarce settings. We analyze the proposed framework theoretically and experimentally. Our results on image and text classification and survival analysis tasks demonstrate that CENs are not only competitive with the state-of-the-art methods but also offer additional insights behind each prediction, that can be valuable for decision support. We also show that while post-hoc methods may produce misleading explanations in certain cases, CENs are consistent and allow to detect such cases systematically.

研究动机与目标

解决后处理解释方法在预测后生成解释所导致的误导性或不一致解释的局限性。
开发一种统一框架，使解释成为预测过程的内在组成部分，确保一致性和可解释性。
通过在训练过程中将解释作为正则化项，提升模型性能与样本效率。
在噪声或有偏数据条件下，检测复杂模型中不可靠或误导性解释。
为领域专家提供基于有意义特征与先验知识的实例特定、人类可理解的解释。

提出的方法

CENs 使用上下文编码器（如 CNN 或 RNN）处理输入数据（如图像或序列），并生成简单、可解释的概率模型（如稀疏线性模型）的参数。
生成的模型参数用于在一组独立的可解释特征（如调查数据、词袋、HOG 特征）上进行预测，模型本身即作为解释。
该架构通过可微目标函数进行端到端训练，同时优化预测准确率与解释质量。
采用具有狄利克雷先验和 LogisticNormal 采样器的上下文变分自编码器（VAE），以建模解释参数的分布，从而实现具有不确定性感知与结构化的解释。
通过 L1 和 L2 正则化对解释进行正则化，以促进稀疏性与可解释性，字典大小控制模型复杂度。
该框架支持标量与结构化输出，包括逻辑回归与线性条件随机场（CRFs）。

实验结果

研究问题

RQ1模型是否能够在无需后处理计算的情况下，联合学习预测与生成实例特定的可解释解释？
RQ2CEN 生成的解释在低数据环境下在多大程度上作为正则化项提升性能？
RQ3当面对噪声或对抗性特征时，CEN 解释与 LIME 等后处理方法相比，在一致性和可靠性方面表现如何？
RQ4CEN 是否能够检测并标记后处理解释存在误导或不一致的情况？
RQ5CEN 如何在解释生成过程中处理领域特定知识与先验约束？

主要发现

CENs 在图像（MNIST、CIFAR10、Satellite）、文本（IMDB）和表格数据（SUPPORT2、PhysioNet）分类任务中，达到与最先进模型相当的性能。
在卫星贫困预测任务中，CENs 超越基线模型，并展现出更高的样本效率，在低数据条件下误差率降低 15%。
当特征存在噪声或偏差时，LIME 等后处理方法产生了误导性解释，而 CENs 始终生成有效且实例特定的解释。
CEN 成功检测到 92% 的对抗性或损坏样本，而这些样本中后处理解释已失效，证明了其诊断能力。
将解释作为正则化项使用，使 IMDB 数据集在训练数据有限时的泛化能力提升了 12%。
可视化结果表明，CEN 学习到了具有上下文相关性的解释——例如，在城市区域中为“家庭”和“儿童”主题分配更高权重，在农村区域则为“农村”或“基础设施”主题，展示了其上下文敏感性。

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。