QUICK REVIEW

[Paper Digest] Explaining Explanations: An Overview of Interpretability of Machine Learning

Leilani H. Gilpin, David Bau|arXiv (Cornell University)|May 31, 2018

Explainable Artificial Intelligence (XAI) | Cited by 30

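One-Sentence Summary

This paper proposes a unified framework for evaluating and standardizing interpretability and explainability in machine learning, particularly for deep neural networks. The framework introduces a taxonomy of explanations, distinguishes interpretability from explainability, and advocates multidimensional evaluation metrics to improve the trustworthiness, fairness, and transparency of AI systems.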

ABSTRACT

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, identify potential bias/problems in the training data, and to ensure that the algorithms perform as expected. However, explanations produced by these systems are neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify existing literature. We discuss why current approaches to explanatory methods, especially for deep neural networks, are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence.

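Research Motivation and Objectives

Address the lack of standardization and systematic evaluation in explainable AI (XAI) methods.
Clarify the distinction between interpretability (model transparency) and explainability (rationalizations generated by the system).
Establish foundational concepts and best practices for evaluating explanations in machine learning.
Identify the limitations of current methods for deep neural networks and propose future research directions.
Promote interdisciplinary collaboration to improve the reliability and trustworthiness of AI systems.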

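Proposed Approach

Build a taxonomy of explanations based on what is being explained (e.g., model behavior, internal representations, or decision processes).
Propose a framework that classifies existing XAI techniques along dimensions such as explanation type, target audience, and evaluation method.
Propose evaluation criteria that combine fidelity, agreement with users, and completeness to assess explanation quality.
Review and compare existing methods, such as attention maps, concept activation vectors (CAVs), and disentangled representations, for interpretability evaluation.
Emphasize multimodal evaluation: compare explanations against human attention, test on synthetic data with known factors, and conduct user studies.
Advocate integrating techniques from different fields (e.g., causal inference, human-computer interaction, ethics) to build more robust and trustworthy explanations.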
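
As an illustration of testing on synthetic data with known factors, the following is a minimal sketch and not the paper's procedure. It assumes a hypothetical `black_box` model in which only features 0 and 1 influence the output, and it uses a simple occlusion-style attribution (replace one feature with its dataset mean and measure the output change) as the explanation under test. All function and variable names are invented for this example.

```python
import random


def black_box(x):
    # Hypothetical model: only features 0 and 1 matter; feature 2 is ignored.
    return 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2]


def occlusion_attribution(model, data):
    """Mean absolute output change when each feature is replaced by its dataset mean."""
    n_features = len(data[0])
    means = [sum(row[j] for row in data) / len(data) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        total = 0.0
        for row in data:
            occluded = list(row)
            occluded[j] = means[j]
            total += abs(model(row) - model(occluded))
        scores.append(total / len(data))
    return scores


def rank_features(scores):
    """Feature indices sorted from most to least important."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Synthetic inputs: three independent standard-normal features.
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]

known_factors = {0, 1}  # ground truth, known because the data and model are synthetic
scores = occlusion_attribution(black_box, data)
ranking = rank_features(scores)

# The explanation passes this check if its top-ranked features are the known factors.
explanation_matches_truth = set(ranking[:len(known_factors)]) == known_factors
print(ranking, explanation_matches_truth)
```

Because the ground truth is fixed by construction, a mismatch here would point at the explanation method instead of the model, which is the appeal of synthetic tests. The ignored feature receives an attribution of exactly zero in this setup.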

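Experimental Results

Research Questions

RQ1: How do interpretability and explainability differ in machine learning systems?
RQ2: How can the fidelity, relevance, and agreement with users of an explanation be evaluated systematically?
RQ3: What are the limitations of current explanation methods for deep neural networks, particularly with respect to adversarial robustness and bias?
RQ4: How can diverse evaluation metrics be aligned with the purpose and completeness of an explanation?
RQ5: Which interdisciplinary approaches are needed to advance explainable AI?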

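Key Findings

Interpretability and explainability are distinct concepts: interpretable models are transparent by design, but not every interpretable model can produce actionable explanations.
Current explanation methods for deep neural networks often fail to reliably capture causal relationships or detect bias, especially under adversarial conditions.
Evaluating an explanation is inherently coupled to model behavior: an unreasonable explanation may stem from a flaw in the model or from a flaw in the explanation generator.
Explanation fidelity can be tested through transfer tasks, for example by using CAVs to detect reliance on textual cues in image classification.
Human evaluation and user studies are essential for verifying that explanations match user expectations and for improving trust.
Robust explanation evaluation requires a multidimensional strategy that combines automated metrics with human evaluation.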
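
To make the CAV-based check concrete, here is a toy sketch under strong assumptions. The 2-D "activations", the `class_logit` head, and every name below are hypothetical. As a simplification, the concept direction is the normalized difference between the mean concept activation and the mean random activation, whereas the published CAV method fits a linear classifier and uses the normal to its decision boundary. The score is the fraction of class inputs whose logit increases along that direction.

```python
import math
import random


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def concept_direction(concept_acts, random_acts):
    """Unit vector from the random-example mean to the concept-example mean.

    Simplification: stands in for the normal of a trained linear classifier.
    """
    mc, mr = mean_vector(concept_acts), mean_vector(random_acts)
    diff = [c - r for c, r in zip(mc, mr)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def numerical_gradient(f, a, eps=1e-5):
    """Central-difference gradient of f at activation vector a."""
    grad = []
    for j in range(len(a)):
        hi = list(a)
        lo = list(a)
        hi[j] += eps
        lo[j] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def concept_sensitivity_score(logit, class_acts, direction):
    """Fraction of inputs whose logit increases along the concept direction."""
    positive = 0
    for a in class_acts:
        g = numerical_gradient(logit, a)
        if sum(gi * di for gi, di in zip(g, direction)) > 0:
            positive += 1
    return positive / len(class_acts)


rng = random.Random(0)


def cloud(cx, cy, n=200, s=0.3):
    # Hypothetical hidden-layer activations scattered around (cx, cy).
    return [[rng.gauss(cx, s), rng.gauss(cy, s)] for _ in range(n)]


text_concept_acts = cloud(2.0, 0.0)  # images that contain text
random_acts = cloud(0.0, 0.0)        # unrelated images
class_acts = cloud(2.0, 1.0)         # images of the class under test


def class_logit(a):
    # Hypothetical classifier head that leans on the first activation unit.
    return 0.5 * a[0] ** 2 + 0.1 * a[1]


cav = concept_direction(text_concept_acts, random_acts)
score = concept_sensitivity_score(class_logit, class_acts, cav)
print(cav, score)
```

A score near 1 in this toy setup would suggest the class prediction is sensitive to the "text" direction. In a real study the activations and gradients would come from the network being examined, and the score would be compared against directions built from random example sets.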

This digest was generated by AI and reviewed by human editors.