QUICK REVIEW

[Paper Review] Explainable Deep Learning: A Field Guide for the Uninitiated

Gabriëlle Ras, Ning Xie|arXiv (Cornell University)|Apr 30, 2020

Explainable Artificial Intelligence (XAI) | Cited by 92

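One-Sentence Summary

This paper presents a field guide that introduces a simple three-dimensional taxonomy of explainable DNN methods, surveys how explanations are evaluated, and offers practical design considerations for newcomers to the field.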

ABSTRACT

Deep neural networks (DNNs) have become a proven and indispensable machine learning tool. As a black-box model, it remains difficult to diagnose what aspects of the model's input drive the decisions of a DNN. In countless real-world domains, from legislation and law enforcement to healthcare, such diagnosis is essential to ensure that DNN decisions are driven by aspects appropriate in the context of its use. The development of methods and studies enabling the explanation of a DNN's decisions has thus blossomed into an active, broad area of research. A practitioner wanting to study explainable deep learning may be intimidated by the plethora of orthogonal directions the field has taken. This complexity is further exacerbated by competing definitions of what it means "to explain" the actions of a DNN and to evaluate an approach's "ability to explain". This article offers a field guide to explore the space of explainable deep learning aimed at those uninitiated in the field. The field guide: i) introduces three simple dimensions defining the space of foundational methods that contribute to explainable deep learning, ii) discusses the evaluations for model explanations, iii) places explainability in the context of other related deep learning research areas, and iv) finally elaborates on user-oriented explanation designing and potential future directions on explainable deep learning. We hope the guide is used as an easy-to-digest starting point for those just embarking on research in this field.

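Research Motivation and Objectives

Define a simple three-dimensional space for categorizing foundational explainable DNN methods.
Summarize the methods used to evaluate model explanations.
Place explainability in the context of related deep learning research areas.
Give designers guidance on building explainable DNN systems.
Highlight future directions and limitations to guide new research.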

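Proposed Method

Introduce a three-dimensional taxonomy of explainable DNN methods: visualization, model distillation, and intrinsic methods.
Describe visualization techniques, covering backpropagation-based and perturbation-based methods, and their common output forms such as saliency maps and heatmaps.
Frame model distillation as training a white-box model that mimics the DNN's behavior so that the behavior can be interpreted.
Describe intrinsic methods, which build explanation into the model design so that performance and explainability are optimized jointly.
Survey representative methods (e.g., CAM/Grad-CAM, LRP, DeepLIFT, Integrated Gradients) and the ideas underlying them.
Discuss evaluation considerations for explanation systems and the implications for user-oriented design.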
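
The perturbation-based visualization idea listed above can be illustrated with a minimal sketch. It assumes a toy scoring function in place of a real DNN: `toy_model`, `occlusion_saliency`, the 4x4 input, and the zero baseline are all hypothetical names and choices made for this illustration, not taken from the paper. Each pixel is replaced by the baseline in turn, and the resulting drop in the model's score is recorded as that pixel's saliency.

```python
from typing import Callable, List

Image = List[List[float]]


def toy_model(image: Image) -> float:
    """Hypothetical stand-in for a DNN: mean brightness of the top-left 2x2 corner."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0


def occlusion_saliency(
    model: Callable[[Image], float],
    image: Image,
    patch: int = 1,
    baseline: float = 0.0,
) -> Image:
    """Return a map whose entry (r, c) is the score drop when the patch covering (r, c) is occluded."""
    height, width = len(image), len(image[0])
    base_score = model(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Copy the input and overwrite one patch with the baseline value.
            occluded = [row[:] for row in image]
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            for r in rows:
                for c in cols:
                    occluded[r][c] = baseline
            # A large drop means the model relied on this patch.
            drop = base_score - model(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


image = [[1.0] * 4 for _ in range(4)]
saliency_map = occlusion_saliency(toy_model, image)
for row in saliency_map:
    print(row)
```

Only the four corner pixels that the toy model actually reads receive non-zero saliency, which is the behavior a saliency map or heatmap is meant to expose. Backpropagation-based methods aim at a similar map but derive it from gradients or relevance scores instead of repeated forward passes.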

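Experimental Results

Research Questions

RQ1 What is a minimal, intuitive taxonomy that can categorize foundational explainable DNN methods?
RQ2 How should explanations be evaluated and validated so that they are trustworthy and useful?
RQ3 How does explainability relate to neighboring research areas in deep learning and AI?
RQ4 What practical factors should designers consider when building explainable DNN systems?
RQ5 What are the limitations of explainability research, and which future directions are promising?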

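Key Findings

The paper provides a simple three-dimensional space for categorizing foundational explainable DNN methods: visualization, model distillation, and intrinsic methods.
Visualization methods divide into backpropagation-based and perturbation-based approaches, and their output is usually presented as saliency maps or heatmaps.
Foundational visualization techniques include activation maximization, deconvolution, CAM/Grad-CAM, and relevance-based methods such as LRP, DeepLIFT, and Integrated Gradients.
Model distillation introduces a white-box surrogate model that reveals the decision rules the DNN has learned.
Intrinsic methods embed explanation in the model design, allowing performance and explainability to be optimized jointly.
The guide discusses evaluation, complementarity with related fields, and practical design considerations aimed at end users.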
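
The model distillation finding above can be sketched in a few lines, under stated assumptions. `black_box` is a hypothetical stand-in for a trained DNN, and the white-box surrogate is a one-split decision stump chosen purely for brevity; the paper does not prescribe this surrogate. The surrogate is fitted to the black box's own predictions (not to ground-truth labels), and "fidelity" here means the fraction of inputs on which the two agree.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# (feature index, threshold, label predicted when feature > threshold)
Stump = Tuple[int, float, int]


def black_box(x: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained DNN classifier over two features."""
    hidden = math.tanh(3.0 * x[0] - 1.5) + 0.1 * math.sin(x[1])
    return 1 if hidden > 0.0 else 0


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    feature, threshold, label_above = stump
    return label_above if x[feature] > threshold else 1 - label_above


def distill_to_stump(
    teacher: Callable[[Sequence[float]], int],
    inputs: List[List[float]],
) -> Tuple[Stump, float]:
    """Exhaustively pick the stump that best reproduces the teacher's predictions."""
    teacher_labels = [teacher(x) for x in inputs]
    best: Stump = (0, 0.0, 1)
    best_fidelity = -1.0
    for feature in range(len(inputs[0])):
        for threshold in sorted({x[feature] for x in inputs}):
            for label_above in (0, 1):
                stump = (feature, threshold, label_above)
                agree = sum(
                    stump_predict(stump, x) == y
                    for x, y in zip(inputs, teacher_labels)
                )
                fidelity = agree / len(inputs)
                if fidelity > best_fidelity:
                    best, best_fidelity = stump, fidelity
    return best, best_fidelity


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [[rng.random(), rng.random()] for _ in range(200)]
surrogate, fidelity = distill_to_stump(black_box, samples)
feature, threshold, label_above = surrogate
print(f"rule: predict {label_above} if x[{feature}] > {threshold:.3f}")
print(f"fidelity to the black box: {fidelity:.2%}")
```

The printed rule is the explanation: it states in one readable condition which feature the black box mostly depends on. The fidelity figure matters as much as the rule, since a surrogate that disagrees with the original model often explains little about it.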


This review was generated by AI and reviewed by human editors.