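[Paper Interpretation] Evaluating Explanation Without Ground Truth in Interpretable Machine Learning
This paper defines and studies how to evaluate explanations in interpretable machine learning when no ground-truth explanation is available. It proposes generalizability, fidelity, and persuasibility as evaluation criteria, together with a unified hierarchical evaluation framework.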
Interpretable Machine Learning (IML) has become increasingly important in many real-world applications, such as autonomous cars and medical diagnosis, where explanations are significantly preferred to help people better understand how machine learning systems work and further enhance their trust towards systems. However, due to the diversified scenarios and subjective nature of explanations, we rarely have the ground truth for benchmark evaluation in IML on the quality of generated explanations. Having a sense of explanation quality not only matters for assessing system boundaries, but also helps to realize the true benefits to human users in practical settings. To benchmark the evaluation in IML, in this article, we rigorously define the problem of evaluating explanations, and systematically review the existing efforts from state-of-the-arts. Specifically, we summarize three general aspects of explanation (i.e., generalizability, fidelity and persuasibility) with formal definitions, and respectively review the representative methodologies for each of them under different tasks. Further, a unified evaluation framework is designed according to the hierarchical needs from developers and end-users, which could be easily adopted for different scenarios in practice. In the end, open problems are discussed, and several limitations of current evaluation techniques are raised for future explorations.
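Research Motivation and Goals
- Clarify the problem of evaluating explanations in IML when no ground truth is available.
- Define three core properties of an explanation: generalizability, fidelity, and persuasibility.
- Review existing evaluation methods across explanation types and application scenarios.
- Propose a unified, hierarchical evaluation framework aligned with the needs of developers and end users.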
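Proposed Method
- Categorize explanations along two dimensions: scope (global or local) and manner (intrinsic or post hoc).
- Give formal definitions of generalizability, fidelity, and persuasibility.
- Systematically review the existing evaluation methods for each property across tasks.
- Propose a unified hierarchical evaluation framework with three tiers corresponding to generalizability, fidelity, and persuasibility.
- Discuss open problems, limitations, and future directions for benchmarking explanation evaluation.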
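The two-dimensional categorization and the three evaluation tiers can be written down as a small data structure. The following is a minimal sketch, not the paper's implementation: the names `Scope`, `Manner`, `ExplanationType`, `TIERS`, and `all_types` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """What the explanation covers: the whole model or a single prediction."""
    GLOBAL = "global"
    LOCAL = "local"


class Manner(Enum):
    """How the explanation is obtained: built into the model or derived afterwards."""
    INTRINSIC = "intrinsic"
    POST_HOC = "post hoc"


@dataclass(frozen=True)
class ExplanationType:
    """One cell of the scope x manner grid (hypothetical helper type)."""
    scope: Scope
    manner: Manner

    def label(self) -> str:
        return f"{self.manner.value}-{self.scope.value}"


# Evaluation tiers, listed from the bottom of the hierarchy to the top.
TIERS = ("generalizability", "fidelity", "persuasibility")


def all_types():
    """Enumerate the four explanation categories of the grid."""
    return [ExplanationType(scope, manner) for manner in Manner for scope in Scope]


for explanation_type in all_types():
    print(explanation_type.label())
```

Keeping the category as an explicit value makes it easy to dispatch to a suitable evaluation method per category, which is one plausible way to apply the framework in practice.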
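Experimental Results
Research Questions
- RQ1: How can explanations in IML be evaluated without ground truth?
- RQ2: Which formal properties best capture explanation quality across tasks?
- RQ3: How can a unified framework support benchmarking explanations for developers and end users?
- RQ4: What are the key open problems and limitations of current explanation evaluation techniques?
- RQ5: How should an evaluation framework handle local vs. global and intrinsic vs. post hoc explanations?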
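Key Findings
- Defines three general properties (generalizability, fidelity, and persuasibility) as the core criteria for evaluating IML explanations.
- Generalizability can be assessed through conventional model evaluation for intrinsic-global explanations, and through the proxy model for post hoc-global explanations.
- Fidelity measures how faithful an explanation is to the target system; for post hoc-local explanations it is assessed with ablation or perturbation methods.
- Persuasibility assesses usefulness and comprehensibility to humans, and typically requires human studies or annotations.
- Proposes a unified hierarchical framework that organizes evaluation from generalizability at the bottom tier to persuasibility at the top, tailored to the needs of developers and end users.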
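The ablation/perturbation idea behind fidelity can be illustrated with a toy deletion test: remove the features an explanation ranks as most important and check how much the model's output falls. The sketch below is an assumption-laden illustration, not the procedure from the paper. The logistic `model`, the zero baseline, and the helpers `ablate` and `deletion_drop` are all hypothetical.

```python
import math


def model(x, weights, bias=0.0):
    """Toy target system: a logistic regression score in (0, 1)."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def ablate(x, indices, baseline=0.0):
    """Return a copy of x with the chosen features replaced by a baseline value."""
    drop = set(indices)
    return [baseline if i in drop else v for i, v in enumerate(x)]


def deletion_drop(x, attributions, weights, k, bias=0.0):
    """Output drop after ablating the k features the explanation ranks highest."""
    ranked = sorted(range(len(x)), key=lambda i: attributions[i], reverse=True)
    original = model(x, weights, bias)
    perturbed = model(ablate(x, ranked[:k]), weights, bias)
    return original - perturbed


weights = [2.0, -1.0, 0.5, 0.0]
x = [1.0, 1.0, 1.0, 1.0]

# Assumed "faithful" explanation: each feature's contribution to the logit.
faithful = [w * v for w, v in zip(weights, x)]
# Assumed "unfaithful" explanation: the same scores with the ranking reversed.
unfaithful = [-a for a in faithful]

drop_faithful = deletion_drop(x, faithful, weights, k=1)
drop_unfaithful = deletion_drop(x, unfaithful, weights, k=1)
print(f"faithful: {drop_faithful:.3f}, unfaithful: {drop_unfaithful:.3f}")
```

In this toy setting, ablating the feature that the faithful explanation ranks first lowers the score, while ablating the top feature of the reversed ranking raises it. A larger drop is read here as higher fidelity; real evaluations typically average such drops over many inputs and several values of `k`.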
This interpretation was generated by AI and reviewed by human editors.