QUICK REVIEW

[Paper Review] The Promise and Peril of Human Evaluation for Model Interpretability

Bernease Herman|arXiv (Cornell University)|Nov 20, 2017

Explainable Artificial Intelligence (XAI) | References: 10 | Cited by: 45

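One-Sentence Summary

This position paper distinguishes descriptive from persuasive explanations in explainable AI and argues that functional interpretability, because it is tied to human cognition and user preferences, may inadvertently encode cognitive bias; it proposes two research directions for decoupling cognitive function from explanation models, so that the accuracy-interpretability tradeoff can be controlled while transparency is preserved.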

ABSTRACT

Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.

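Motivation and Objectives

Examine the ethical challenge that human evaluation poses for transparency in interpretable machine learning.
Identify how functional interpretability can inadvertently reflect and perpetuate implicit cognitive bias in explanation models.
Propose research directions that decouple cognitive function from explanation models, preserving interpretability without compromising fairness.
Gain better control over the accuracy-interpretability tradeoff by decoupling cognitive preferences from functional metrics.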

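Proposed Approach

Introduces a conceptual framework that distinguishes descriptive explanations (accurate, factual) from persuasive explanations (intended to influence cognition).
Analyzes the correlation between functional interpretability and cognitive function, arguing that user preferences may reflect cognitive bias rather than objective interpretability.
Proposes research directions focused on decoupling cognitive mechanisms from the design of explanation models, to avoid propagating bias.
Advocates evaluation frameworks that separate user perception from model fidelity, using controlled human studies to isolate cognitive influences.
Calls for explanation-quality metrics that are independent of user preference and that emphasize functional utility over perceived clarity.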
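As an illustration of separating user perception from model fidelity, the sketch below scores an explanation on two axes that never mix: a fidelity score computed only from the model's outputs, and a mean human rating. It is a minimal sketch, not the paper's method. The names `fidelity`, `report`, and `ExplanationReport` are hypothetical, fidelity is assumed here to mean simple label agreement between an explanation's surrogate predictions and the model's predictions, and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ExplanationReport:
    fidelity: float     # agreement with the model; uses no human input
    mean_rating: float  # average human rating; may reflect user preference


def fidelity(model_preds, explanation_preds):
    """Fraction of inputs on which the explanation's surrogate prediction
    matches the model's prediction (an assumed, simple fidelity measure)."""
    if not model_preds or len(model_preds) != len(explanation_preds):
        raise ValueError("prediction lists must be non-empty and of equal length")
    matches = sum(m == e for m, e in zip(model_preds, explanation_preds))
    return matches / len(model_preds)


def report(model_preds, explanation_preds, ratings):
    """Report both axes side by side instead of folding them into one score."""
    return ExplanationReport(
        fidelity=fidelity(model_preds, explanation_preds),
        mean_rating=mean(ratings),
    )


# Invented data: model predictions on eight inputs, two candidate explanations,
# and 1-5 ratings from four hypothetical study participants per explanation.
model_preds = [1, 0, 1, 1, 0, 1, 0, 0]
simple_expl = [1, 0, 0, 1, 0, 0, 0, 1]    # easy to read, less faithful
detailed_expl = [1, 0, 1, 1, 0, 1, 0, 1]  # harder to read, more faithful

simple_report = report(model_preds, simple_expl, [5, 4, 5, 4])
detailed_report = report(model_preds, detailed_expl, [3, 2, 3, 4])
```

In this invented case the simpler explanation earns the higher rating but the lower fidelity, which is the situation the framework is meant to surface: ranking on ratings alone would favor the more persuasive explanation over the more descriptive one.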

Results

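Research Questions

RQ1: To what extent does functional interpretability in machine learning models reflect underlying cognitive function rather than objective interpretability?
RQ2: To what extent might user preferences in explanation evaluation encode implicit cognitive bias?
RQ3: Can evaluation methods be designed that decouple cognitive function from explanation model performance?
RQ4: How do preference-based metrics affect model interpretability in real-world applications?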

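Key Findings

Functional interpretability may be correlated with cognitive function, which suggests that user preferences in explanation evaluation may reflect cognitive bias rather than objective model clarity.
When functional metrics are used without separating cognitive influence from model fidelity, human evaluation of explanations may amplify implicit bias.
Distinguishing descriptive from persuasive explanations is essential for telling whether an evaluation reflects genuine interpretability or subjective persuasiveness.
Decoupling cognitive mechanisms from explanation models is necessary to preserve transparency and to avoid reinforcing biased user perceptions.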
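One way to probe the first two findings in practice is to check whether human ratings track a human-independent fidelity score across a set of candidate explanations. The sketch below computes a Spearman rank correlation by hand for that purpose. It is an assumed diagnostic rather than anything the paper prescribes, the function names are hypothetical, and the sample values are invented.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(ranks(x), ranks(y))


# Invented scores for five candidate explanations of the same model.
fidelity_scores = [0.62, 0.70, 0.81, 0.88, 0.93]  # computed without humans
human_ratings = [4.6, 4.1, 3.9, 3.0, 2.7]         # mean rating per explanation

rho = spearman(fidelity_scores, human_ratings)
```

A strongly negative `rho`, as in this invented data, would indicate that optimizing on ratings alone pulls selection away from the most faithful explanations. A value near zero or above would not rule out bias; it would only mean this particular check does not detect it.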

This review was generated by AI and reviewed by a human editor.