QUICK REVIEW

[Paper Review] How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation

Menaka Narayanan, Emily Chen|arXiv (Cornell University)|Feb 2, 2018

Explainable Artificial Intelligence (XAI)|References: 42|Cited by: 97

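ONE-SENTENCE SUMMARY

This paper empirically investigates which properties of decision-set explanations most affect people's ability to check an output against an input, using two domains (recipe and clinical) and several complexity factors.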

ABSTRACT

Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains poorly understood. This work advances our understanding of what makes explanations interpretable in the specific context of verification. Suppose we have a machine learning system that predicts X, and we provide rationale for this prediction X. Given an input, an explanation, and an output, is the output consistent with the input and the supposed rationale? Via a series of user-studies, we identify what kinds of increases in complexity have the greatest effect on the time it takes for humans to verify the rationale, and which seem relatively insensitive.

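MOTIVATION AND GOALS

Quantify the factors that make an explanation human-interpretable in a verification task.
Identify which factors in decision-set explanations most increase the verification burden.
Assess whether domain context (recipe vs. clinical) affects how explanations are processed.
Provide guidelines for designing human-friendly explanations in ML systems.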

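PROPOSED METHOD

Run controlled user studies using synthetic explanations presented as decision sets.
Manipulate explanation size by varying the number of lines and the number of terms in the output clause.
Introduce new cognitive chunks and test explicit versus implicit chunking.
Measure search effort by varying how often input terms repeat across lines.
Run parallel tasks in two domains (alien recipe recommendations and alien medical treatments).
Measure response time, accuracy, and subjective satisfaction under each condition.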
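
The verification task and the size and repetition manipulations above can be made concrete with a small sketch. It is illustrative only and is not the authors' study code: the `Rule` class, the helper names, and the example terms are hypothetical, and it assumes each line of a decision set means "if all listed input terms are present, any of the listed output terms is acceptable".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One line of a decision set (hypothetical representation)."""
    conditions: frozenset  # input terms that must all be observed
    outputs: frozenset     # output terms this line permits


def allowed_outputs(rules, observed):
    """Collect the outputs of every line whose conditions are all observed."""
    allowed = set()
    for rule in rules:
        if rule.conditions <= observed:
            allowed |= rule.outputs
    return allowed


def verify(rules, observed, recommendation):
    """Verification task: is the recommendation consistent with the
    observed input and the explanation?"""
    return recommendation in allowed_outputs(rules, observed)


def complexity(rules):
    """Simple proxies for the manipulated factors (assumed definitions)."""
    term_counts = Counter(t for r in rules for t in r.conditions)
    return {
        "num_lines": len(rules),
        "max_output_terms": max(len(r.outputs) for r in rules),
        "max_term_repetition": max(term_counts.values()),
    }


# Made-up example in the spirit of the recipe domain.
rules = [
    Rule(frozenset({"raining", "weekend"}), frozenset({"soup", "stew"})),
    Rule(frozenset({"raining", "sleepy"}), frozenset({"tea"})),
]
observed = frozenset({"raining", "weekend"})

print(verify(rules, observed, "soup"))  # True: the first line applies
print(verify(rules, observed, "tea"))   # False: "sleepy" was not observed
print(complexity(rules))
```

In this toy representation, adding lines raises `num_lines`, lengthening an output clause raises `max_output_terms`, and reusing an input term across lines raises `max_term_repetition`. Cognitive chunks (named intermediate concepts) are not modeled here.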

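EXPERIMENTAL RESULTS

Research Questions

RQ1: Which explanation properties (size, cognitive chunking, and term repetition) most affect human verification performance?
RQ2: Does introducing new concepts explicitly, rather than embedding them implicitly, affect processing time and satisfaction?
RQ3: Are the effects of explanation complexity consistent across domains (recipe vs. clinical)?
RQ4: How do explanation complexity factors affect accuracy and subjective trust?

Main Findings

Increasing explanation complexity generally increases response time and lowers satisfaction.
The number of lines and the length of the output clause increase processing time the most.
Introducing new cognitive chunks explicitly tends to increase processing time more than embedding the concepts implicitly, and may lower satisfaction.
Term repetition has a subtler effect on response time and satisfaction than adding lines or new concepts.
Accuracy is relatively robust to changes in explanation complexity; the cost of added complexity shows up mainly in response time and satisfaction.
Results are largely consistent across the recipe and clinical domains, suggesting that the explanation-design principles generalize.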

This review was generated by AI and reviewed by human editors.