QUICK REVIEW

[Paper Review] Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

Peter Hase, Mohit Bansal|arXiv (Cornell University)|May 4, 2020

Explainable Artificial Intelligence (XAI)|References: 25|Cited by: 24

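One-Sentence Summary

Through controlled human-subject experiments, this study evaluates five algorithmic explanation methods (LIME, Anchor, Decision Boundary, Prototype, and a Composite approach) on how well they improve human users' ability to predict model behavior in text and tabular classification tasks. It finds that LIME improves simulatability on tabular data and that the Prototype method improves accuracy in counterfactual prediction; however, users' subjective ratings of explanation quality do not predict how helpful the explanations actually are.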

ABSTRACT

Algorithmic approaches to interpreting machine learning models have proliferated in recent years. We carry out human subject tests that are the first of their kind to isolate the effect of algorithmic explanations on a key aspect of model interpretability, simulatability, while avoiding important confounding experimental factors. A model is simulatable when a person can predict its behavior on new inputs. Through two kinds of simulation tests involving text and tabular data, we evaluate five explanation methods: (1) LIME, (2) Anchor, (3) Decision Boundary, (4) a Prototype model, and (5) a Composite approach that combines explanations from each method. Clear evidence of method effectiveness is found in very few cases: LIME improves simulatability in tabular classification, and our Prototype method is effective in counterfactual simulation tests. We also collect subjective ratings of explanations, but we do not find that ratings are predictive of how helpful explanations are. Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability across a variety of explanation methods and data domains. We show that (1) we need to be careful about the metrics we use to evaluate explanation methods, and (2) there is significant room for improvement in current methods. All our supporting code, data, and models are publicly available at: https://github.com/peterbhase/InterpretableNLP-ACL2020

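Research Motivation and Objectives

Isolate and measure the effect of algorithmic explanations on simulatability, that is, a person's ability to predict a model's behavior on new inputs.
Evaluate explanation methods in the text and tabular data domains through controlled human-subject experiments.
Determine whether users' subjective ratings of explanation quality correlate with actual effectiveness in simulation tasks.
Identify which explanation techniques most reliably improve users' understanding of model behavior.
Provide a comprehensive and reliable benchmark for evaluating explanation methods with simulatability as the core metric.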

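Proposed Method

Run two kinds of simulation tasks: forward simulation (predict the model's output for a given input with the help of explanations) and counterfactual simulation (predict the model's output after the input has been perturbed).
Keep the explained instances separate from the test instances to prevent answer leakage and to ensure users are not simply recalling explanations they have already seen.
Balance the data by model correctness so that users cannot succeed simply by guessing the true label.
Require users to make a prediction for every input so that the evaluation does not favor overly specific explanations.
Evaluate five explanation methods: LIME, Anchor, Decision Boundary (latent-space traversal), Prototype (case-based reasoning), and a Composite approach that combines all of the explanations.
Collect users' subjective numerical ratings of explanation quality to assess whether the ratings predict simulatability.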
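
The design controls above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`balance_by_correctness`, `simulation_accuracy`, `explanation_effect`), the record fields (`label`, `model_pred`), and the toy data are all hypothetical. The sketch also assumes that a user's answers before and after seeing explanations refer to the same test items. It shows (a) sampling equal numbers of items the model got right and wrong, and (b) scoring a user against the model's output rather than the true label.

```python
import random


def balance_by_correctness(items, n_per_group, seed=0):
    """Sample equally many items the model classified correctly and incorrectly."""
    rng = random.Random(seed)
    right = [it for it in items if it["model_pred"] == it["label"]]
    wrong = [it for it in items if it["model_pred"] != it["label"]]
    if min(len(right), len(wrong)) < n_per_group:
        raise ValueError("not enough items in one of the two groups")
    return rng.sample(right, n_per_group) + rng.sample(wrong, n_per_group)


def simulation_accuracy(user_preds, model_preds):
    """Fraction of items where the user's guess equals the model's output."""
    if not model_preds or len(user_preds) != len(model_preds):
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(u == m for u, m in zip(user_preds, model_preds))
    return matches / len(model_preds)


def explanation_effect(pre_preds, post_preds, model_preds):
    """Change in simulation accuracy from before to after seeing explanations."""
    return (simulation_accuracy(post_preds, model_preds)
            - simulation_accuracy(pre_preds, model_preds))


# Hypothetical toy pool: the model is right on items 0-5 and wrong on items 6-9.
items = [
    {"id": i, "label": i % 2, "model_pred": (i % 2) if i < 6 else 1 - (i % 2)}
    for i in range(10)
]
balanced = balance_by_correctness(items, n_per_group=2)

# Hypothetical user answers on four test items, before and after explanations.
model_preds = [1, 0, 1, 1]
pre_preds = [1, 1, 0, 1]
post_preds = [1, 0, 1, 0]
effect = explanation_effect(pre_preds, post_preds, model_preds)
print(len(balanced), effect)
```

Because the balanced set contains as many model errors as correct predictions, a user who always answers with the true label would score about 50% here, so any gain has to come from anticipating the model itself.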

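Experimental Results

Research Questions

RQ1: Which algorithmic explanation methods most effectively improve human users' ability to simulate model behavior in forward and counterfactual prediction tasks?
RQ2: To what extent do users' subjective ratings of explanation quality predict their actual performance on simulatability tasks?
RQ3: Do explanation methods improve simulatability in both the text and tabular data domains, or are the effects domain-specific?
RQ4: Does combining multiple explanation methods (the Composite approach) yield better simulatability than any single method?
RQ5: How do confounding factors such as the data distribution and the timing of explanation generation affect assessments of explanation effectiveness?

Key Findings

LIME clearly improved simulatability in tabular classification.
The Prototype method was effective in counterfactual simulation tests, where it outperformed the other methods.
In the text domain, no single explanation method consistently improved simulatability on both the forward and counterfactual tasks, although Prototype and the Composite approach performed best on average.
Users' subjective ratings of explanation quality did not predict effectiveness in the simulation tasks, which points to a gap between perceived and actual usefulness.
Although the Composite approach scored well on quality ratings, it did not improve simulatability in either data domain, which suggests that combining explanation methods does not always improve user understanding.
The study provides the first comprehensive, controlled evaluation of how explanation methods affect simulatability. It shows that most methods have limited effect and underscores the need for better evaluation metrics and improved explanation techniques.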
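
One simple way to probe whether ratings track usefulness is to tabulate users' simulation accuracy at each rating level. The sketch below is a descriptive check only and is not the statistical analysis used in the paper. The function name `accuracy_by_rating`, the rating values, and the toy records are hypothetical; each record pairs the rating a user gave an explanation with whether that user then predicted the model's output correctly.

```python
from collections import defaultdict


def accuracy_by_rating(records):
    """Map each rating value to the share of correct simulations at that rating."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rating, correct in records:
        totals[rating] += 1
        hits[rating] += int(correct)
    return {rating: hits[rating] / totals[rating] for rating in sorted(totals)}


# Hypothetical toy records: (rating given, user simulated the model correctly).
toy_records = [
    (1, True), (1, False),
    (4, True), (4, False),
    (7, True), (7, False),
]
table = accuracy_by_rating(toy_records)
for rating, accuracy in table.items():
    print(rating, accuracy)
```

In this toy data the accuracy is the same at every rating level, which is the kind of flat pattern one would expect if ratings carried no information about helpfulness. A real analysis would need a proper statistical model and far more data.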

This review was generated by AI and checked by a human editor.