QUICK REVIEW

[Paper Review] Explaining Latent Representations with a Corpus of Examples

Jonathan Crabbé, Zhaozhi Qian|arXiv (Cornell University)|Oct 28, 2021

Explainable Artificial Intelligence (XAI) | Cited by 7

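One-Sentence Summary

SimplEx is a post-hoc explanation method that decomposes a black-box model's latent representation of a test example into a weighted mixture of user-selected corpus examples, and uses the Integrated Jacobian to attribute contributions at the feature level. Across a range of tasks it yields personalized, robust, and interpretable explanations, and it reconstructs both the latent space and the output space more accurately than baselines such as Deep k-NN and the representer theorem.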

ABSTRACT

Modern machine learning models are complicated. Most of them rely on convoluted latent representations of their input to issue a prediction. To achieve greater transparency than a black-box that connects inputs to predictions, it is necessary to gain a deeper understanding of these latent representations. To that aim, we propose SimplEx: a user-centred method that provides example-based explanations with reference to a freely selected set of examples, called the corpus. SimplEx uses the corpus to improve the user's understanding of the latent space with post-hoc explanations answering two questions: (1) Which corpus examples explain the prediction issued for a given test example? (2) What features of these corpus examples are relevant for the model to relate them to the test example? SimplEx provides an answer by reconstructing the test latent representation as a mixture of corpus latent representations. Further, we propose a novel approach, the Integrated Jacobian, that allows SimplEx to make explicit the contribution of each corpus feature in the mixture. Through experiments on tasks ranging from mortality prediction to image classification, we demonstrate that these decompositions are robust and accurate. With illustrative use cases in medicine, we show that SimplEx empowers the user by highlighting relevant patterns in the corpus that explain model representations. Moreover, we demonstrate how the freedom in choosing the corpus allows the user to have personalized explanations in terms of examples that are meaningful for them.

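Motivation and Objectives

Address the lack of personalized, user-centred explanations for complex black-box models by letting users define their own set of reference examples (the corpus).
Go beyond model outputs and make latent representations more interpretable by decomposing them into understandable, human-interpretable components drawn from the corpus.
Quantify explicitly the feature-level contribution of each corpus example to the latent-space decomposition, bridging the gap between example-based explanations and feature-importance explanations.
Reconstruct latent-space and output-space representations robustly and accurately through corpus mixtures, with higher fidelity and stability than existing methods.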

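Proposed Method

SimplEx builds the latent representation of a test example as a weighted mixture of the latent representations of examples in a user-defined corpus.
A novel differentiable optimization framework computes the corpus weights that minimize the reconstruction error in latent space.
The Integrated Jacobian generalizes Integrated Gradients to quantify how much each feature of each corpus example contributes to the latent mixture.
The corpus can be chosen flexibly: users may pick any examples, not only training data, which makes the explanations personalized.
The method is applied post hoc and requires no change to the underlying model architecture, so it applies broadly across machine learning models.
Fidelity is jointly optimized in both spaces to ensure accurate reconstruction in latent space and in output space.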
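
The two steps above (fitting corpus weights, then attributing a latent shift to input features) can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the softmax parametrization with plain gradient descent, the toy latent map tanh(Wx), the midpoint-rule integration, the zero baseline, and every name used (fit_corpus_weights, integrated_jacobian, W, latent_map, latent_jacobian) are choices made for this sketch only.

```python
import numpy as np


def _softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def fit_corpus_weights(corpus_latents, test_latent, n_steps=2000, lr=0.5):
    """Find weights w (w >= 0, sum(w) = 1) minimizing ||w @ H - h||^2.

    Assumption of this sketch: w = softmax(logits), so unconstrained
    gradient descent on the logits keeps w on the probability simplex.
    """
    H = np.asarray(corpus_latents, dtype=float)  # (n_corpus, d_latent)
    h = np.asarray(test_latent, dtype=float)     # (d_latent,)
    logits = np.zeros(H.shape[0])
    for _ in range(n_steps):
        w = _softmax(logits)
        residual = w @ H - h                     # latent reconstruction error
        grad_w = 2.0 * H @ residual              # gradient w.r.t. the weights
        # Chain rule through softmax: dL/dlogits = w * (grad_w - <w, grad_w>)
        logits -= lr * w * (grad_w - w @ grad_w)
    return _softmax(logits)


def integrated_jacobian(jacobian_fn, x, baseline, n_steps=200):
    """Midpoint-rule estimate of (x_i - b_i) * int_0^1 dg/dx_i(b + t(x - b)) dt.

    Returns shape (d_latent, d_input); column i is the latent shift
    attributed to input feature i along the straight path from b to x.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    ts = (np.arange(n_steps) + 0.5) / n_steps
    avg_jac = np.mean(
        [jacobian_fn(baseline + t * (x - baseline)) for t in ts], axis=0
    )
    return avg_jac * (x - baseline)              # scales column i by (x_i - b_i)


# Hypothetical toy latent map g(x) = tanh(W x) with its analytic Jacobian.
W = np.array([[1.0, -0.5, 0.2],
              [0.3, 0.8, -1.0]])


def latent_map(x):
    return np.tanh(W @ x)


def latent_jacobian(x):
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W


# Made-up corpus and test inputs, for illustration only.
corpus_inputs = np.array([[1.0, 0.0, 0.5],
                          [0.0, 1.0, -0.5],
                          [-1.0, 0.5, 1.0]])
corpus_latents = np.array([latent_map(x) for x in corpus_inputs])
test_latent = latent_map(np.array([0.2, 0.4, 0.1]))

weights = fit_corpus_weights(corpus_latents, test_latent)
attributions = integrated_jacobian(latent_jacobian, corpus_inputs[0], np.zeros(3))
```

Summing attributions over the feature axis recovers latent_map(x) - latent_map(baseline) up to integration error, which is the completeness property that carries over from Integrated Gradients. For a real network, the analytic Jacobian would normally be replaced by automatic differentiation.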

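Experimental Results

Research Questions

RQ1: Compared with a fixed or default reference set, do user-defined corpus examples make the latent representations of a black-box model more interpretable?
RQ2: How can the feature-level contributions of corpus examples be quantified explicitly in the latent-space decomposition to improve model transparency?
RQ3: How much better is SimplEx than existing methods such as Deep k-nearest neighbors and the representer theorem at reconstructing latent and output representations?
RQ4: In real-world decision settings such as clinical risk prediction, how do users rate the value of corpus weights and feature attributions?
RQ5: In high-stakes domains, does letting users freely choose a personalized corpus increase their trust in and understanding of model predictions?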

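Key Findings

SimplEx achieves significantly higher R² scores for latent-space reconstruction (e.g., 0.85–0.92 on the MNIST and SEER datasets) than Deep k-nearest neighbors and the representer theorem.
Using the fitted corpus weights instead of uniform weights gives a more accurate and robust reconstruction of the model, improving reconstruction fidelity by 20–30%.
In the user study, clinicians rated the importance of the corpus weights highly (mean score 4.0/5), and 60% felt that uniform weighting would hide valuable information.
Clinicians rated the Jacobian projections as essential for interpretability: 90% agreed (mean score 4.6/5) that it is important to know which features drive the similarity.
40% of clinicians found the freedom to choose a personalized corpus beneficial; it comes at no extra cost and allows customized explanations without harming performance.
In the clinical case study, clinicians reported that SimplEx explanations affected their confidence in the predictions: 60% said they would doubt the model's prediction for Joe if the outcome of a corpus example changed (e.g., if Bill had survived).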

This review was generated by AI and checked by a human editor.