QUICK REVIEW

[Paper Review] Pathologies of Neural Models Make Interpretations Difficult

Shi Feng, Eric Wallace | arXiv (Cornell University) | Apr 20, 2018

Explainable Artificial Intelligence (XAI) | References: 39 | Cited by: 26

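One-Sentence Summary

The paper shows that neural models behave pathologically under input reduction: they remain highly confident on minimal inputs that look random and meaningless to humans. Using gradient-based input reduction with beam search, the authors expose the models' overconfidence and poor uncertainty calibration, and they propose entropy regularization to make the models more interpretable without losing accuracy.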

ABSTRACT

One way to interpret neural model predictions is to highlight the most important input features, for example, a heatmap visualization over the words in an input sentence. In existing interpretation methods for NLP, a word's importance is determined by either input perturbation (measuring the decrease in model confidence when that word is removed) or by the gradient with respect to that word. To understand the limitations of these methods, we use input reduction, which iteratively removes the least important word from the input. This exposes pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods. As we confirm with human experiments, the reduced examples lack information to support the prediction of any label, but models still make the same predictions with high confidence. To explain these counterintuitive results, we draw connections to adversarial examples and confidence calibration: pathological behaviors reveal difficulties in interpreting neural models trained with maximum likelihood. To mitigate their deficiencies, we fine-tune the models by encouraging high entropy outputs on reduced examples. Fine-tuned models become more interpretable under input reduction without accuracy loss on regular examples.

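Research Motivation and Goals

Investigate why interpretation methods based on input perturbation and gradient attribution fail to produce meaningful explanations under input reduction.
Expose the pathological behavior in which neural models remain highly confident on semantically incoherent inputs.
Understand the root causes of this behavior, in particular model overconfidence and poor calibration.
Propose a mitigation strategy, entropy regularization, that improves interpretability without hurting accuracy on the standard task.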

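Proposed Method

Input reduction iteratively removes the least important word, as ranked by gradient-based importance scores, while the model's original prediction stays unchanged.
Beam search is used during reduction to explore multiple reduction paths and find the shortest input that preserves the prediction.
The importance of each word is defined as the drop in model confidence when that word is removed, g(x_i | x) = f(y | x) − f(y | x_{−i}), where x_{−i} is the input without the i-th word; the gradient with respect to the word serves as an efficient approximation of this difference.
Crowdsourced human evaluation compares the reduced inputs with inputs produced by random word removal to assess how semantically coherent they appear.
Entropy regularization is applied during fine-tuning to encourage higher uncertainty on reduced inputs and thereby counteract overconfidence.
The method is evaluated on three NLP tasks: SQuAD (reading comprehension), SNLI (textual entailment), and VQA (visual question answering).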
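The reduction loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the exact leave-one-out score g(x_i | x) with one forward pass per word rather than the gradient approximation, it is greedy rather than beam-searched, and the `Model` interface and `toy_model` classifier are hypothetical stand-ins introduced only for this example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface (assumption): token list -> probability per label.
Model = Callable[[Sequence[str]], List[float]]


def predict(model: Model, tokens: Sequence[str]) -> Tuple[int, float]:
    """Return (predicted label index, confidence) for a token list."""
    probs = model(tokens)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def leave_one_out_importance(
    model: Model, tokens: Sequence[str], label: int
) -> List[float]:
    """Compute g(x_i | x) = f(y | x) - f(y | x_{-i}) for every position i."""
    base = model(tokens)[label]
    scores = []
    for i in range(len(tokens)):
        without_i = list(tokens[:i]) + list(tokens[i + 1:])
        scores.append(base - model(without_i)[label])
    return scores


def input_reduction(model: Model, tokens: Sequence[str]) -> List[str]:
    """Greedily remove the least important word until the prediction would change."""
    current = list(tokens)
    label, _ = predict(model, current)
    while len(current) > 1:
        scores = leave_one_out_importance(model, current, label)
        i = min(range(len(current)), key=scores.__getitem__)
        candidate = current[:i] + current[i + 1:]
        if predict(model, candidate)[0] != label:
            break  # removing the least important word flips the prediction
        current = candidate
    return current


def toy_model(tokens: Sequence[str]) -> List[float]:
    """Toy two-label classifier keyed on a single word (illustration only)."""
    p = 0.9 if "not" in tokens else 0.2
    return [1.0 - p, p]


reduced = input_reduction(toy_model, "the movie was not good".split())
print(reduced, predict(toy_model, reduced))
```

On the toy classifier the sentence collapses to a single word while the confidence stays at 0.9, which mirrors the pathology in miniature. A beam-search variant would keep several candidate removals per step instead of only the lowest-scoring one.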

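Experimental Results

Research Questions

RQ1: Why do interpretation methods based on input perturbation and gradients fail to produce meaningful explanations when applied to reduced inputs?
RQ2: Why do neural models remain highly confident, after many word removals, on inputs that are meaningless to humans?
RQ3: How are these pathological behaviors related to adversarial examples and model overconfidence?
RQ4: Can regularization that raises model uncertainty make interpretations more robust?
RQ5: To what extent does entropy regularization improve interpretability while preserving standard accuracy?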

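Key Findings

Input reduction based on gradient importance can shrink inputs to just one or two words, usually meaningless to humans, while the model remains highly confident.
In the human evaluation, reduced inputs were nearly indistinguishable from inputs produced by random word removal, indicating that they appear arbitrary and semantically incoherent.
Models stay highly confident on these reduced inputs even though the original context has changed drastically, which indicates overconfidence on degenerate inputs.
This pathological behavior is tied to model overconfidence and poor calibration, and it resembles the behavior seen with adversarial examples and "rubbish" inputs made of pure noise.
Applying entropy regularization during fine-tuning reduces overconfidence on reduced inputs and yields reduced examples that are more interpretable and coherent, with no loss of accuracy.
The method mitigates the pathological behavior on SQuAD, SNLI, and VQA, bringing model behavior closer to what humans find interpretable.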
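The entropy-regularized fine-tuning objective mentioned in the findings can be illustrated with a small sketch. It assumes the objective combines the usual negative log-likelihood on regular examples with an entropy bonus on reduced examples; the function names and the weight `lam` are hypothetical, and the value used here is arbitrary rather than taken from the paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(p) = -sum_k p_k log p_k, using the natural log."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(
    regular_probs: List[Sequence[float]],
    gold_labels: List[int],
    reduced_probs: List[Sequence[float]],
    lam: float = 1.0,  # hypothetical trade-off weight (assumption)
) -> float:
    """Negative log-likelihood on regular examples minus lam * entropy on reduced ones.

    Minimizing this loss keeps the model accurate on regular inputs while
    rewarding high-entropy (uncertain) outputs on reduced inputs.
    """
    nll = -sum(math.log(p[y]) for p, y in zip(regular_probs, gold_labels))
    bonus = sum(entropy(p) for p in reduced_probs)
    return nll - lam * bonus


regular = [[0.1, 0.9]]   # model output on a regular example
gold = [1]               # its gold label
confident = regularized_loss(regular, gold, [[0.99, 0.01]])
uncertain = regularized_loss(regular, gold, [[0.5, 0.5]])
print(confident > uncertain)
```

With the same output on the regular example, the loss is lower when the model is uncertain on the reduced example, which is the direction the regularizer pushes during fine-tuning.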


This review was generated by AI and reviewed by human editors.