QUICK REVIEW

[Paper Review] This Looks Like That: Deep Learning for Interpretable Image Recognition

Chaofan Chen, Oscar Li|arXiv (Cornell University)|Jun 27, 2018

Explainable Artificial Intelligence (XAI)|References: 64|Cited by: 565

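One-Sentence Summary

ProtoPNet introduces prototypical-part reasoning for image classification. It reaches competitive accuracy while giving interpretable, part-based explanations, and several ProtoPNets can be combined to improve performance on bird and car datasets.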

ABSTRACT

When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image, and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture -- prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain to people on how to solve challenging image classification tasks. The network uses only image-level labels for training without any annotations for parts of images. We demonstrate our method on the CUB-200-2011 dataset and the Stanford Cars dataset. Our experiments show that ProtoPNet can achieve comparable accuracy with its analogous non-interpretable counterpart, and when several ProtoPNets are combined into a larger network, it can achieve an accuracy that is on par with some of the best-performing deep models. Moreover, ProtoPNet provides a level of interpretability that is absent in other interpretable deep models.

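Motivation and Objectives

Motivate interpretable image classification by having the model explain its decisions through prototypical parts, much as humans reason.
Develop a neural network architecture that includes a prototype layer for part-level reasoning.
Train the model end to end under part-centric constraints, without part-level annotations.
Demonstrate interpretability and competitive accuracy on the CUB-200-2011 and Stanford Cars datasets.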

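Proposed Method

Introduce the ProtoPNet architecture: a CNN backbone f, a prototype layer g_p with m prototypes, and a final linear layer h without bias.
Each prototype unit computes squared L2 distances between its prototype and the patches of f(x), then converts them to a similarity score through a monotonic transformation (the smaller the distance, the higher the similarity).
Each prototype represents a latent patch corresponding to a training image patch; prototypes are assigned per class and, after projection, are visualized through their nearest latent training patch.
Training proceeds in three stages: (i) SGD on the convolutional layers and prototypes with cluster and separation costs to shape the latent space; (ii) projection of each prototype onto its nearest latent training patch; (iii) convex optimization of the last layer to encourage sparsity and faithful class-specific weighting.
The model uses only image-level labels during training, and prototypes can be visualized without a decoder.
Key equations include the prototype similarity g_{p_j}(z) = max_{z' in patches(z)} log((||z' - p_j||_2^2 + 1) / (||z' - p_j||_2^2 + epsilon)) and the Clst and Sep terms that organize the latent space.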
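
The prototype similarity, the cluster and separation terms, and the projection step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: patches and prototypes are small lists of floats rather than CNN feature maps, the function names and the default epsilon of 1e-4 are assumptions, and the per-image form of the Clst and Sep terms (minimum squared distance to own-class prototypes, negated minimum squared distance to other-class prototypes) reflects one reading of how such terms are defined.

```python
import math


def sq_dist(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_similarity(patches, prototype, eps=1e-4):
    """Similarity of one prototype to an image's latent patches.

    log((d + 1) / (d + eps)) decreases as d grows, so the maximum over
    patches is attained at the patch with the smallest squared distance.
    eps = 1e-4 is an assumed default.
    """
    d_min = min(sq_dist(p, prototype) for p in patches)
    return math.log((d_min + 1.0) / (d_min + eps))


def cluster_and_separation(patches, prototypes, proto_class, label):
    """Per-image cluster and separation terms (assumed form).

    cluster:    smallest squared distance from any patch to a prototype
                of the image's own class (minimized during training).
    separation: negative of the smallest squared distance to a prototype
                of any other class (minimizing it pushes those away).
    """
    own = [sq_dist(z, p) for z in patches
           for p, c in zip(prototypes, proto_class) if c == label]
    other = [sq_dist(z, p) for z in patches
             for p, c in zip(prototypes, proto_class) if c != label]
    return min(own), -min(other)


def project_prototype(prototype, candidate_patches):
    """Replace a prototype with a copy of its nearest latent training patch."""
    nearest = min(candidate_patches, key=lambda p: sq_dist(p, prototype))
    return list(nearest)
```

For example, with patches [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]] and a prototype [1.0, 1.0], the smallest squared distance is 0, so the similarity is log(1 / epsilon), the largest value the formula can take. Because each prototype ends up equal to an actual training patch after projection, it can be shown to a person as the image region that patch came from.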

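Experimental Results

Research Questions

RQ1: Can a neural network classify images through prototypical parts in an inherently interpretable way?
RQ2: Does incorporating prototype-based reasoning maintain competitive accuracy relative to non-interpretable baselines?
RQ3: How does combining multiple ProtoPNet models affect accuracy while preserving interpretability?
RQ4: How do the prototypes and their visualizations behave qualitatively on fine-grained tasks such as bird species and car models?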

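Key Findings

On cropped bird images, ProtoPNet's accuracy is comparable to that of its non-interpretable baseline across a variety of base CNNs.
A combined network of several ProtoPNets reaches accuracy on the cropped bird dataset that is on par with some of the best-performing deep models (up to 84.8%).
On full images, individual ProtoPNet models are less accurate, but combining ProtoPNets based on VGG19/ResNet34/DenseNet yields accuracy above 80% (e.g., 80.8% for the combined model).
On the car model dataset, the combined ProtoPNet reaches 91.4% accuracy, competitive with state-of-the-art methods (e.g., 91.3%–92.8% for the top methods).
ProtoPNet provides faithful, human-understandable explanations by showing which prototypical parts (e.g., a bird's head or wings) contribute to a decision, along with the corresponding prototypical image patches.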
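
Several of the findings above concern combined ProtoPNets. One simple way to combine classifiers is to add their per-class logits and take the arg max of the sum. The sketch below illustrates that idea only; whether it matches the paper's exact combination procedure is an assumption, and the function names are hypothetical.

```python
def combine_logits(per_model_logits):
    """Element-wise sum of per-class logits from several models for one image."""
    return [sum(values) for values in zip(*per_model_logits)]


def predict(logits):
    """Index of the class with the largest logit."""
    return max(range(len(logits)), key=lambda k: logits[k])
```

With summed logits, each model's prototype evidence stays visible as a separate additive contribution to the final score, so combining models this way does not hide the individual explanations.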

This review was generated by AI and checked by a human editor.