QUICK REVIEW

[Paper Review] Grad-CAM: Why did you say that?

Ramprasaath R. Selvaraju, Abhishek Das|arXiv (Cornell University)|Nov 22, 2016

Multimodal Machine Learning Applications|References: 11|Cited by: 325

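ONE-SENTENCE SUMMARY

The paper introduces Grad-CAM, a gradient-based localization method that produces class-discriminative visual explanations for CNNs, and combines it with Guided Backpropagation to form high-resolution Guided Grad-CAM explanations.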

ABSTRACT

We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses class-specific gradient information to localize important regions. These localizations are combined with existing pixel-space visualizations to create a novel high-resolution and class-discriminative visualization called Guided Grad-CAM. These methods help better understand CNN-based models, including image captioning and visual question answering (VQA) models. We evaluate our visual explanations by measuring their ability to discriminate between classes, to inspire trust in humans, and their correlation with occlusion maps. Grad-CAM provides a new way to understand CNN-based models. We have released code, an online demo hosted on CloudCV, and a full version of this extended abstract.

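RESEARCH MOTIVATION AND GOALS

Motivate the need for transparent CNN explanations that are both class-discriminative and high-resolution.
Introduce Grad-CAM, which uses class-specific gradient information to obtain localization maps without changing the network architecture.
Combine Grad-CAM with Guided Backpropagation to create Guided Grad-CAM, yielding visualizations that are high-resolution and class-discriminative.
Demonstrate applicability to image captioning and visual question answering (VQA) models.
Evaluate the explanations through human studies and a faithfulness analysis, and release code and a demo.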

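PROPOSED METHOD

Define Grad-CAM by computing the gradient of the target class score with respect to the convolutional feature maps, global-average-pooling those gradients to obtain one weight per channel, and then applying a ReLU to the weighted sum of the feature maps.
Generalize CAM to arbitrary CNNs by relying on gradients (gradient-based localization) rather than on architectural constraints.
Create Guided Grad-CAM by multiplying the Grad-CAM map element-wise with the Guided Backpropagation visualization, giving high-resolution, class-discriminative explanations.
Apply Grad-CAM to image captioning and VQA models to show that it applies across multiple tasks.
Run human studies to assess class discriminability and trust, and compare faithfulness against occlusion-based measurements.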
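
The two core steps listed above (gradient pooling followed by a ReLU over the weighted feature maps, then element-wise fusion with a pixel-space map) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released code: the names `grad_cam` and `guided_grad_cam`, the nested-list representation, the toy numbers, and the nearest-neighbour upsampling are all choices made here. In a real setting the gradients and the Guided Backpropagation map would both come from a trained CNN.

```python
def grad_cam(feature_maps, gradients):
    """Coarse Grad-CAM map from K feature maps and their gradients.

    feature_maps, gradients: lists of K maps, each an H x W list of lists.
    gradients[k][i][j] is d(class score) / d(feature_maps[k][i][j]).
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Channel weights: global average pooling of the gradients.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum of the feature maps, then ReLU keeps positive evidence only.
    cam = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    return [[max(0.0, v) for v in row] for row in cam]


def guided_grad_cam(cam, guided_backprop):
    """Upsample the coarse map (nearest neighbour, an assumption made here)
    to the size of the pixel-space map and multiply element-wise."""
    out_h, out_w = len(guided_backprop), len(guided_backprop[0])
    cam_h, cam_w = len(cam), len(cam[0])
    return [
        [
            cam[i * cam_h // out_h][j * cam_w // out_w] * guided_backprop[i][j]
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy data (hypothetical): two 2x2 feature maps with hand-picked gradients.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # pooled weight  0.5
    [[-1.0, -1.0], [-1.0, -1.0]],  # pooled weight -1.0
]
cam = grad_cam(feature_maps, gradients)

# Stand-in for a 4x4 Guided Backpropagation map (all ones for simplicity).
guided = [[1.0] * 4 for _ in range(4)]
fused = guided_grad_cam(cam, guided)
```

In the toy data the second channel has a negative pooled gradient, so locations where only that channel is active are zeroed by the ReLU, which is what makes the map specific to the target class.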

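EXPERIMENTAL RESULTS

Research Questions

RQ1: Can Grad-CAM produce class-discriminative localization maps without retraining or architectural changes?
RQ2: Does Guided Grad-CAM provide high-resolution explanations that improve on existing methods in interpretability and trust?
RQ3: Are Grad-CAM explanations faithful to the model's behavior (as indicated by correlation with occlusion maps) and discriminative for the target class?
RQ4: How well do these explanations transfer to higher-level tasks such as image captioning and visual question answering?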

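Main Findings

Grad-CAM localizations are class-discriminative and can be computed without retraining.
Guided Grad-CAM combines high-resolution detail with class focus, improving how well humans can identify the visualized class and how much they trust the model.
In human evaluations, Guided Grad-CAM outperforms Guided Backpropagation on class identification (61.23% vs 44.44% accuracy) and perceived reliability (+1.27 vs +1.00), and it correlates more strongly with occlusion-based faithfulness (0.261 vs 0.168).
Guided Grad-CAM helps diagnose model failures and gives intuitive explanations for ImageNet and VQA predictions.
The method is validated in image captioning and VQA pipelines, indicating broad applicability across CNN-based tasks.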
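
The occlusion-based faithfulness check mentioned above can be illustrated with a small sketch. It is written under assumptions rather than taken from the paper's protocol: this summary does not say which correlation measure was used, so Spearman rank correlation is assumed here, and `occlusion_map`, `ranks`, `spearman`, the toy `score_fn`, and the example maps are all hypothetical. The idea is to zero out each patch, record how much the class score drops, and compare that drop map with the explanation map.

```python
def occlusion_map(image, score_fn, patch=1):
    """Score drop when each patch of the image is replaced by zeros."""
    base = score_fn(image)
    h, w = len(image), len(image[0])
    drops = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = [row[:] for row in image]
            cells = [
                (i, j)
                for i in range(top, min(top + patch, h))
                for j in range(left, min(left + patch, w))
            ]
            for i, j in cells:
                occluded[i][j] = 0.0
            drop = base - score_fn(occluded)
            for i, j in cells:
                drops[i][j] = drop
    return drops


def ranks(values):
    """1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def spearman(map_a, map_b):
    """Spearman rank correlation between two equally sized 2-D maps.
    Undefined (division by zero) if either map is constant."""
    ra = ranks([v for row in map_a for v in row])
    rb = ranks([v for row in map_b for v in row])
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Toy "model" (hypothetical): the class score is a fixed weighted pixel sum.
pixel_weights = [[0.0, 1.0], [2.0, 3.0]]

def score_fn(img):
    return sum(
        pixel_weights[i][j] * img[i][j]
        for i in range(2)
        for j in range(2)
    )

image = [[1.0, 2.0], [3.0, 4.0]]
drops = occlusion_map(image, score_fn)

# An explanation that ranks pixels in the same order as the score drops.
explanation = [[0.1, 0.2], [0.3, 0.4]]
rho = spearman(explanation, drops)
```

A higher correlation means the explanation assigns importance to the same regions whose removal actually changes the model's output, which is the sense of faithfulness the findings refer to.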

This review was generated by AI and reviewed by a human editor.