QUICK REVIEW

[Paper Review] Comparing deep neural networks against humans: object recognition when the signal gets weaker

Robert Geirhos, David Janssen | arXiv (Cornell University) | Jun 21, 2017

Visual Attention and Saliency Detection | References: 43 | Cited by: 154

One-Sentence Summary

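This paper compares humans and deep neural networks (DNNs) on object recognition under a range of image degradations. Humans turn out to be more robust to some distortions, whereas DNNs can outperform humans on clean colour images. The paper also provides a psychophysically controlled benchmark and analysis tools.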

ABSTRACT

Human visual object recognition is typically rapid and seemingly effortless, as well as largely independent of viewpoint and object orientation. Until very recently, animate visual systems were the only ones capable of this remarkable computational feat. This has changed with the rise of a class of computer vision algorithms called deep neural networks (DNNs) that achieve human-level classification performance on object recognition tasks. Furthermore, a growing number of studies report similarities in the way DNNs and the human visual system process objects, suggesting that current DNNs may be good models of human visual object recognition. Yet there clearly exist important architectural and processing differences between state-of-the-art DNNs and the primate visual system. The potential behavioural consequences of these differences are not well understood. We aim to address this issue by comparing human and DNN generalisation abilities towards image degradations. We find the human visual system to be more robust to image manipulations like contrast reduction, additive noise or novel eidolon-distortions. In addition, we find progressively diverging classification error-patterns between humans and DNNs when the signal gets weaker, indicating that there may still be marked differences in the way humans and current DNNs perform visual object recognition. We envision that our findings as well as our carefully measured and freely available behavioural datasets provide a new useful benchmark for the computer vision community to improve the robustness of DNNs and a motivation for neuroscientists to search for mechanisms in the brain that could facilitate this robustness.

Motivation and Goals

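Evaluate how well human observers and three well-known DNNs (AlexNet, GoogLeNet, VGG-16) generalise to degraded images.
Use controlled psychophysical methods to quantify differences in robustness to colour, contrast, additive noise, and eidolon distortions.
Provide a fine-grained, category-level comparison of error patterns between humans and DNNs.
Release freely available datasets and analysis tools for benchmarking DNN robustness and guiding its improvement.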

Proposed Method

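Images are shown briefly for a fixed duration (200 ms) and followed by a backward mask, to minimise the influence of feedback processing.
The three DNNs (AlexNet, GoogLeNet, VGG-16) are evaluated on the same degraded stimuli, using Caffe with a 224×224 centre-crop input pipeline.
Images are manipulated through grayscale vs. colour presentation, contrast reduction, additive white noise, and eidolon distortions with controlled coherence.
Accuracy and the entropy of the response distribution over 16 categories are computed to assess response bias.
Confusion difference matrices are introduced to compare the category-level error patterns of humans and each DNN.
A paired analysis at matched performance levels visualises how error patterns diverge under noise.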
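The review only states that Caffe's 224×224 centre-crop pipeline was used. Assuming the common ImageNet convention of first resizing images to 256×256 (not confirmed here), the crop-window arithmetic can be sketched as follows; `center_crop_box` and `center_crop` are hypothetical helper names, not Caffe API.

```python
def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a crop x crop window centred in the image."""
    if crop > width or crop > height:
        raise ValueError("crop size exceeds image size")
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def center_crop(image, crop=224):
    """Centre-crop an image stored as a list of rows (height x width)."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = center_crop_box(width, height, crop)
    return [row[left:right] for row in image[top:bottom]]


# Assumed Caffe-style convention: images pre-resized to 256 x 256.
print(center_crop_box(256, 256))  # (16, 16, 240, 240)
```

With a 256×256 input, 16 pixels are discarded on every side, so objects near the image border may be partly cut off before the network sees them.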
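The grayscale, contrast, and noise manipulations in the method list can be sketched on pixel values in [0, 1]. This is a minimal illustration under stated assumptions, not the authors' code: the luma weights (ITU-R BT.601), the contrast scheme (scaling toward mid-grey 0.5), and uniform rather than Gaussian white noise are all assumptions. Eidolon distortions are not sketched.

```python
import random


def to_grayscale(rgb_pixels):
    """Convert (r, g, b) tuples in [0, 1] to luminance values in [0, 1].

    Assumption: ITU-R BT.601 luma weights; the paper's exact conversion may differ.
    """
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]


def reduce_contrast(pixels, c):
    """Scale pixel values toward mid-grey (0.5) by contrast factor c in [0, 1].

    Assumption: c = 1 leaves the image unchanged, c = 0 yields uniform grey.
    """
    return [c * p + (1.0 - c) / 2.0 for p in pixels]


def add_uniform_noise(pixels, width, rng):
    """Add i.i.d. uniform noise drawn from [-width, width], then clip to [0, 1].

    Assumption: uniform (rather than e.g. Gaussian) white noise.
    """
    return [min(1.0, max(0.0, p + rng.uniform(-width, width))) for p in pixels]


# Usage on a tiny, made-up two-pixel image (illustrative values only).
rng = random.Random(0)
grey = to_grayscale([(0.2, 0.4, 0.6), (1.0, 1.0, 1.0)])
low_contrast = reduce_contrast(grey, 0.3)
noisy = add_uniform_noise(low_contrast, 0.2, rng)
```

Clipping keeps values in the displayable range, but it also makes the effective noise non-uniform near black and white, which is one reason contrast is often reduced before noise is added.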
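The response-bias measure (entropy of the response distribution over the 16 categories) can be sketched with Shannon entropy in bits. The category names below are hypothetical placeholders, and the function name `response_entropy` is illustrative.

```python
import math
from collections import Counter


def response_entropy(responses, categories):
    """Shannon entropy (in bits) of how often each category was chosen as a response."""
    if not responses:
        raise ValueError("no responses")
    counts = Counter(responses)
    unknown = set(counts) - set(categories)
    if unknown:
        raise ValueError(f"responses outside the category set: {unknown}")
    total = len(responses)
    entropy = 0.0
    for category in categories:
        p = counts[category] / total
        if p > 0:  # 0 * log(0) is taken as 0
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical placeholder names for the 16 categories.
categories = [f"class_{i}" for i in range(16)]
print(response_entropy(categories * 5, categories))    # 4.0, perfectly uniform
print(response_entropy(["class_0"] * 80, categories))  # 0.0, a single category
```

An observer who spreads responses evenly reaches the maximum of log2(16) = 4 bits; a classifier that collapses onto a few categories under degradation scores well below that, which is the kind of bias this measure is meant to expose.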
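The review does not define the confusion difference matrix precisely. One plausible reading, sketched here as an assumption, is to build a row-normalised confusion matrix (rows = true category, columns = response) for humans and for a DNN and subtract them element-wise; positive entries then mark true-to-response pairs that humans produce more often than the network. The helper names are hypothetical.

```python
def confusion_matrix(true_labels, predicted, categories):
    """Row-normalised confusion matrix: rows = true class, columns = response."""
    if len(true_labels) != len(predicted):
        raise ValueError("label lists differ in length")
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for classes never shown stay all-zero instead of dividing by zero.
        matrix.append([v / total if total else 0.0 for v in row])
    return matrix


def confusion_difference(human, dnn):
    """Element-wise difference (human - dnn) of two same-shape confusion matrices."""
    return [[h - d for h, d in zip(h_row, d_row)] for h_row, d_row in zip(human, dnn)]
```

For example, with two made-up categories `["a", "b"]`, a human row of `[0.5, 0.5]` and a DNN row of `[1.0, 0.0]` for true class `a` give a difference row of `[-0.5, 0.5]`: the human confuses `a` with `b` where the network does not.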
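The review also does not say how performance levels were matched. One plausible approach, sketched here as an assumption, is to linearly interpolate an accuracy-versus-distortion curve to find the distortion level at which an observer reaches a target accuracy (for instance, a human accuracy level), and then compare error patterns at those levels. `level_for_accuracy` is a hypothetical name and the numbers are made-up, illustrative values, not data from the paper.

```python
def level_for_accuracy(levels, accuracies, target):
    """Linearly interpolate the distortion level at which accuracy equals target.

    Assumes levels are sorted ascending; returns None if target lies outside
    every measured segment of the accuracy curve.
    """
    points = list(zip(levels, accuracies))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if min(y0, y1) <= target <= max(y0, y1):
            if y0 == y1:  # flat segment: any level in it matches; take the first
                return x0
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return None


# Illustrative, made-up accuracy curve (not data from the paper).
noise_widths = [0.0, 0.1, 0.2, 0.35]
dnn_accuracy = [0.9, 0.7, 0.5, 0.2]
matched_width = level_for_accuracy(noise_widths, dnn_accuracy, 0.6)
```

Linear interpolation between measured points is the simplest choice; fitting a psychometric function to each curve would be a smoother alternative.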

Experimental Results

Research Questions

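RQ1: How do humans and standard DNNs differ in their sensitivity to colour, contrast, noise, and eidolon distortions during rapid object recognition?
RQ2: Under degraded image conditions, do DNNs and humans show similar or diverging category-level error patterns?
RQ3: When task difficulty is equated by matching accuracy levels, to what extent do DNN error patterns agree with those of humans?
RQ4: Can the resulting behavioural datasets serve as a benchmark for improving DNN robustness and for neuroscientific research on visual processing?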

Key Findings

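Humans are more robust than DNNs to contrast reduction and additive noise, maintaining higher accuracy under these degradations.
Under degradation, all three DNNs (AlexNet, GoogLeNet, VGG-16) show a strong preference for a small number of categories, whereas human responses are distributed more evenly.
On undegraded colour images, DNNs can outperform humans (whose feedback processing is minimised by the brief, masked presentation), but this advantage shrinks once the images are degraded.
Confusion difference matrices reveal category-level differences in error patterns between humans and DNNs, particularly at higher task difficulty.
Eidolon distortion results (varying coherence) show that humans maintain higher accuracy than DNNs at moderate distortion levels, whereas at strong distortion the networks drift toward biased responses.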

This review was generated by AI and checked by human editors.