QUICK REVIEW

[Paper Review] Generalisation in humans and deep neural networks

Robert Geirhos, Carlos R. Medina Temme|arXiv (Cornell University)|Aug 27, 2018


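One-sentence summary

This study compares the robustness of human object recognition under twelve types of image degradation with that of three pretrained deep neural networks (ResNet-152, GoogLeNet, VGG-19), and finds that humans generalise better to distortions they have not encountered, whereas DNNs trained directly on distorted images perform strongly on the distortions they were trained on but generalise poorly to new ones.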

ABSTRACT

We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well known DNNs (ResNet-152, VGG-19, GoogLeNet) we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error-patterns between humans and DNNs when the signal gets weaker. Secondly, we show that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on, yet they display extremely poor generalisation abilities when tested on other distortion types. For example, training on salt-and-pepper noise does not imply robustness on uniform white noise and vice versa. Thus, changes in the noise distribution between training and testing constitutes a crucial challenge to deep learning vision systems that can be systematically addressed in a lifelong machine learning approach. Our new dataset consisting of 83K carefully measured human psychophysical trials provide a useful reference for lifelong robustness against image degradations set by the human visual system.

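Motivation and goals

Assess how well humans and current DNNs generalise to a broad range of image distortions outside the training distribution.
Quantify the robustness gap between humans and DNNs across 12 distortion types.
Assess whether training DNNs on distorted images improves generalisation across distortion types.
Provide a carefully measured human-versus-machine benchmark dataset as a reference for lifelong robustness against image degradations.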

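Proposed method

Compare humans with three pretrained DNNs (ResNet-152, GoogLeNet, VGG-19) across 12 distortions, using a 16-class entry-level categorisation task based on ImageNet.
Present each image for a controlled 200 ms, followed by a 1/f noise mask, to limit feedback processing in the human visual system.
Evaluate performance under distortions including colour changes, noise (uniform noise and salt-and-pepper noise), blur/high-pass/low-pass filtering, contrast reduction, phase noise, Eidolon distortions, and rotation.
Train networks from scratch on distorted 16-class ImageNet images to test distortion-specific robustness and cross-distortion generalisation.
Analyse classification accuracy and the entropy of the response distribution to characterise error patterns and biases.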
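
To make the distortion types above more concrete, here is a minimal sketch of three of them (uniform noise, salt-and-pepper noise, and contrast reduction) on a toy greyscale image with pixel values in [0, 1]. It is not the authors' code: the function names, the toy image, the clipping, and the parameter values are all assumptions chosen for illustration, and it uses only the Python standard library so that it runs without dependencies.

```python
import random

def add_uniform_noise(image, width, rng):
    """Add i.i.d. noise drawn from U(-width, +width) to each pixel, then clip to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.uniform(-width, width))) for p in row]
            for row in image]

def add_salt_and_pepper(image, prob, rng):
    """With probability `prob`, replace a pixel by 0.0 (pepper) or 1.0 (salt)."""
    out = []
    for row in image:
        new_row = []
        for p in row:
            if rng.random() < prob:
                new_row.append(rng.choice((0.0, 1.0)))
            else:
                new_row.append(p)
        out.append(new_row)
    return out

def reduce_contrast(image, level):
    """Blend towards mid-grey: level=1.0 keeps the image, level=0.0 gives uniform grey."""
    return [[level * p + (1.0 - level) * 0.5 for p in row] for row in image]

# Toy 8x8 diagonal gradient standing in for a real photograph (assumption).
rng = random.Random(0)
image = [[(x + y) / 14.0 for x in range(8)] for y in range(8)]

noisy = add_uniform_noise(image, 0.35, rng)   # illustrative noise width
sp = add_salt_and_pepper(image, 0.2, rng)     # illustrative flip probability
low = reduce_contrast(image, 0.1)             # illustrative contrast level
```

Uniform noise perturbs every pixel by a small amount, whereas salt-and-pepper noise leaves most pixels untouched and pushes a few to the extremes. That difference in pixel statistics is one way to picture why training on one noise type need not transfer to the other.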

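Results

Research questions

RQ1: How robust are humans to a range of image degradations, compared with DNNs that were not trained on them?
RQ2: Do DNNs trained on one distortion generalise to other, unseen distortions?
RQ3: Does training DNNs on distorted images improve robustness to many distortions, rather than only the ones trained on?
RQ4: How do the error patterns of humans and DNNs differ under degraded conditions?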

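Main findings

As the signal gets weaker, humans are more robust than DNNs on most distortions.
DNNs trained on a distortion perform very well on that exact distortion but generalise poorly to others.
Training on distortions often does not transfer robustly to unseen distortions, and may require longer training or different strategies.
DNNs show distortion-specific biases in their predictions (for example, a bias towards "bottle" under strong uniform noise, and towards "dog" or "bird" under phase noise).
When trained on all distortions but one, the networks reach high accuracy on the eight distortions they were trained on but remain near chance level on the held-out distortion (salt-and-pepper noise in one case, uniform noise in the other).
Dedicated per-distortion training schemes close the gap on the trained distortions but do not achieve general cross-distortion robustness.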
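
The category biases noted above can be quantified with the entropy of the response distribution. The sketch below is a minimal illustration under assumptions, not the authors' analysis code: the function name, the placeholder class labels, and the toy response lists are hypothetical. With 16 categories, responses spread evenly over all classes give the maximum entropy of log2(16) = 4 bits, while a network that always answers "bottle" gives 0 bits.

```python
import math
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (in bits) of the empirical distribution of responses."""
    counts = Counter(responses)
    total = len(responses)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

# Placeholder labels for a 16-class task (hypothetical names).
categories = ["bottle"] + [f"class_{i}" for i in range(1, 16)]

# Unbiased observer: every category chosen equally often.
balanced = [c for c in categories for _ in range(10)]

# Fully biased observer: always answers "bottle", whatever the image shows.
biased = ["bottle"] * 160

h_balanced = response_entropy(balanced)
h_biased = response_entropy(biased)
```

A response entropy that drops towards 0 as a distortion gets stronger indicates that the classifier is collapsing onto a few categories rather than guessing evenly across all of them.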


This review was generated by AI and checked by a human editor.