QUICK REVIEW

[Paper Review] Comparing deep neural networks against humans: object recognition when the signal gets weaker

Robert Geirhos, David Janssen | arXiv (Cornell University) | June 21, 2017

Visual Attention and Saliency Detection | References: 43 | Citations: 154

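One-Line Summary

This paper compares object recognition in humans and deep neural networks (DNNs) under a range of image degradations. It shows that humans are more robust to some distortions while DNNs can outperform humans on clean color images, and it provides a psychophysically controlled benchmark together with analysis tools.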

ABSTRACT

Human visual object recognition is typically rapid and seemingly effortless, as well as largely independent of viewpoint and object orientation. Until very recently, animate visual systems were the only ones capable of this remarkable computational feat. This has changed with the rise of a class of computer vision algorithms called deep neural networks (DNNs) that achieve human-level classification performance on object recognition tasks. Furthermore, a growing number of studies report similarities in the way DNNs and the human visual system process objects, suggesting that current DNNs may be good models of human visual object recognition. Yet there clearly exist important architectural and processing differences between state-of-the-art DNNs and the primate visual system. The potential behavioural consequences of these differences are not well understood. We aim to address this issue by comparing human and DNN generalisation abilities towards image degradations. We find the human visual system to be more robust to image manipulations like contrast reduction, additive noise or novel eidolon-distortions. In addition, we find progressively diverging classification error-patterns between humans and DNNs when the signal gets weaker, indicating that there may still be marked differences in the way humans and current DNNs perform visual object recognition. We envision that our findings as well as our carefully measured and freely available behavioural datasets provide a new useful benchmark for the computer vision community to improve the robustness of DNNs and a motivation for neuroscientists to search for mechanisms in the brain that could facilitate this robustness.

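Motivation and Goals

Evaluate how well human observers and three widely used DNNs (AlexNet, GoogLeNet, VGG-16) generalize to degraded images.
Use controlled psychophysical methods to quantify differences in robustness to color, contrast, additive noise, and eidolon distortions.
Compare error patterns between humans and DNNs at a fine-grained, category level.
Provide freely available datasets and analysis tools for improving the robustness of DNNs.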

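Proposed Method

Minimize the influence of feedback processing by presenting each image briefly, for a fixed 200 ms, followed by backward masking.
Evaluate three DNNs (AlexNet, GoogLeNet, VGG-16) in Caffe on the same degraded stimuli, using a center-crop, 224×224 input pipeline.
Manipulate images via grayscale vs. color, contrast changes, additive white noise, and eidolon distortions with controlled coherence.
Compute accuracy and the entropy of the response distribution across the 16 categories to assess response bias.
Introduce confusion difference matrices to compare category-level error patterns between humans and each DNN.
Provide paired analyses at noise levels where human and DNN performance is matched, to visualize how error patterns diverge.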
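The image manipulations listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the grayscale weights (ITU-R BT.601), the contrast formula (blending toward mid-gray 0.5), the uniform noise distribution with clipping, and all function names (`to_grayscale`, `reduce_contrast`, `add_uniform_noise`, `center_crop`) are hypothetical choices made here. Eidolon distortions are left out because they require a dedicated toolbox.

```python
import numpy as np

def to_grayscale(img):
    """RGB image in [0, 1] with shape (H, W, 3) -> grayscale copied to 3 channels.

    Assumption: ITU-R BT.601 luma weights; the review does not say which
    conversion was used. Keeping 3 channels lets the same image be fed to
    networks that expect RGB input.
    """
    luma = img @ np.array([0.299, 0.587, 0.114])  # contracts the channel axis
    return np.repeat(luma[..., None], 3, axis=2)

def reduce_contrast(img, contrast):
    """Blend toward mid-gray 0.5 (assumed formula).

    contrast=1.0 leaves the image unchanged; contrast=0.0 gives uniform gray.
    """
    return (1.0 - contrast) * 0.5 + contrast * img

def add_uniform_noise(img, width, rng):
    """Add i.i.d. (white) noise drawn uniformly from [-width, width] per pixel.

    Assumption: the same noise value is shared across the 3 channels so a
    grayscale image stays gray; results are clipped back to [0, 1].
    """
    noise = rng.uniform(-width, width, size=img.shape[:2])[..., None]
    return np.clip(img + noise, 0.0, 1.0)

def center_crop(img, size=224):
    """Take the central size x size patch (assumes height and width >= size)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

rng = np.random.default_rng(0)
img = rng.random((256, 256, 3))  # random stand-in for a real photograph
x = center_crop(add_uniform_noise(reduce_contrast(to_grayscale(img), 0.3), 0.2, rng))
print(x.shape)  # (224, 224, 3)
```

Reducing contrast before adding noise leaves headroom around mid-gray, so fewer noisy pixels get clipped at 0 or 1.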

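Experimental Results

Research Questions

RQ1: How do humans and standard DNNs differ in robustness to color, contrast, noise, and eidolon distortions during rapid object recognition?
RQ2: Under degraded image conditions, do DNNs and humans show similar or different category-level error patterns?
RQ3: When task difficulty is adjusted so that accuracy levels are matched, how closely do DNN error patterns match those of humans?
RQ4: Can the resulting behavioral datasets serve as a benchmark for improving DNN robustness and inform neuroscience research on visual processing?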
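RQ3 depends on comparing humans and networks at matched accuracy rather than at the same physical degradation level. One simple way to do that, assumed here rather than taken from the paper, is nearest-neighbour matching: for each human accuracy, pick the degradation level at which the model's accuracy is closest. The function name `match_by_accuracy` and all accuracy values below are made up for illustration.

```python
import numpy as np

def match_by_accuracy(human_acc, model_levels, model_acc):
    """For each human accuracy, return the model degradation level whose
    accuracy is closest (hypothetical nearest-neighbour matching)."""
    model_acc = np.asarray(model_acc, dtype=float)
    return [model_levels[int(np.argmin(np.abs(model_acc - h)))] for h in human_acc]

# Made-up numbers for illustration only; these are NOT results from the paper.
noise_widths = [0.0, 0.1, 0.2, 0.35, 0.6]   # degradation levels tested on a model
dnn_acc = [0.90, 0.70, 0.40, 0.15, 0.07]    # model accuracy at each level
human_acc = [0.85, 0.60, 0.30]              # human accuracies to match
print(match_by_accuracy(human_acc, noise_widths, dnn_acc))  # [0.0, 0.1, 0.2]
```

Error patterns can then be compared between a human condition and the model condition it was paired with, so that differences are not simply a side effect of one observer finding the task harder.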

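Key Findings

Humans are more robust than DNNs to contrast and noise degradations, maintaining higher accuracy under degraded conditions.
Under degradation, all three DNNs show a strong bias toward a few categories, whereas humans distribute their responses more evenly.
DNNs can outperform humans on undegraded color images, but this advantage weakens under degradation, even though the brief presentation minimizes feedback processing in human observers.
Confusion difference matrices reveal category-specific differences in error patterns between humans and DNNs, especially at higher task difficulty.
For eidolon distortions (varied via coherence), humans maintain higher accuracy than DNNs at intermediate distortion levels, whereas at strong distortion the networks converge on biased responses.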
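The response-bias and error-pattern analyses behind these findings can be sketched as follows. Response bias is measured as the Shannon entropy of the response distribution over the 16 categories: 4 bits means every category is used equally often, and lower values mean responses pile up on a few categories. A confusion difference matrix is sketched here as a human confusion matrix minus a model's, both row-normalized. This is a minimal sketch: the helper names (`response_entropy`, `confusion_matrix`, `confusion_difference`) are hypothetical, and no significance test is applied to individual cells.

```python
import numpy as np

N_CATEGORIES = 16

def response_entropy(responses, n_categories=N_CATEGORIES):
    """Shannon entropy (bits) of how often each category was given as a response."""
    counts = np.bincount(np.asarray(responses), minlength=n_categories)
    p = counts / counts.sum()
    p = p[p > 0]  # categories never chosen contribute 0 (0 * log 0 := 0)
    return float(np.sum(p * np.log2(1.0 / p)))

def confusion_matrix(true_labels, responses, n_categories=N_CATEGORIES):
    """Row-normalized confusion matrix: entry [i, j] is the fraction of
    images of category i that received response j."""
    cm = np.zeros((n_categories, n_categories))
    np.add.at(cm, (np.asarray(true_labels), np.asarray(responses)), 1)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

def confusion_difference(true_labels, human_resp, model_resp, n_categories=N_CATEGORIES):
    """Human minus model confusion matrix; positive cells are responses
    humans give more often than the model."""
    return (confusion_matrix(true_labels, human_resp, n_categories)
            - confusion_matrix(true_labels, model_resp, n_categories))

labels = np.repeat(np.arange(N_CATEGORIES), 10)  # 10 trials per category
human = labels.copy()              # toy observer: always correct, no bias
model = np.zeros_like(labels)      # toy model: always answers category 0
print(response_entropy(human))     # 4.0
print(response_entropy(model))     # 0.0
diff = confusion_difference(labels, human, model)
print(diff[1, 1], diff[1, 0])      # 1.0 -1.0
```

Because both matrices are row-normalized, every row of the difference sums to zero, so a cell can only turn positive if another cell in the same row turns negative.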

This review was generated by AI and checked by a human editor.