QUICK REVIEW

[Paper Review] Comparing deep neural networks against humans: object recognition when the signal gets weaker

Robert Geirhos, David Janssen|arXiv (Cornell University)|Jun 21, 2017

Visual Attention and Saliency Detection | References: 43 | Cited by: 154

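One-Line Summary

This paper compares object recognition by humans and deep neural networks (DNNs) under a range of image degradations. It shows that humans are more robust to several kinds of distortion, while DNNs can outperform humans on clean colour images. It also provides a benchmark and analysis tools built on carefully controlled psychophysical measurements.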

ABSTRACT

Human visual object recognition is typically rapid and seemingly effortless, as well as largely independent of viewpoint and object orientation. Until very recently, animate visual systems were the only ones capable of this remarkable computational feat. This has changed with the rise of a class of computer vision algorithms called deep neural networks (DNNs) that achieve human-level classification performance on object recognition tasks. Furthermore, a growing number of studies report similarities in the way DNNs and the human visual system process objects, suggesting that current DNNs may be good models of human visual object recognition. Yet there clearly exist important architectural and processing differences between state-of-the-art DNNs and the primate visual system. The potential behavioural consequences of these differences are not well understood. We aim to address this issue by comparing human and DNN generalisation abilities towards image degradations. We find the human visual system to be more robust to image manipulations like contrast reduction, additive noise or novel eidolon-distortions. In addition, we find progressively diverging classification error-patterns between humans and DNNs when the signal gets weaker, indicating that there may still be marked differences in the way humans and current DNNs perform visual object recognition. We envision that our findings as well as our carefully measured and freely available behavioural datasets provide a new useful benchmark for the computer vision community to improve the robustness of DNNs and a motivation for neuroscientists to search for mechanisms in the brain that could facilitate this robustness.

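Motivation and Objectives

Evaluate how human observers and three well-known DNNs (AlexNet, GoogLeNet, VGG-16) generalise to degraded images.
Quantify differences in robustness under changes in colour, contrast, additive noise, and eidolon distortions, using controlled psychophysical methods.
Provide a detailed category-level comparison of human and DNN error patterns.
Provide freely available datasets and analysis tools for evaluating and guiding improvements in DNN robustness.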

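Proposed Method

Present each image briefly for a fixed duration (200 ms), followed by a backward mask to minimise feedback effects.
Evaluate the three DNNs (AlexNet, GoogLeNet, VGG-16) on the same degraded stimuli, using a Caffe pipeline with centre-cropped 224×224 inputs.
Manipulate the images by comparing greyscale with colour, varying contrast, adding white noise, and applying eidolon distortions with controlled coherence.
Compute accuracy and the entropy of the response distribution across 16 categories to assess response bias.
Introduce confusion difference matrices that compare category-level error patterns between humans and each DNN.
Provide pairwise comparisons at matched performance levels to visualise how error patterns diverge under noise.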
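
The greyscale, contrast, and noise manipulations listed above can be illustrated with a minimal NumPy sketch. The formulas here are assumptions made for illustration, not the paper's exact stimulus-generation code: greyscale conversion uses Rec. 601 luma weights, contrast is reduced by blending towards mid-grey, and the white noise is drawn independently per pixel from a uniform distribution. The function names and parameters are hypothetical, and eidolon distortions are omitted because they need a dedicated toolbox.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image in [0, 1] to an (H, W) greyscale image.

    Assumption: Rec. 601 luma weights; the paper's conversion may differ.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights


def reduce_contrast(img, contrast):
    """Blend an image in [0, 1] towards mid-grey (0.5).

    contrast=1.0 leaves the image unchanged; contrast=0.0 gives uniform grey.
    """
    return contrast * img + (1.0 - contrast) * 0.5


def add_uniform_noise(img, width, rng):
    """Add i.i.d. per-pixel noise drawn from U(-width, width), then clip to [0, 1].

    Assumption: uniform noise is used here as one simple form of white noise.
    """
    noise = rng.uniform(-width, width, size=img.shape)
    return np.clip(img + noise, 0.0, 1.0)


# Example: degrade a random stand-in "image" step by step.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
grey = to_grayscale(image)
low_contrast = reduce_contrast(grey, contrast=0.3)
noisy = add_uniform_noise(low_contrast, width=0.1, rng=rng)
```

Passing the random generator in explicitly keeps the noise reproducible, which matters when humans and networks are supposed to see identical stimuli.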

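Experimental Results

Research Questions

RQ1: In rapid object recognition, how does robustness to colour, contrast, noise, and eidolon distortions differ between humans and standard DNNs?
RQ2: Under degraded image conditions, do DNNs and humans show similar error patterns, or do they diverge?
RQ3: At accuracy levels matched for task difficulty, how closely do DNN error patterns agree with human performance?
RQ4: Can the resulting behavioural datasets serve as a benchmark for improving DNN robustness and support neuroscience research on visual processing?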

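Key Findings

Humans are more robust than DNNs to contrast reduction and noise, and maintain higher accuracy under degraded conditions.
All three DNNs show a strong bias towards a few specific categories under degradation, whereas humans distribute their responses more evenly.
DNNs can outperform humans on undistorted colour images, but this advantage fades as degradation increases and feedback is minimised.
Confusion difference matrices reveal category-specific divergence between human and DNN error patterns, especially when the task is difficult.
The eidolon distortion (coherence) results show that humans maintain higher accuracy than DNNs at intermediate distortion levels, while at strong distortion levels the networks converge on biased responses.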
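
The response-bias and error-pattern findings above rest on two quantities: the entropy of the response distribution over the 16 categories, and the difference between human and DNN confusion matrices. The sketch below shows one way to compute them. It assumes integer category labels from 0 to 15, row-normalised confusion matrices (rows are presented categories, columns are responses), and a plain element-wise difference; the paper's own analysis code may define these differently, and all function names are hypothetical.

```python
import numpy as np

N_CATEGORIES = 16  # number of categories stated in the review


def confusion_matrix(true_labels, responses, n_categories=N_CATEGORIES):
    """Row-normalised confusion matrix: entry [i, j] = P(response j | presented i)."""
    counts = np.zeros((n_categories, n_categories))
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(counts, (np.asarray(true_labels), np.asarray(responses)), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no presentations stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def response_entropy(responses, n_categories=N_CATEGORIES):
    """Shannon entropy (in bits) of the distribution of responses over categories."""
    counts = np.bincount(np.asarray(responses), minlength=n_categories)
    p = counts / counts.sum()
    nonzero = p[p > 0]  # treat 0 * log(0) as 0
    return abs(float(-(nonzero * np.log2(nonzero)).sum()))


def confusion_difference(human_cm, dnn_cm):
    """Element-wise difference; positive entries are more frequent in humans."""
    return human_cm - dnn_cm
```

With 16 categories the entropy ranges from 0 bits (every response falls in one category) to 4 bits (responses spread uniformly), so a network that collapses onto a few categories under strong degradation shows up as a low entropy value.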


This review was generated by AI and checked by a human editor.