QUICK REVIEW

[Paper Review] Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet

Wieland Brendel, Matthias Bethge|arXiv (Cornell University)|Mar 20, 2019

Advanced Neural Network Applications | Cited by 185

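One-sentence summary

BagNets classify an image with a linear bag of local features extracted from small patches, reaching strong ImageNet accuracy while staying easy to interpret: patch-level heatmaps show how much evidence each image region contributes to each class.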

ABSTRACT

Deep Neural Networks (DNNs) excel on many complex perceptual tasks but it has proven notoriously difficult to understand how they reach their decisions. We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. Our model, a simple variant of the ResNet-50 architecture called BagNet, classifies an image based on the occurrences of small local image features without taking into account their spatial ordering. This strategy is closely related to the bag-of-feature (BoF) models popular before the onset of deep learning and reaches a surprisingly high accuracy on ImageNet (87.6% top-5 for 33 x 33 px features and Alexnet performance for 17 x 17 px features). The constraint on local features makes it straight-forward to analyse how exactly each part of the image influences the classification. Furthermore, the BagNets behave similar to state-of-the art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts. This suggests that the improvements of DNNs over previous bag-of-feature classifiers in the last few years is mostly achieved by better fine-tuning rather than by qualitatively different decision strategies.

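Motivation and objectives

Motivate the need for interpretable DNNs on ImageNet by reducing their reliance on complex spatial hierarchies.
Introduce BagNet as a linear bag-of-local-features model built on small image patches.
Show that small patches combined with linear aggregation can reach high accuracy.
Provide interpretable evidence maps that show how local patches influence the decision.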

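Proposed method

Build BagNet-q by replacing most 3x3 convolutions with 1x1 convolutions, which restricts the receptive field of the topmost layer to q × q pixels.
Extract features from q × q patches and apply a linear classifier to each patch to obtain its class evidence (logits).
Average the patch-level evidence across space to obtain image-level logits.
Train BagNets on ImageNet for q ∈ {9, 17, 33} and compare them with standard CNNs.
Analyze the heatmaps and patch-level evidence to interpret decisions and compare them with DNN behavior.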
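The patch-scoring and averaging steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the random-projection `toy_feature_fn` is a hypothetical stand-in for the modified ResNet-50, and the image size, stride, feature dimension, and class count are arbitrary assumptions chosen to keep the example small.

```python
import numpy as np


def extract_patches(image, q, stride):
    """Return all q x q patches of an (H, W, C) image and the grid shape."""
    h, w, _ = image.shape
    rows = range(0, h - q + 1, stride)
    cols = range(0, w - q + 1, stride)
    patches = np.stack([image[r:r + q, c:c + q] for r in rows for c in cols])
    return patches, (len(rows), len(cols))


def bagnet_logits(image, q, stride, feature_fn, weights, bias):
    """Score each patch with a linear classifier, then average over space."""
    patches, grid = extract_patches(image, q, stride)
    features = np.stack([feature_fn(p) for p in patches])   # (P, D)
    patch_logits = features @ weights + bias                # (P, K)
    image_logits = patch_logits.mean(axis=0)                # (K,)
    heatmaps = patch_logits.reshape(grid[0], grid[1], -1)   # (rows, cols, K)
    return image_logits, heatmaps


# Toy setup: all sizes and the random "feature extractor" are assumptions.
rng = np.random.default_rng(0)
q, stride, num_features, num_classes = 9, 8, 16, 5
projection = rng.normal(size=(q * q * 3, num_features))


def toy_feature_fn(patch):
    # Hypothetical stand-in for the network: a random projection plus ReLU.
    return np.maximum(patch.reshape(-1) @ projection, 0.0)


weights = rng.normal(size=(num_features, num_classes))
bias = rng.normal(size=num_classes)
image = rng.random((33, 33, 3))

image_logits, heatmaps = bagnet_logits(
    image, q, stride, toy_feature_fn, weights, bias
)
predicted_class = int(np.argmax(image_logits))
```

Because the classifier is linear and the aggregation is a plain mean, `heatmaps[..., k]` is exactly the per-patch contribution to the image-level logit of class `k`; averaging a heatmap over its grid recovers that logit.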

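Experimental results

Research questions

RQ1: Can a linear bag-of-local-features model built on small patches reach competitive accuracy on ImageNet?
RQ2: How interpretable are the decisions of such a model through patch-level evidence heatmaps?
RQ3: Compared with BagNets, do standard DNNs rely on local features or on spatial relationships?
RQ4: Are the decision processes of BagNets and modern DNNs similar in terms of feature sensitivity and interactions between image parts?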

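Key findings

BagNets reach 80.5% top-5 accuracy on ImageNet with 17×17 patches and 87.6% top-5 accuracy with 33×33 patches.
BagNets process about 155 images per second for q ∈ {9, 17, 33}, compared with about 570 images per second for ResNet-50.
The heatmaps show which small patches drive a particular class; the informative regions often correspond to object shapes or distinctive features.
BagNets show weak interactions between image parts and are invariant to the spatial arrangement of patches that lie more than q pixels apart.
Patch-level evidence from BagNets correlates with the attribution signals of other DNNs; deeper networks show stronger nonlinear interactions and lower sensitivity to small local masks.
The results offer guidance for interpretability and for failure analysis in computer vision pipelines.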
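The spatial-invariance finding follows directly from the averaging step, and a small NumPy check can make that concrete. The sketch below is an assumption-laden toy: it treats patches as independent feature vectors with random values, ignores the patch overlap present in a real BagNet, and uses a hypothetical position-weighted scorer only as a contrast.

```python
import numpy as np

rng = np.random.default_rng(1)
num_patches, num_features, num_classes = 16, 8, 4

# Hypothetical patch features and linear classifier (random toy values).
patch_features = rng.normal(size=(num_patches, num_features))
weights = rng.normal(size=(num_features, num_classes))


def bag_logits(features):
    # Linear evidence per patch, averaged over all patches.
    return (features @ weights).mean(axis=0)


def order_sensitive_logits(features):
    # Toy contrast: weight patch evidence by position, so order matters.
    position_weights = np.linspace(0.0, 1.0, len(features))[:, None]
    return (position_weights * (features @ weights)).mean(axis=0)


# Reverse the patch order as a simple, deterministic rearrangement.
reordered = patch_features[::-1]

bag_is_invariant = np.allclose(
    bag_logits(patch_features), bag_logits(reordered)
)
contrast_changes = not np.allclose(
    order_sensitive_logits(patch_features), order_sensitive_logits(reordered)
)
```

A mean over patches does not depend on the order of its terms, so the bag-of-features logits stay the same after any rearrangement, whereas a scorer that uses position does not share this property.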


This review was generated by AI and checked by a human editor.