QUICK REVIEW

[Paper Review] Rethinking Classification and Localization for Object Detection

Yue Wu, Yinpeng Chen|arXiv (Cornell University)|Apr 13, 2019

Advanced Neural Network Applications|References: 47|Citations: 40

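One-Line Summary

This paper analyzes, from several perspectives, how fully connected and convolutional detection heads affect classification and localization, and proposes the Double-Head detector, which pairs an fc-head for classification with a conv-head for bounding box regression. The method gains +3.5 and +2.8 AP on MS COCO over FPN baselines with ResNet-50 and ResNet-101 backbones, respectively.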

ABSTRACT

Two head structures (i.e. fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks. However, there is a lack of understanding of how these two head structures work for these two tasks. To address this issue, we perform a thorough analysis and find an interesting fact that the two head structures have opposite preferences towards the two tasks. Specifically, the fully connected head (fc-head) is more suitable for the classification task, while the convolution head (conv-head) is more suitable for the localization task. Furthermore, we examine the output feature maps of both heads and find that fc-head has more spatial sensitivity than conv-head. Thus, fc-head has more capability to distinguish a complete object from part of an object, but is not robust to regress the whole object. Based upon these findings, we propose a Double-Head method, which has a fully connected head focusing on classification and a convolution head for bounding box regression. Without bells and whistles, our method gains +3.5 and +2.8 AP on MS COCO dataset from Feature Pyramid Network (FPN) baselines with ResNet-50 and ResNet-101 backbones, respectively.

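Motivation and Objectives

Understand how fc-head and conv-head affect classification and localization in two-stage detectors.
Empirically compare fc-head and conv-head on MS COCO 2017 validation using predefined proposals.
Identify the complementary strengths and weaknesses of the two heads.
Propose a two-headed architecture (Double-Head) that exploits both heads to improve detection accuracy.
Explore leveraging and integrating the unfocused tasks to further improve accuracy.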

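Proposed Method

Train and compare fc-head and conv-head on FPN with a ResNet-50 backbone, evaluating classification and localization performance.
Analyze the output feature maps to measure spatial sensitivity and the correlation with IoU.
Propose the Double-Head architecture: fc-head for classification, conv-head for bounding box regression.
Extend it to Double-Head-Ext by adding unfocused-task supervision during training and classifier fusion at inference.
Evaluate on COCO and VOC07, with ablations over backbones and head configurations.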
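The training objective behind the Double-Head-Ext step can be sketched as a weighted sum of per-head losses. This is a minimal illustration, not the authors' code: the function names, the weights `lam_fc` and `lam_conv`, and the example loss values are assumptions made for this sketch, and the RPN loss is left out. Each head mixes its focused task (classification for fc-head, box regression for conv-head) with its unfocused task; setting both weights to 1.0 recovers plain Double-Head, where each head is trained only on its focused task.

```python
def head_loss(focused_loss: float, unfocused_loss: float, lam: float) -> float:
    """Mix a head's focused and unfocused task losses.

    lam (hypothetical name) is the weight on the focused task; lam = 1.0 means
    the unfocused task contributes nothing.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * focused_loss + (1.0 - lam) * unfocused_loss


def double_head_loss(fc_cls, fc_reg, conv_cls, conv_reg, lam_fc=1.0, lam_conv=1.0):
    """Combined loss of both detection heads for one batch (RPN loss omitted)."""
    # fc-head: classification is the focused task, box regression the unfocused one.
    fc_total = head_loss(fc_cls, fc_reg, lam_fc)
    # conv-head: box regression is the focused task, classification the unfocused one.
    conv_total = head_loss(conv_reg, conv_cls, lam_conv)
    return fc_total + conv_total


# Illustrative loss values only (not taken from the paper).
plain = double_head_loss(0.5, 0.25, 1.0, 0.5)                            # Double-Head
ext = double_head_loss(0.5, 0.25, 1.0, 0.5, lam_fc=0.5, lam_conv=0.5)    # with unfocused tasks
print(plain, ext)
```

With weights below 1.0, gradients from the unfocused task also reach each head, which is the idea behind unfocused-task supervision; the specific weight values here are arbitrary.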

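Experimental Results

Research Questions

RQ1: Do fc-head and conv-head have complementary strengths for classification and localization?
RQ2: How does spatial sensitivity differ between fc-head and conv-head, and how does it affect the correlation with IoU?
RQ3: Can separating the tasks into two heads improve detection performance over a single-head baseline?
RQ4: Does incorporating the unfocused tasks and classifier fusion improve accuracy further?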
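RQ2 turns on how well a head's classification score tracks the IoU between a proposal and its ground-truth box. A minimal sketch of that measurement follows, assuming boxes in `(x1, y1, x2, y2)` format and using Pearson correlation as the statistic. The helper names and the toy boxes and scores are hypothetical and are not data from the paper.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy ground-truth box and proposals (hypothetical values).
gt = (0, 0, 10, 10)
proposals = [(0, 0, 10, 10), (0, 0, 10, 5), (5, 5, 15, 15), (20, 20, 30, 30)]
ious = [iou(p, gt) for p in proposals]

# Made-up classification scores from two imaginary heads for the same proposals.
head_a_scores = [0.9, 0.6, 0.3, 0.1]
head_b_scores = [0.8, 0.8, 0.7, 0.1]
print(pearson(ious, head_a_scores), pearson(ious, head_b_scores))
```

A head whose scores give the higher coefficient ranks well-localized proposals above poorly localized ones more consistently, which is the property the question asks about.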

Key Findings

Method	Backbone	AP	AP@0.5	AP@0.75	AP (small)	AP (medium)	AP (large)
FPN baseline	ResNet-50	36.8	58.7	40.4	21.2	40.1	48.8
Double-Head	ResNet-50	39.8	59.6	43.6	22.7	42.9	53.1
Double-Head-Ext	ResNet-50	40.3	60.3	44.2	22.4	43.3	54.3
FPN baseline	ResNet-101	39.1	61.0	42.4	22.2	42.5	51.0
Double-Head	ResNet-101	41.5	61.7	45.6	23.8	45.2	54.9
Double-Head-Ext	ResNet-101	41.9	62.4	45.9	23.9	45.2	55.8
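The gains over the FPN baseline can be read directly off the AP column above. The short script below recomputes them; the dictionary layout is only a convenience for this example, and the numbers are copied from the table.

```python
# Overall AP values copied from the table above, keyed by backbone.
ap = {
    "ResNet-50": {"FPN baseline": 36.8, "Double-Head": 39.8, "Double-Head-Ext": 40.3},
    "ResNet-101": {"FPN baseline": 39.1, "Double-Head": 41.5, "Double-Head-Ext": 41.9},
}


def gains_over_baseline(rows):
    """Return each method's AP improvement over the FPN baseline, rounded to 0.1."""
    base = rows["FPN baseline"]
    return {m: round(v - base, 1) for m, v in rows.items() if m != "FPN baseline"}


gains = {backbone: gains_over_baseline(rows) for backbone, rows in ap.items()}
for backbone, g in gains.items():
    print(backbone, g)
```

The Double-Head-Ext gains of +3.5 (ResNet-50) and +2.8 (ResNet-101) match the figures quoted in the abstract, while Double-Head alone accounts for +3.0 and +2.4 of those.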

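fc-head tends to produce classification scores that correlate more strongly with IoU, most noticeably for small objects.
conv-head regresses bounding boxes more accurately than fc-head.
Double-Head (fc-head for classification, conv-head for regression) outperforms the single-head baselines on COCO with both ResNet-50 and ResNet-101 backbones.
Double-Head-Ext improves results further through unfocused-task supervision and classifier fusion, achieving gains on COCO test-dev that are competitive with the state of the art using single-stage training.
On VOC07, Double-Head-Ext clearly outperforms the FPN baseline on AP, AP@0.5, and AP@0.75.
On COCO val2017, Double-Head-Ext reaches 42.3 AP (ResNet-101) and over 49% AP at various thresholds.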
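The classifier fusion mentioned for Double-Head-Ext combines the class probabilities of both heads at inference. One way to do this is a noisy-OR style rule, sketched below. The review itself does not spell out the rule, so the formula, the function name `fuse_scores`, and the sample probabilities are assumptions for illustration.

```python
def fuse_scores(s_fc: float, s_conv: float) -> float:
    """Fuse two class probabilities as s_fc + s_conv * (1 - s_fc).

    This equals 1 - (1 - s_fc) * (1 - s_conv), so the result is symmetric in
    its inputs, stays within [0, 1], and is never below either input.
    """
    for s in (s_fc, s_conv):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
    return s_fc + s_conv * (1.0 - s_fc)


# Hypothetical per-class probabilities from the two heads for one proposal.
fused = fuse_scores(0.5, 0.5)  # 0.5 + 0.5 * 0.5 = 0.75
print(fused)
```

Under this rule the fc-head score acts as the primary estimate and the conv-head score only fills in the remaining probability mass, so a confident fc-head prediction is barely changed.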

This review was generated by AI and checked by a human editor.