QUICK REVIEW

[Paper Review] NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing

Xin Huang, Zheng Ge | arXiv (Cornell University) | Mar 28, 2020

Topic: Video Surveillance and Tracking Methods | References: 29 | Citations: 31

One-line summary

Proposes R2NMS and a Paired-Box Model that exploit visible regions to improve NMS in crowded scenes, achieving state-of-the-art results on CrowdHuman and CityPersons.

ABSTRACT

Although significant progress has been made in pedestrian detection recently, pedestrian detection in crowded scenes is still challenging. The heavy occlusion between pedestrians imposes great challenges to the standard Non-Maximum Suppression (NMS). A relative low threshold of intersection over union (IoU) leads to missing highly overlapped pedestrians, while a higher one brings in plenty of false positives. To avoid such a dilemma, this paper proposes a novel Representative Region NMS approach leveraging the less occluded visible parts, effectively removing the redundant boxes without bringing in many false positives. To acquire the visible parts, a novel Paired-Box Model (PBM) is proposed to simultaneously predict the full and visible boxes of a pedestrian. The full and visible boxes constitute a pair serving as the sample unit of the model, thus guaranteeing a strong correspondence between the two boxes throughout the detection pipeline. Moreover, convenient feature integration of the two boxes is allowed for the better performance on both full and visible pedestrian detection tasks. Experiments on the challenging CrowdHuman and CityPersons benchmarks sufficiently validate the effectiveness of the proposed approach on pedestrian detection in the crowded situation.
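
The threshold dilemma described in the abstract can be reproduced on a toy example. The sketch below is a plain greedy NMS on full-body boxes, not code from the paper; the boxes, scores, and thresholds are made-up values chosen so that an occluded pedestrian overlaps the front pedestrian more than a duplicate detection does.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, thresh):
    """Standard greedy NMS; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections (illustrative values only).
boxes = [
    (0, 0, 100, 200),    # pedestrian A, in front
    (15, 0, 115, 200),   # pedestrian B, heavily overlapped by A (IoU ~0.74)
    (0, 40, 100, 240),   # duplicate detection of A (IoU ~0.67 with A)
]
scores = [0.9, 0.8, 0.7]

low = greedy_nms(boxes, scores, thresh=0.5)    # [0]: pedestrian B is lost
high = greedy_nms(boxes, scores, thresh=0.75)  # [0, 1, 2]: the duplicate survives
print(low, high)
```

In this toy case no single full-box threshold returns exactly the two pedestrians, which is the situation the visible-region approach is meant to address.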

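Motivation and objectives

Improve pedestrian detection in crowded scenes with heavy occlusion.
Introduce a new NMS variant that uses visible regions to reduce false positives while retaining true positives.
Develop PBM, which simultaneously predicts full-body and visible boxes with a strong correspondence between them.
Demonstrate state-of-the-art performance on the CrowdHuman and CityPersons benchmarks.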

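Proposed method

Introduce NMS by Representative Region (R2NMS), which uses the IoU of visible regions to decide whether full-body boxes overlap.
Develop the Paired-Box Model (PBM), consisting of a Paired RPN, a Paired Feature Extractor, and a Paired R-CNN, which predicts the full-body box and the visible box from the same anchor.
Fuse the features of full-body and visible proposals through the Paired Proposal Feature Extractor (PPFE) to improve both full-body and visible pedestrian detection.
Train with a paired sample unit in which the ground-truth pair Q=(F,V) guides anchor assignment and regression.
Evaluate on CrowdHuman and CityPersons using the MR, AP, and Recall metrics, and compare against AdaptiveNMS and Repulsion Loss.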
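
The suppression rule behind R2NMS can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each detection is a (full box, visible box) pair, suppression is decided by the IoU of the visible boxes, and the surviving full boxes are the output. The function name `r2nms`, the 0.5 threshold, and the toy boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def r2nms(pairs, scores, thresh=0.5):
    """Greedy NMS over (full_box, visible_box) pairs.

    Overlap is measured on the visible boxes only (assumed reading of the
    review's description); returns indices of the kept pairs.
    """
    order = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        vis_i = pairs[i][1]
        if all(iou(vis_i, pairs[k][1]) <= thresh for k in keep):
            keep.append(i)
    return keep


# Hypothetical paired detections: (full box, visible box).
pairs = [
    ((0, 0, 100, 200), (0, 0, 100, 200)),     # pedestrian A, fully visible
    ((15, 0, 115, 200), (70, 0, 115, 200)),   # pedestrian B, only right part visible
    ((0, 40, 100, 240), (0, 40, 100, 240)),   # duplicate detection of A
]
scores = [0.9, 0.8, 0.7]

keep = r2nms(pairs, scores, thresh=0.5)
kept_full_boxes = [pairs[i][0] for i in keep]
print(keep, kept_full_boxes)
```

On this toy input the full boxes of A and B overlap heavily, but their visible boxes do not, so B is retained while the duplicate of A is still suppressed.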

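Experimental results

Research questions

RQ1: Can exploiting visible regions through a paired-box representation improve NMS in crowded pedestrian detection?
RQ2: Does it reduce redundant detections in high-occlusion scenarios without increasing false positives?
RQ3: How much do paired full-body/visible boxes and PPFE improve performance over a standard Faster R-CNN baseline?
RQ4: What effect does PBM+R2NMS have on state-of-the-art benchmarks (CrowdHuman, CityPersons) across the full-body and visible-region tasks?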

Key findings

Method	MR	AP	Recall
Baseline (FPN+ResNet-50)	50.42	84.95	90.24
Baseline* (re-implementation)	46.28	84.91	88.25
AdaptiveNMS	49.73	84.71	91.27
Repulsion Loss*	45.69	85.64	88.42
PBM (mask)	52.70	89.29	93.33
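
For readers unfamiliar with the MR column: on pedestrian benchmarks it is customarily the log-average miss rate, i.e. the geometric mean of the miss rate sampled at nine FPPI (false positives per image) values evenly spaced in log space over [0.01, 1]. The review does not spell out the definition, so treating MR this way is an assumption, and the sketch below is a simplified illustration with made-up curve values rather than the benchmarks' official evaluation code.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Log-average miss rate over FPPI in [1e-2, 1e0].

    fppi must be sorted ascending; miss_rate[i] is the miss rate at fppi[i].
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    samples = []
    for ref in refs:
        # Convention assumed here: if the curve has no operating point at or
        # below this FPPI, every pedestrian counts as missed (miss rate 1.0).
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m  # last point with FPPI <= ref
            else:
                break
        samples.append(max(mr, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


# Hypothetical miss-rate-vs-FPPI curve (illustrative values only).
curve_fppi = [0.005, 0.05, 0.5]
curve_miss = [0.8, 0.5, 0.3]
print(round(100 * log_average_miss_rate(curve_fppi, curve_miss), 2))
```

Lower values are better for MR, while higher values are better for AP and Recall.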

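R2NMS improves the detection metrics (AP and Recall) over the baseline, indicating better handling of crowded scenes.
PBM with PPFE and R2NMS achieves state-of-the-art results on CrowdHuman: MR falls from 50.42 to 43.35, AP rises to 89.29, and Recall reaches 93.33.
On CityPersons, PBM with mask and R2NMS lowers MR on the Reasonable and Heavy Occlusion subsets (e.g., R from 13.8 to 11.1 and HO from 59.0 to 53.3).
PPFE-based feature integration, especially with an attention mechanism, yields notable gains over simple concatenation.
R2NMS increases AP and Recall across settings, whereas the MR improvement depends on the quality of the visible-box predictions; overall, the method performs strongly in crowded detection.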

This review was generated by AI and checked by a human editor.