QUICK REVIEW

[Paper Review] DETR for Crowd Pedestrian Detection

Matthieu Lin, Chuming Li | arXiv (Cornell University) | Dec 12, 2020

Advanced Neural Network Applications | References: 39 | Cited by: 38

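One-line summary

PED is an end-to-end pedestrian detector that analyzes why DETR's performance degrades in crowds and introduces DQRF, V-Match, and Fast-KM, outperforming its baselines and reaching results comparable to state-of-the-art methods on CrowdHuman and CityPersons.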

ABSTRACT

Pedestrian detection in crowd scenes poses a challenging problem due to the heuristic defined mapping from anchors to pedestrians and the conflict between NMS and highly overlapped pedestrians. The recently proposed end-to-end detectors(ED), DETR and deformable DETR, replace hand designed components such as NMS and anchors using the transformer architecture, which gets rid of duplicate predictions by computing all pairwise interactions between queries. Inspired by these works, we explore their performance on crowd pedestrian detection. Surprisingly, compared to Faster-RCNN with FPN, the results are opposite to those obtained on COCO. Furthermore, the bipartite match of ED harms the training efficiency due to the large ground truth number in crowd scenes. In this work, we identify the underlying motives driving ED's poor performance and propose a new decoder to address them. Moreover, we design a mechanism to leverage the less occluded visible parts of pedestrian specifically for ED, and achieve further improvements. A faster bipartite match algorithm is also introduced to make ED training on crowd dataset more practical. The proposed detector PED(Pedestrian End-to-end Detector) outperforms both previous EDs and the baseline Faster-RCNN on CityPersons and CrowdHuman. It also achieves comparable performance with state-of-the-art pedestrian detection methods. Code will be released soon.

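Motivation and objectives

Assess why DETR and deformable DETR underperform Faster-RCNN with FPN on crowded pedestrian detection.
Propose the Dense Query and Rectified Attention Field (DQRF) decoder to improve DETR's pedestrian detection performance.
Use V-Match to exploit visible-region annotations and strengthen DETR-based pedestrian detection.
Introduce a faster bipartite matching algorithm (Fast-KM) to make DETR training on crowd datasets practical.
Demonstrate state-of-the-art or competitive performance on the CrowdHuman and CityPersons benchmarks.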

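Proposed method

Analyze DETR's decoder behavior in crowded pedestrian scenes and identify its failure modes.
Develop the DQRF decoder, which combines Dense Queries with a Rectified Attention Field so that dense queries and a wide attention region can be used effectively.
Introduce the Rectified Attention Field (RF) to stabilize cross-attention across decoder layers.
Propose V-Match, which supervises both full-body and visible regions across layers at no additional cost.
Implement Fast-KM to accelerate the Hungarian matching step during training.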
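To make the matching step concrete, here is a minimal sketch of the bipartite assignment that DETR-style training solves: each ground-truth pedestrian is paired with one distinct query so that the total matching cost is minimal. This is a brute-force illustration only, not Fast-KM and not the authors' code; the function name `match_bruteforce` and the toy cost values are assumptions made for this example.

```python
from itertools import permutations


def match_bruteforce(cost):
    """Find the minimum-cost one-to-one matching by exhaustive search.

    cost[q][g] is a hypothetical matching cost between query q and
    ground-truth box g (in a real detector it would combine class and
    box terms). Assumes there are at least as many queries as ground
    truths. Returns (best_total, assignment), where assignment[g] is
    the index of the query matched to ground truth g.
    """
    num_queries = len(cost)
    num_gt = len(cost[0])
    best_total, best_assignment = float("inf"), None
    # Enumerate every way to pick a distinct query for each ground truth.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total, best_assignment = total, queries
    return best_total, best_assignment


# Toy example: 3 queries, 2 ground-truth pedestrians (made-up costs).
toy_cost = [
    [9, 2],  # query 0
    [1, 8],  # query 1
    [5, 4],  # query 2
]
total, assignment = match_bruteforce(toy_cost)
print(total, assignment)  # 3 (1, 0): GT 0 -> query 1, GT 1 -> query 0
```

Exhaustive search grows factorially with the number of boxes, so practical training uses the Hungarian (Kuhn-Munkres) algorithm instead. Even that becomes costly when an image contains many ground-truth pedestrians, which is the bottleneck the review says Fast-KM targets.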

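Experimental results

Research questions

RQ1: Why do the original DETR and deformable DETR perform worse than Faster-RCNN on crowded pedestrian detection?
RQ2: Can the Dense Query and Rectified Attention Field decoder improve the detection of pedestrians in crowds?
RQ3: Can exploiting visible-region annotations through V-Match improve end-to-end pedestrian detection at no additional cost?
RQ4: How much can Fast-KM speed up bipartite matching during training without sacrificing accuracy?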

Key findings

Model	Epochs	GPU days	AP	MR-2
Faster-RCNN	20	0.75	85.0	50.4
DETR	300	223.7	66.12	80.62
+Deformable	50	8.4	86.74	53.98

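PED improves on both the deformable DETR and Faster-RCNN baselines on CrowdHuman and CityPersons.
The proposed DQRF decoder substantially narrows the gap between DETR-based detectors and Faster-RCNN in crowded scenes.
V-Match delivers the gains of visible-region supervision at zero additional cost.
Fast-KM speeds up bipartite matching during training by up to 10x.
PED achieves results competitive with state-of-the-art methods on challenging benchmarks.
Dense queries and rectified attention fields reduce false detections in crowded, occluded scenes.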


This review was generated by AI and checked by a human editor.