QUICK REVIEW

[Paper Review] Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds

Haroon Idrees, Muhammad Tayyab|arXiv (Cornell University)|Aug 2, 2018

Video Surveillance and Tracking Methods|References: 22|Citations: 61

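One-line summary

Introduces Composition Loss to jointly train a CNN for counting, density map estimation, and localization in dense crowds, and releases the large-scale UCF-QNRF dataset; reports state-of-the-art results on all three tasks.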

ABSTRACT

With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals; visual crowd analysis is emerging as a new frontier in computer vision. In particular, counting in highly dense crowds is a challenging problem with far-reaching applicability in crowd safety and management, as well as gauging political significance of protests and demonstrations. In this paper, we propose a novel approach that simultaneously solves the problems of counting, density map estimation and localization of people in a given dense crowd image. Our formulation is based on an important observation that the three problems are inherently related to each other making the loss function for optimizing a deep CNN decomposable. Since localization requires high-quality images and annotations, we introduce UCF-QNRF dataset that overcomes the shortcomings of previous datasets, and contains 1.25 million humans manually marked with dot annotations. Finally, we present evaluation measures and comparison with recent deep CNN networks, including those developed specifically for crowd counting. Our approach significantly outperforms state-of-the-art on the new dataset, which is the most challenging dataset with the largest number of crowd annotations in the most diverse set of scenes.

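Motivation and objectives

Motivate accurate counting in extremely dense crowds by its use in safety applications.
Propose a joint training framework that decomposes the loss into counting, density estimation, and localization terms.
Build and annotate UCF-QNRF, a large-scale, high-quality dataset for dense crowds.
Show that density and localization supervision improves counting performance across diverse scenes.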

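Proposed method

Defines a decomposable Composition Loss that ties counting, density maps, and localization together through adaptive Gaussian kernels.
Branches a Density Network off a DenseNet base to output multiple density levels (D1, D2) and a localization map (D∞).
Computes density with a per-person adaptive bandwidth sigma_i = min(distance to nearest neighbor, tau), and generates the sequence of density maps D_k by applying f_k(sigma) = sigma^{1/k}.
Applies L_c (count regression) and L_k (MSE between the predicted density/localization map and the ground truth) at multiple density levels, forcing the predicted count to match the true count.
Uses DenseNet-201 as the backbone, attaches the Density Network blocks at DenseBlock2, and predicts D1, D2, and D∞ with intermediate supervision.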
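The bandwidth rule and the loss terms listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the `tau` and `min_sigma` defaults, the per-person renormalization on the pixel grid, the squared-error form of L_c, and the equal weighting of the loss terms are all assumptions made for the example.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each dot annotation to its closest other annotation."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf  # stays infinite when the image holds a single person
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        distances.append(best)
    return distances


def density_map(points, height, width, k, tau=10.0, min_sigma=0.5):
    """Sum one Gaussian per person, with bandwidth f_k(min(d_i, tau)).

    The tau and min_sigma defaults are illustrative assumptions, as is the
    per-person renormalization that makes the map sum to the head count.
    """
    grid = [[0.0] * width for _ in range(height)]
    for (px, py), d in zip(points, nearest_neighbor_distances(points)):
        # sigma_i = min(d_i, tau), then f_k(sigma) = sigma ** (1 / k);
        # the floor avoids a zero bandwidth for coincident annotations.
        sigma = max(min(d, tau) ** (1.0 / k), min_sigma)
        weights = [
            [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        for y in range(height):
            for x in range(width):
                grid[y][x] += weights[y][x] / total
    return grid


def map_mse(predicted, target):
    """Mean squared error between two equally sized 2-D maps."""
    squared = [(p - t) ** 2
               for p_row, t_row in zip(predicted, target)
               for p, t in zip(p_row, t_row)]
    return sum(squared) / len(squared)


def composition_loss(pred_count, true_count, pred_maps, true_maps):
    """L_c + sum_k L_k, with a squared-error L_c and equal weights (assumed)."""
    l_c = (pred_count - true_count) ** 2
    return l_c + sum(map_mse(p, t) for p, t in zip(pred_maps, true_maps))
```

Calling `density_map` with k=1 and k=2 on the same annotations yields two maps with the same total mass; when the bandwidth exceeds 1, the k=2 map is more concentrated around each annotated head. Annotations are assumed to lie inside the image grid.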

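Experimental results

Research questions

RQ1: Can counting, density estimation, and localization be trained jointly without reducing the effectiveness of the individual losses?
RQ2: Does combining multiple density levels with adaptive kernels improve localization accuracy and density map quality?
RQ3: How does Composition Loss affect counting accuracy compared with single-task or multi-task baselines?
RQ4: Does the proposed large-scale UCF-QNRF dataset promote generalization in dense crowd analysis?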

Key findings

Method	C-MAE	C-NAE	C-MSE
Idrees et al. [12]	315	0.63	508
MCNN [30]	277	0.55	426
Encoder-Decoder [3]	270	0.56	478
CMTL [25]	252	0.54	514
SwitchCNN [24]	228	0.44	445
Resnet101 [8]	190	0.50	277
Densenet201 [10]	163	0.40	226
Proposed	132	0.26	191
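The review does not spell out how the three counting columns are computed. The sketch below assumes the definitions commonly used in the crowd counting literature: mean absolute error, absolute error normalized by the true count, and the square root of the mean squared error reported under the name MSE. The function name and the sample counts are hypothetical.

```python
import math


def counting_errors(predicted, actual):
    """Return (MAE, NAE, root-MSE) over per-image crowd counts.

    Assumes every true count is positive, so the normalized error is defined.
    """
    n = len(actual)
    abs_err = [abs(p - a) for p, a in zip(predicted, actual)]
    mae = sum(abs_err) / n
    nae = sum(e / a for e, a in zip(abs_err, actual)) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    return mae, nae, rmse


# Two hypothetical images with true counts 100 and 200.
mae, nae, rmse = counting_errors([110, 180], [100, 200])
```

Because NAE divides each error by the true count, it weights sparse and dense images equally, whereas MAE and the squared-error measure are dominated by the densest images.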

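The proposed method achieves a counting MAE of 132, NAE of 0.258, and MSE of 191 on the UCF-QNRF dataset, outperforming several state-of-the-art methods.
For density map estimation, the proposed loss achieves DM-MAE 0.00044, DM-MSE 0.0017, and DM-HI 0.9131, far ahead of competing methods.
For localization, the proposed method achieves an average precision of 75.8%, an average recall of 59.75%, and an L-AUC of 0.714, higher than several baselines.
The ablation study confirms that multiple density levels (D1, D2, D∞) and Composition Loss consistently improve counting, density, and localization metrics over single-branch or non-compositional configurations.
Intermediate supervision from the density and localization maps speeds up training convergence and improves performance across tasks.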
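Localization precision and recall depend on how predicted head positions are paired with the dot annotations. The review does not describe the matching procedure, so the sketch below assumes a simple greedy one-to-one matching within a pixel distance threshold; the function names, the threshold, and the sample points are hypothetical and the paper's protocol may differ.

```python
import math


def count_matches(predicted, ground_truth, threshold):
    """Greedily pair closest points first; each point is used at most once."""
    candidates = sorted(
        (math.hypot(px - gx, py - gy), i, j)
        for i, (px, py) in enumerate(predicted)
        for j, (gx, gy) in enumerate(ground_truth)
    )
    used_pred, used_gt = set(), set()
    for dist, i, j in candidates:
        if dist > threshold:
            break  # remaining pairs are even farther apart
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
    return len(used_pred)


def precision_recall(predicted, ground_truth, threshold):
    """Precision = matches / predictions, recall = matches / annotations."""
    tp = count_matches(predicted, ground_truth, threshold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Three hypothetical detections against four annotated heads.
detections = [(10, 10), (50, 50), (90, 90)]
annotations = [(11, 10), (52, 50), (30, 30), (70, 70)]
precision, recall = precision_recall(detections, annotations, threshold=5)
```

Averaging precision and recall over a range of thresholds gives summary figures of the kind reported above; the exact threshold range used in the paper is not stated in this review.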

This review was generated by AI and checked by a human editor.