QUICK REVIEW

[Paper Review] Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds

Haroon Idrees, Muhmmad Tayyab | arXiv (Cornell University) | Aug 2, 2018

Video Surveillance and Tracking Methods | References: 22 | Cited by: 61

One-Sentence Summary

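Proposes a Composition Loss to jointly train a convolutional neural network for counting, density map estimation, and localization in dense crowd scenes, and releases the large-scale UCF-QNRF dataset; the method achieves state-of-the-art results on the counting, density estimation, and localization tasks.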

ABSTRACT

With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals; visual crowd analysis is emerging as a new frontier in computer vision. In particular, counting in highly dense crowds is a challenging problem with far-reaching applicability in crowd safety and management, as well as gauging political significance of protests and demonstrations. In this paper, we propose a novel approach that simultaneously solves the problems of counting, density map estimation and localization of people in a given dense crowd image. Our formulation is based on an important observation that the three problems are inherently related to each other making the loss function for optimizing a deep CNN decomposable. Since localization requires high-quality images and annotations, we introduce UCF-QNRF dataset that overcomes the shortcomings of previous datasets, and contains 1.25 million humans manually marked with dot annotations. Finally, we present evaluation measures and comparison with recent deep CNN networks, including those developed specifically for crowd counting. Our approach significantly outperforms state-of-the-art on the new dataset, which is the most challenging dataset with the largest number of crowd annotations in the most diverse set of scenes.

Research Motivation and Goals

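Motivation: accurate counting in extremely dense crowds, to support safety and security applications.
Propose a joint learning framework whose loss decomposes into counting, density estimation, and localization terms.
Build and annotate a large-scale, high-quality dataset of dense crowds (UCF-QNRF).
Show that density and localization supervision improves counting performance across diverse scenes.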

Proposed Method

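Define a decomposable Composition Loss that links counting, density maps, and localization through adaptive Gaussian kernels.
Branch a Density Network off a DenseNet base architecture to output multiple density levels (D1, D2) and a localization map (D∞).
Compute density with a per-person adaptive bandwidth σ_i = min(distance to the nearest neighbor, τ), and generate a series of density maps D_k whose kernel bandwidth is f_k(σ_i) = σ_i^(1/k). As k grows, f_k(σ_i) approaches 1, so D∞ becomes a sharp localization map.
Train across multiple density levels with a count loss L_c (count regression) and per-level losses L_k (mean squared error between predicted and ground-truth density/localization maps), constraining the counts these maps imply to agree with the ground-truth count.
Use DenseNet-201 as the backbone and attach Density Network blocks to DenseBlock2 to predict D1, D2, and D∞, which also provides intermediate supervision.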
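The ground-truth maps and the loss described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `adaptive_sigmas`, `density_map`, and `composition_loss` are hypothetical; each discrete Gaussian is renormalized to sum to 1 so every map integrates exactly to the head count; a 0.5-pixel floor on the bandwidth guards against duplicate annotations; L_c is taken as an absolute count error; and all per-level weights default to 1.

```python
import numpy as np

def adaptive_sigmas(points, tau):
    """sigma_i = min(distance from head i to its nearest other head, tau)."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    if len(pts) < 2:
        # No neighbour exists: fall back to the cap tau (assumption).
        return np.full(len(pts), float(tau))
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore each head's distance to itself
    return np.minimum(dist.min(axis=1), tau)

def density_map(points, shape, k, tau):
    """Hypothetical D_k: one Gaussian per head (x, y), bandwidth sigma_i ** (1/k).

    Heads are assumed to lie inside the image. With k = np.inf, 1.0 / k == 0.0,
    so every bandwidth becomes 1 and the result acts as a localization map.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros((h, w))
    for (x, y), s in zip(points, adaptive_sigmas(points, tau)):
        bw = max(s ** (1.0 / k), 0.5)  # 0.5 px floor for duplicate points (assumption)
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * bw ** 2))
        out += g / g.sum()  # renormalise so each head adds exactly 1 to the map
    return out

def composition_loss(pred_count, gt_count, pred_maps, gt_maps, weights=None):
    """L = L_c + sum_k w_k * L_k; the L1 count term and unit weights are assumptions."""
    if weights is None:
        weights = [1.0] * len(gt_maps)
    l_c = abs(pred_count - gt_count)  # count-regression term
    l_k = sum(w * np.mean((p - g) ** 2)  # per-level MSE between maps
              for w, p, g in zip(weights, pred_maps, gt_maps))
    return l_c + l_k

# Usage: three heads in a 48 x 64 image, maps for k = 1, 2 and infinity.
heads = [(10, 10), (14, 10), (40, 30)]
gts = [density_map(heads, (48, 64), k, tau=8.0) for k in (1, 2, np.inf)]
print([round(float(m.sum()), 6) for m in gts])  # -> [3.0, 3.0, 3.0]
```

Because every level is built from the same heads, each map sums to the same count; the levels differ only in how tightly that mass is concentrated, which is what lets one set of annotations supervise counting, density, and localization at once.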

Experimental Results

Research Questions

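RQ1: Can counting, density estimation, and localization be trained jointly without degrading performance?
RQ2: Does combining multiple density levels with adaptive kernels improve localization accuracy and density map quality?
RQ3: How does the Composition Loss affect counting accuracy compared with single-task or multi-task baselines?
RQ4: Does the large-scale UCF-QNRF dataset improve generalization for dense crowd analysis?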

Key Findings

Counting results on UCF-QNRF (lower is better for all three metrics):
Method	C-MAE	C-NAE	C-MSE
Idrees et al. [12]	315	0.63	508
MCNN [30]	277	0.55	426
Encoder-Decoder [3]	270	0.56	478
CMTL [25]	252	0.54	514
SwitchCNN [24]	228	0.44	445
Resnet101 [8]	190	0.50	277
Densenet201 [10]	163	0.40	226
Proposed	132	0.26	191
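The three counting metrics in the table can be computed from per-image counts as below. This sketch assumes the usual crowd-counting convention in which C-MSE is the root of the mean squared error (consistent with the table, where C-MSE is on the same scale as C-MAE) and C-NAE divides each absolute error by the ground-truth count; `counting_metrics` is a hypothetical helper name.

```python
import math

def counting_metrics(gt_counts, pred_counts):
    """Return (C-MAE, C-NAE, C-MSE) over a set of test images."""
    n = len(gt_counts)
    errs = [abs(g - p) for g, p in zip(gt_counts, pred_counts)]
    mae = sum(errs) / n
    # NAE normalises each error by the true count (assumes every image has people).
    nae = sum(e / g for e, g in zip(errs, gt_counts)) / n
    # "MSE" in crowd counting is conventionally the root of the mean squared error.
    mse = math.sqrt(sum(e * e for e in errs) / n)
    return mae, nae, mse

# Two images with true counts 100 and 400:
# C-MAE = 20.0, C-NAE ≈ 0.0875, C-MSE = sqrt(500) ≈ 22.36
print(counting_metrics([100, 400], [90, 430]))
```

NAE weights an error of 30 heads on a 400-person image less than the same error on a 100-person image, which is why it is reported alongside MAE for datasets whose counts span several orders of magnitude.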

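On UCF-QNRF, the proposed method achieves a counting MAE of 132, NAE of 0.258, and MSE of 191, outperforming several state-of-the-art methods.
For density map estimation, the proposed loss yields DM-MAE = 0.00044, DM-MSE = 0.0017, and DM-HI = 0.9131, clearly outperforming competing methods.
For localization, the proposed method reaches an average precision of 75.8%, an average recall of 59.75%, and an L-AUC of 0.714, outperforming several baselines.
Ablation studies show that using multiple density levels (D1, D2, D∞) together with the Composition Loss consistently improves counting, density, and localization metrics compared with single-branch or non-composition configurations.
Intermediate supervision from the density and localization maps speeds up training convergence and improves performance on every task.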
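The density-map and localization metrics above can be sketched in the same way. Assumptions, labeled here and in the comments: DM-HI is computed as the histogram intersection of the two maps after each is normalized to sum to 1; a prediction counts as correct if it can be matched one-to-one to an unmatched ground-truth head within a pixel-distance threshold, using greedy closest-pair matching as a stand-in for whatever assignment the paper uses; and precision and recall are averaged over thresholds of 1 to 100 pixels. The helper names are hypothetical.

```python
import numpy as np

def histogram_intersection(pred_map, gt_map):
    """DM-HI sketch: overlap of the two maps after normalising each to sum to 1."""
    p = pred_map / pred_map.sum()
    q = gt_map / gt_map.sum()
    return float(np.minimum(p, q).sum())  # 1.0 means identical distributions

def match_within(pred_pts, gt_pts, thr):
    """Greedy one-to-one matching (assumption): closest pairs first, distance <= thr."""
    pred = np.asarray(pred_pts, dtype=float).reshape(-1, 2)
    gt = np.asarray(gt_pts, dtype=float).reshape(-1, 2)
    if len(pred) == 0 or len(gt) == 0:
        return 0
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    used_p, used_g, tp = set(), set(), 0
    for idx in np.argsort(d, axis=None):  # flattened indices, nearest pair first
        i, j = np.unravel_index(idx, d.shape)
        if d[i, j] > thr:
            break  # every remaining pair is farther away
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            tp += 1
    return tp

def localization_pr(pred_pts, gt_pts, thresholds=range(1, 101)):
    """Average precision and recall over distance thresholds (1..100 px assumed)."""
    precs, recs = [], []
    for t in thresholds:
        tp = match_within(pred_pts, gt_pts, t)
        precs.append(tp / max(len(pred_pts), 1))
        recs.append(tp / max(len(gt_pts), 1))
    return float(np.mean(precs)), float(np.mean(recs))

gt = [(10, 10), (50, 50)]
pred = [(12, 10), (80, 80)]
print(localization_pr(pred, gt, thresholds=[1, 2, 50]))  # -> (0.5, 0.5)
```

Averaging over thresholds rewards predictions that land close to a head without committing to a single tolerance; an optimal assignment (for example, the Hungarian algorithm) could replace the greedy matcher if exact one-to-one matching matters.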


This review was generated by AI and reviewed by human editors.