QUICK REVIEW

[Paper Review] CrowdNet: A Deep Convolutional Network for Dense Crowd Counting

Lokesh Boominathan, Srinivas S S Kruthiventi|arXiv (Cornell University)|Aug 22, 2016

Video Surveillance and Tracking Methods | References: 17 | Cited by: 122

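One-Sentence Summary

CrowdNet combines a deep and a shallow fully convolutional network to predict density maps for dense crowd images, uses multi-scale data augmentation to handle scale variation, and achieves state-of-the-art MAE on UCF_CC_50.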

ABSTRACT

Our work proposes a novel deep learning framework for estimating crowd density from static images of highly dense crowds. We use a combination of deep and shallow, fully convolutional networks to predict the density map for a given crowd image. Such a combination is used for effectively capturing both the high-level semantic information (face/body detectors) and the low-level features (blob detectors), that are necessary for crowd counting under large scale variations. As most crowd datasets have limited training samples (<100 images) and deep learning based approaches require large amounts of training data, we perform multi-scale data augmentation. Augmenting the training samples in such a manner helps in guiding the CNN to learn scale invariant representations. Our method is tested on the challenging UCF_CC_50 dataset, and shown to outperform the state of the art methods.

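Motivation and Goals

Enable accurate crowd density estimation from static images of highly dense scenes.
Develop a network that exploits both high-level semantic cues and low-level blob patterns.
Address the scarcity of training data through multi-scale data augmentation.
Produce dense density maps and total crowd counts for analysis and safety applications.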

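Proposed Method

A deep CNN (similar to VGG-16) with the fully connected layers removed predicts per-pixel density at 1/8 of the input resolution.
A shallow 3-layer CNN detects small head blobs and complements the deep features.
The deep and shallow outputs are concatenated, combined by a 1x1 convolution, and upsampled to the input size to obtain the final density map.
Training uses ground truth generated by Gaussian-blurring the head annotations; because each Gaussian integrates to 1, the map sums to the total head count.
Training is augmented with multi-scale image patches (scales from 0.5 to 1.2), and high-density patches are oversampled to cope with scale variation and varying crowd density.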
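The Gaussian ground-truth construction can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the fixed kernel width `sigma=4.0`, the truncation at 3 sigma, and the renormalisation of each kernel over the pixels inside the image are assumptions, because the summary does not specify them. Under these assumptions every annotated head adds exactly 1.0 to the map, so the map sums to the head count.

```python
import math
from typing import List, Tuple


def gaussian_density_map(height: int, width: int,
                         heads: List[Tuple[float, float]],
                         sigma: float = 4.0) -> List[List[float]]:
    """Return a height x width density map whose sum equals len(heads).

    Assumptions (not from the paper): fixed sigma, kernel truncated at
    3*sigma, and each kernel renormalised over the in-image pixels so
    heads near the border still contribute exactly 1.0.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(math.ceil(3 * sigma))
    for hx, hy in heads:
        # Clamp the (x, y) annotation into the image so every kernel has >= 1 pixel.
        cx = min(max(int(round(hx)), 0), width - 1)
        cy = min(max(int(round(hy)), 0), height - 1)
        weights = []
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                weights.append((y, x, math.exp(-d2 / (2 * sigma ** 2))))
        total = sum(w for _, _, w in weights)
        for y, x, w in weights:
            density[y][x] += w / total  # this head adds exactly 1.0 in total
    return density


if __name__ == "__main__":
    heads = [(5.0, 5.0), (0.0, 0.0), (19.0, 9.0)]
    dmap = gaussian_density_map(10, 20, heads)
    print(round(sum(map(sum, dmap)), 6))  # 3.0
```

Renormalising per head trades a slightly distorted blob shape at the border for an exact count, which is what a count-preserving ground truth needs.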
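Multi-scale augmentation and dense-patch oversampling can be sketched as a patch-indexing routine. The 0.5 to 1.2 scale range comes from the summary; the 0.1 scale step, the 225-pixel square patch, the 50% overlap, and the `threshold`/`copies` oversampling rule are illustrative assumptions, and the helper names (`patch_origins`, `multiscale_patches`, `oversample_dense`) are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Scales 0.5, 0.6, ..., 1.2 (the 0.1 step is an assumption).
SCALES = [round(0.5 + 0.1 * i, 1) for i in range(8)]


def patch_origins(length: int, patch: int, stride: int) -> List[int]:
    """Sliding-window start offsets along one axis, always reaching the far edge."""
    if length <= patch:
        return [0]  # image smaller than a patch: a single (padded) patch
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # extra window flush with the edge
    return starts


def multiscale_patches(height: int, width: int, scales: Sequence[float],
                       patch: int = 225, overlap: float = 0.5
                       ) -> Iterator[Tuple[float, int, int]]:
    """Yield (scale, top, left) for every patch of every rescaled copy of the image."""
    stride = max(1, int(patch * (1 - overlap)))
    for s in scales:
        h, w = max(1, round(height * s)), max(1, round(width * s))
        for top in patch_origins(h, patch, stride):
            for left in patch_origins(w, patch, stride):
                yield s, top, left


def oversample_dense(patches: Sequence[T], counts: Sequence[float],
                     threshold: float, copies: int = 2) -> List[T]:
    """Repeat patches whose ground-truth count exceeds `threshold`."""
    out: List[T] = []
    for p, c in zip(patches, counts):
        out.extend([p] * (copies if c > threshold else 1))
    return out


if __name__ == "__main__":
    grid = list(multiscale_patches(450, 450, [0.5, 1.0]))
    print(len(grid))  # 17: one patch at scale 0.5, a 4 x 4 grid at scale 1.0
```

Recording patch coordinates instead of pixel copies keeps the sketch independent of any image library; a real pipeline would crop the rescaled image and its density map at these offsets.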
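The fusion step (concatenate, 1x1 convolution, upsample) reduces to per-pixel arithmetic once both branches produce feature maps at 1/8 resolution. This sketch is illustrative only: the 1x1-convolution weights and bias are hypothetical parameters, and nearest-neighbour upsampling with each value divided by 8 x 8 = 64 (so the total count is preserved) is chosen for simplicity; the paper's actual upsampling scheme may differ.

```python
from typing import List, Sequence

Map = List[List[float]]


def conv1x1(channels: Sequence[Map], weights: Sequence[float], bias: float) -> Map:
    """1x1 convolution: a per-pixel weighted sum across stacked channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[bias + sum(wt * ch[y][x] for wt, ch in zip(weights, channels))
             for x in range(w)] for y in range(h)]


def upsample_preserving_count(m: Map, factor: int = 8) -> Map:
    """Nearest-neighbour upsampling; each value is split evenly over factor**2 pixels."""
    scale = factor * factor
    return [[m[y // factor][x // factor] / scale
             for x in range(len(m[0]) * factor)]
            for y in range(len(m) * factor)]


def fuse(deep: List[Map], shallow: List[Map], weights: Sequence[float],
         bias: float, factor: int = 8) -> Map:
    """Concatenate deep and shallow channels, mix them with a 1x1 conv, upsample."""
    stacked = list(deep) + list(shallow)  # channel-wise concatenation
    if len(weights) != len(stacked):
        raise ValueError("need one 1x1-conv weight per concatenated channel")
    return upsample_preserving_count(conv1x1(stacked, weights, bias), factor)


if __name__ == "__main__":
    deep = [[[1.0, 2.0], [0.0, 1.0]]]      # one 2x2 channel from the deep branch
    shallow = [[[0.5, 0.5], [0.5, 0.5]]]   # one 2x2 channel from the shallow branch
    out = fuse(deep, shallow, weights=[1.0, 2.0], bias=0.0)
    print(len(out), len(out[0]), sum(map(sum, out)))  # 16 16 8.0
```

In a trained network the 1x1 weights are learned, letting the model decide per channel how much to trust the semantic (deep) versus blob-level (shallow) evidence.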

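Experimental Results

Research Questions

RQ1: Can the hybrid deep + shallow CNN accurately predict crowd density maps in extremely dense scenes?
RQ2: Does multi-scale data augmentation improve robustness to scale variation and occlusion in crowd counting?
RQ3: Is Gaussian-generated ground truth effective for training a per-pixel density estimation model?
RQ4: How does combining deep and shallow features affect counting accuracy?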

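Key Findings

Achieves state-of-the-art MAE (452.5) on UCF_CC_50, outperforming prior methods.
The combined deep + shallow network (MAE 645) outperforms either network alone (deep: 681, shallow: 1107).
Augmentation targeting dense regions nearly doubles the number of training patches (from 26,385 to 50,891) and reduces MAE from 725 to 645.
The model estimates counts close to the ground truth for most images, although it underestimates in extremely dense cases (>2,500 people).
The total count is obtained by summing the predicted density map; the model is trained with an L2 loss.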
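The counting rule, the training loss, and the MAE metric quoted in the findings fit in a few lines. This is an illustrative sketch with hypothetical function names; the exact scaling of the L2 loss (here halved and averaged over pixels) is an assumed convention, not one confirmed by the summary.

```python
from typing import List, Sequence

Map = List[List[float]]


def count_from_density(density: Map) -> float:
    """Predicted crowd count = sum of the predicted density map."""
    return sum(sum(row) for row in density)


def l2_loss(pred: Map, target: Map) -> float:
    """Pixel-wise squared error, halved and averaged over pixels (scaling assumed)."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return 0.5 * sum(diffs) / len(diffs)


def mean_absolute_error(pred_counts: Sequence[float],
                        true_counts: Sequence[float]) -> float:
    """MAE over a set of images: mean of |predicted count - true count|."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


if __name__ == "__main__":
    pred = [[0.5, 0.5], [1.0, 0.0]]
    gt = [[0.5, 1.0], [1.0, 0.5]]
    print(count_from_density(pred))                        # 2.0
    print(l2_loss(pred, gt))                               # 0.0625
    print(mean_absolute_error([900, 2100], [1000, 2500]))  # 250.0
```

Because the loss is applied per pixel while MAE is measured on summed counts, a model can have a small pixel loss yet still accumulate a large count error over a very dense image.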


This review was generated by AI and reviewed by human editors.