QUICK REVIEW

[Paper Review] Switching Convolutional Neural Network for Crowd Counting

Deepak Babu Sam, Shiv Surya|ePrints-IISc. (Indian Institute of Science Bangalore)|Aug 1, 2017

Video Surveillance and Tracking Methods | References: 20 | Cited by: 99

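One-Sentence Summary

Switch-CNN uses a switch classifier to route patches of a crowd scene to specialized CNN regressors with different receptive fields, achieving state-of-the-art crowd counting performance on major datasets.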

ABSTRACT

We propose a novel crowd counting model that maps a given crowd scene to its density. Crowd analysis is compounded by a myriad of factors like inter-occlusion between people due to extreme crowding, high similarity of appearance between people and background elements, and large variability of camera view-points. Current state-of-the-art approaches tackle these factors by using multi-scale CNN architectures, recurrent networks and late fusion of features from multi-column CNN with different receptive fields. We propose a switching convolutional neural network that leverages variation of crowd density within an image to improve the accuracy and localization of the predicted crowd count. Patches from a grid within a crowd scene are relayed to independent CNN regressors based on crowd count prediction quality of the CNN established during training. The independent CNN regressors are designed to have different receptive fields and a switch classifier is trained to relay the crowd scene patch to the best CNN regressor. We perform extensive experiments on all major crowd counting datasets and evidence better performance compared to current state-of-the-art methods. We provide interpretable representations of the multichotomy of the space of crowd scene patches inferred from the switch. It is observed that the switch relays an image patch to a particular CNN column based on the density of the crowd.

Research Motivation and Objectives

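Address the challenges that variations in scale, perspective, and occlusion pose for crowd counting.
Exploit local density variation within an image by routing patches to specialized regressors.
Develop an end-to-end Switch-CNN framework with differential, coupled, and switch training stages.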

Proposed Method

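Use three CNN regressors with different receptive fields to handle different crowd scales.
Divide each image into 9 patches and route each patch to the regressor best suited to its density.
Train a switch classifier (based on a VGG-16 backbone with global average pooling, GAP) to assign patches to regressors.
Pretrain the regressors, apply differential training to maximize per-patch count accuracy, then perform coupled training to co-adapt the switch and the regressors.
Generate ground-truth density maps with geometry-adaptive kernels or a fixed spread, depending on dataset characteristics.
Evaluate with MAE and MSE on standard crowd counting benchmarks.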
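
The patch-routing step above can be illustrated with a minimal sketch. It assumes a 3 x 3 grid for the 9 patches, and it treats the switch and the regressors as plain callables; the names split_into_grid, count_crowd, toy_switch, and toy_regressors are hypothetical and are not taken from the paper. The toy regressors only rescale pixel values and stand in for real CNNs that would predict a density map.

```python
from typing import Callable, List, Sequence

Patch = List[List[float]]  # a 2D image or patch stored as a list of rows


def split_into_grid(image: Patch, rows: int = 3, cols: int = 3) -> List[Patch]:
    """Split a 2D image into rows * cols non-overlapping patches."""
    height, width = len(image), len(image[0])
    # Integer edges; patch sizes differ by at most one pixel when the
    # image does not divide evenly.
    row_edges = [r * height // rows for r in range(rows + 1)]
    col_edges = [c * width // cols for c in range(cols + 1)]
    patches = []
    for r in range(rows):
        for c in range(cols):
            patch = [row[col_edges[c]:col_edges[c + 1]]
                     for row in image[row_edges[r]:row_edges[r + 1]]]
            patches.append(patch)
    return patches


def count_crowd(image: Patch,
                switch: Callable[[Patch], int],
                regressors: Sequence[Callable[[Patch], Patch]]) -> float:
    """Route each patch to the regressor chosen by the switch and sum the counts."""
    total = 0.0
    for patch in split_into_grid(image):
        k = switch(patch)                # index of the chosen regressor
        density = regressors[k](patch)   # predicted density map for this patch
        total += sum(sum(row) for row in density)  # count = integral of density
    return total


# Toy stand-ins (hypothetical): each "regressor" just rescales the patch.
toy_regressors = [lambda p, s=s: [[v * s for v in row] for row in p]
                  for s in (0.5, 1.0, 2.0)]
toy_switch = lambda patch: 1  # always pick the middle regressor

image = [[1.0] * 6 for _ in range(6)]
print(count_crowd(image, toy_switch, toy_regressors))  # 36.0
```

In a real system the switch would be the trained classifier and each regressor a trained network; only the control flow (split, classify, regress, sum) is meant to carry over.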

Experimental Results

Research Questions

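RQ1: In crowded scenes, can patch-level switching between regressors with different receptive fields improve density localization and counting accuracy?
RQ2: Does a jointly trained switch classifier combined with diverse regressors outperform single-model approaches on datasets with varying density and perspective?
RQ3: How does differential training affect the partitioning of image patches into density-based groups and the resulting counting performance?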

Key Findings

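Switch-CNN achieves state-of-the-art MAE and MSE on ShanghaiTech Part A and Part B, outperforming MCNN and other methods.
On ShanghaiTech Part A, Switch-CNN achieves MAE 90.4 and MSE 135.0; on Part B, MAE 21.6 and MSE 33.4.
On UCF_CC_50, Switch-CNN achieves MAE 318.1 and MSE 439.2, with a switch accuracy of 54.3%.
On UCSD, Switch-CNN reports MAE 1.62 and MSE 2.10, with a switch accuracy of 60.9%.
On WorldExpo’10, Switch-CNN achieves an average MAE of 9.4 with perspective maps and 11.2 without, outperforming several baselines.
Differential training aligns patches into density-based multichotomous groups, and coupled training further improves the robustness of the switch and the regressors.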
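
To make the grouping effect of differential training more concrete, the sketch below shows one plausible way to pick, for each training patch, the regressor whose predicted count is closest to the ground truth. This follows the abstract's statement that patches are relayed based on the count prediction quality established during training; the use of absolute count error as the selection criterion and the names best_regressor_index and switch_labels are assumptions for illustration. The example counts are made up and are not results from the paper.

```python
from typing import List, Sequence


def best_regressor_index(predicted_counts: Sequence[float],
                         true_count: float) -> int:
    """Return the index of the regressor with the smallest absolute count error."""
    errors = [abs(p - true_count) for p in predicted_counts]
    return errors.index(min(errors))  # ties resolve to the lowest index


def switch_labels(per_patch_predictions: Sequence[Sequence[float]],
                  true_counts: Sequence[float]) -> List[int]:
    """Assign each patch the index of its best regressor.

    These indices can serve as training labels for a switch classifier.
    """
    return [best_regressor_index(preds, truth)
            for preds, truth in zip(per_patch_predictions, true_counts)]


# Made-up counts from three hypothetical regressors on three patches.
predictions = [
    [10.0, 40.0, 90.0],   # dense patch: third regressor is closest
    [5.0, 6.0, 30.0],     # sparse patch: first regressor is closest
    [100.0, 60.0, 20.0],  # sparse patch: third regressor is closest
]
ground_truth = [85.0, 5.2, 22.0]
print(switch_labels(predictions, ground_truth))  # [2, 0, 2]
```

Under this assumption, patches that share a label tend to share a density range, which is consistent with the observation that the switch relays patches to a column based on crowd density.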


This review was generated by AI and reviewed by a human editor.