QUICK REVIEW

[Paper Review] Learning Independent Instance Maps for Crowd Localization

Junyu Gao, Tao Han|arXiv (Cornell University)|Dec 8, 2020

Video Surveillance and Tracking Methods | References: 59 | Cited by: 33

One-Sentence Summary

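The paper proposes Independent Instance Map segmentation (IIM), equipped with a differentiable Binarization Module, to localize individual heads in crowded scenes. It achieves state-of-the-art results on NWPU-Crowd Localization and performs strongly on several other datasets.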

ABSTRACT

Accurately locating each head's position in crowd scenes is a crucial task in crowd analysis. However, traditional density-based methods only produce coarse predictions, and segmentation/detection-based methods cannot handle extremely dense scenes or crowds with large-range scale variations. To this end, we propose a straightforward end-to-end framework for crowd localization, named Independent Instance Map segmentation (IIM). Unlike density maps and box regression, the instances in IIM do not overlap. By segmenting crowds into independent connected components, the positions and the crowd count (the centers and the number of components, respectively) are obtained. Furthermore, to improve the segmentation quality for regions of different density, we present a differentiable Binarization Module (BM) that outputs structured instance maps. BM brings two advantages to localization models: 1) it adaptively learns a threshold map for different images to detect each instance more accurately; 2) it allows the model to be trained directly with a loss on binary predictions and labels. Extensive experiments verify that the proposed method is effective and outperforms state-of-the-art methods on five popular crowd datasets. Notably, IIM improves F1-measure by 10.4% on the NWPU-Crowd Localization task. The source code and pre-trained models will be released at https://github.com/taohan10200/IIM.
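
The abstract's key step, reading positions and counts directly off a segmentation, can be sketched in plain Python. This is a minimal illustration, not the IIM code release: it assumes the binary instance map is given as a list of 0/1 rows, uses 4-connectivity (matching the 4-connected components mentioned under Proposed Method), and takes each center as the mean pixel coordinate of its component, which is an assumption since the released code may compute centers differently. The function name `instances_from_binary_map` is hypothetical.

```python
from collections import deque
from typing import List, Tuple


def instances_from_binary_map(binary: List[List[int]]) -> Tuple[int, List[Tuple[float, float]]]:
    """Label 4-connected foreground components; return (count, centers).

    Each center is the mean (row, col) of the component's pixels.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    centers: List[Tuple[float, float]] = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                # Breadth-first flood fill over the 4-neighbourhood only.
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                centers.append((cy, cx))
    # The crowd count is simply the number of components found.
    return len(centers), centers


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
count, centers = instances_from_binary_map(mask)
print(count, centers)  # 3 [(0.5, 0.5), (1.5, 4.0), (2.0, 2.0)]
```

The isolated pixel at (2, 2) touches the top-left block only diagonally, so under 4-connectivity it counts as a separate instance; with 8-connectivity the count would drop to 2. This is why the choice of connectivity matters when heads are packed tightly.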

Motivation and Objectives

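Improve head-localization accuracy in extremely dense crowds beyond what density-estimation or bounding-box methods achieve.
Propose Independent Instance Maps (IIM), in which instances do not overlap and can be extracted as connected components.
Introduce a differentiable Binarization Module (BM) to produce structured instance maps.
Incorporate a pixel-level binarization module that adapts the threshold to each pixel region, making the method robust to scale variation.
Demonstrate strong localization and competitive counting performance on standard crowd datasets.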

Proposed Method

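Represent crowd regions as a confidence map and segment it into independent connected components to obtain head centers and the count.
Introduce a differentiable binarization layer that converts the confidence map into a binary instance map without extra supervision.
Embed a threshold encoder that produces image-level or pixel-level thresholds to guide binarization.
Use a Pixel-level Binarization Module (PBM) to produce per-pixel thresholds that adapt to scale variation and spatial distribution.
Train with a combination of a regression loss on the confidence map and an L1 loss between the binary predictions and the binary labels, controlling the gradient flow to balance backpropagation between the modules.
Output localizations by detecting 4-connected components and taking the center of each independent instance.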
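
The binarization and training steps listed above can be sketched in dependency-free Python. This is an illustrative sketch under stated assumptions, not the IIM implementation: the threshold maps are written by hand instead of being predicted by a threshold encoder, mean squared error stands in for the unspecified regression loss, the loss weight of 1.0 is a placeholder, and the backward rule is a generic straight-through estimator swapped in for the paper's own gradient definition. All function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a 2-D map stored as a list of rows


def binarize(conf: Grid, thresh: Grid) -> Grid:
    """Forward pass: B = 1 where confidence P >= threshold T, else 0."""
    return [[1.0 if p >= t else 0.0 for p, t in zip(pr, tr)]
            for pr, tr in zip(conf, thresh)]


def image_level(conf: Grid, t: float) -> Grid:
    """IBM-style threshold: one scalar broadcast over every pixel."""
    return [[t] * len(row) for row in conf]


def mse(a: Grid, b: Grid) -> float:
    """Mean squared error, standing in for the confidence-map regression loss."""
    d = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(d) / len(d)


def l1(a: Grid, b: Grid) -> float:
    """Mean absolute error between binary prediction and binary label."""
    d = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(d) / len(d)


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def l1_grad(b: Grid, label: Grid) -> Grid:
    """dL1/dB = sign(B - Y) / N (subgradient 0 where B == Y)."""
    n = sum(len(row) for row in b)
    return [[_sign(x - y) / n for x, y in zip(rb, rl)] for rb, rl in zip(b, label)]


def ste_backward(grad_b: Grid) -> Tuple[Grid, Grid]:
    """Straight-through surrogate for B = step(P - T): dB/dP := 1, dB/dT := -1.

    The step has zero gradient almost everywhere; the surrogate lets the loss
    on B reach both the confidence map and the threshold map, so T is
    learned without a label of its own.
    """
    grad_p = [[g for g in row] for row in grad_b]
    grad_t = [[-g for g in row] for row in grad_b]
    return grad_p, grad_t


conf = [[0.9, 0.4], [0.3, 0.7]]
label = [[1.0, 0.0], [1.0, 0.0]]
pixel_t = [[0.5, 0.5], [0.2, 0.8]]               # PBM-style: one threshold per pixel

b_pbm = binarize(conf, pixel_t)                   # [[1.0, 0.0], [1.0, 0.0]]
b_ibm = binarize(conf, image_level(conf, 0.5))    # [[1.0, 0.0], [0.0, 1.0]]
loss = mse(conf, label) + 1.0 * l1(b_ibm, label)  # weight 1.0 is a placeholder
grad_p, grad_t = ste_backward(l1_grad(b_ibm, label))
```

On this toy input the per-pixel thresholds reproduce the label exactly (L1 = 0), while the single image-level threshold of 0.5 misses one foreground pixel and fires on one background pixel (L1 = 0.5). The surrogate gradient on T is positive at the missed pixel and negative at the false positive, so a gradient-descent step lowers the threshold where foreground was missed and raises it where background fired.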

Experimental Results

Research Questions

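RQ1: Do independent, non-overlapping instance maps improve localization accuracy in extremely dense crowds compared with density-based or detection-based methods?
RQ2: Does the differentiable binarization layer enable end-to-end optimization and yield better boundary segmentation for tiny or occluded heads?
RQ3: Under large scale variation, do image-level and pixel-level threshold-learning strategies improve localization and counting?
RQ4: Does integrating the threshold encoder with the confidence predictor let the model adapt to crowds of varying density and thereby make localization more robust?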

Key Findings

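IIM achieves state-of-the-art localization on NWPU-Crowd, ranking first on the localization benchmark with an F1-measure (F1-m) of 76.2% and an MAE of 87.1 on the test set (see Table II).
On NWPU-Crowd Localization, the method improves F1-measure over previous methods by 10.4%, as reported in the abstract.
Pixel-level threshold learning (PBM) yields more precise localization than image-level thresholding (IBM).
IIM performs strongly across multiple datasets, showing robustness to negative samples and dense crowds.
On localization, the method achieves higher precision and competitive recall, outperforming several detection-based and density-based methods on ShanghaiTech Part A/B, UCF-QNRF, and FDST (summarized in Table IV and related results).
Ablation studies show that IBM/PBM bring substantial gains over a fixed threshold, and that incorporating the L1 loss and the gradient flow from the localization objective is beneficial.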
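
The metrics behind these findings (precision, recall, F1-measure, MAE) can be made concrete with a small sketch. It assumes predicted and ground-truth heads are points that count as a match when they lie within a fixed pixel radius, paired one-to-one by greedy nearest-pair matching (a simplification; an optimal assignment could match more pairs). The official NWPU-Crowd localization evaluation uses its own matching rule and distance thresholds, so this sketch will not reproduce the Table II numbers. The radius and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def match_points(pred: Sequence[Point], gt: Sequence[Point], radius: float) -> int:
    """Greedy one-to-one matching: repeatedly pair the closest unmatched
    prediction/ground-truth pair while their distance is within `radius`.
    Returns the number of true positives."""
    pairs = sorted(
        (math.dist(p, g), i, j)
        for i, p in enumerate(pred)
        for j, g in enumerate(gt)
    )
    used_p, used_g, tp = set(), set(), 0
    for d, i, j in pairs:
        if d > radius:
            break  # pairs are sorted, so every remaining pair is too far
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            tp += 1
    return tp


def prf1(tp: int, n_pred: int, n_gt: int) -> Tuple[float, float, float]:
    """Precision, recall and their harmonic mean (F1)."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mae(pred_counts: Sequence[int], gt_counts: Sequence[int]) -> float:
    """Mean absolute counting error over a set of images."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)


pred = [(10.0, 10.0), (30.0, 31.0), (80.0, 80.0)]
gt = [(11.0, 10.0), (30.0, 30.0), (50.0, 50.0), (60.0, 60.0)]
tp = match_points(pred, gt, radius=4.0)    # 2 matches
p, r, f1 = prf1(tp, len(pred), len(gt))    # p ≈ 0.667, r = 0.5, f1 ≈ 0.571
```

Precision penalizes spurious detections and recall penalizes missed heads, so F1 rewards a method only when both are high. MAE, by contrast, compares only per-image counts (for IIM, the number of connected components) and ignores where the heads are.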


This review was generated by AI and checked by human editors.