QUICK REVIEW

[Paper Review] Crowd Counting with Deep Structured Scale Integration Network

Lingbo Liu, Zhilin Qiu|arXiv (Cornell University)|Aug 23, 2019

Video Surveillance and Tracking Methods|References: 45|Cited by: 36

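One-Sentence Summary

DSSINet introduces a CRF-based Structured Feature Enhancement Module that mutually refines multiscale crowd features, together with a Dilated Multiscale Structural Similarity loss that enforces local scale consistency, and achieves results that surpass existing methods on multiple benchmarks.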

ABSTRACT

Automatic estimation of the number of people in unconstrained crowded scenes is a challenging task and one major difficulty stems from the huge scale variation of people. In this paper, we propose a novel Deep Structured Scale Integration Network (DSSINet) for crowd counting, which addresses the scale variation of people by using structured feature representation learning and hierarchically structured loss function optimization. Unlike conventional methods which directly fuse multiple features with weighted average or concatenation, we first introduce a Structured Feature Enhancement Module based on conditional random fields (CRFs) to refine multiscale features mutually with a message passing mechanism. In this module, each scale-specific feature is considered as a continuous random variable and passes complementary information to refine the features at other scales. Second, we utilize a Dilated Multiscale Structural Similarity loss to enforce our DSSINet to learn the local correlation of people's scales within regions of various size, thus yielding high-quality density maps. Extensive experiments on four challenging benchmarks well demonstrate the effectiveness of our method. Specifically, our DSSINet achieves improvements of 9.5% error reduction on Shanghaitech dataset and 24.9% on UCF-QNRF dataset against the state-of-the-art methods.

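Motivation and Objectives

Address the large scale variation in crowd scenes by learning robust multiscale feature representations.
Share information across scales in a structured way through a CRF-based feature refinement mechanism.
Impose a loss that models local scale correlation within regions of different sizes, using an MS-SSIM formulation built on dilated convolutions.
Produce high-quality density maps end to end by fusing, top-down, the side outputs of parameter-sharing subnetworks.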
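
The idea of scale-specific features passing complementary information to one another can be illustrated with a toy sketch. It assumes the features from all scales have already been brought to a common size and flattened into lists of floats. The function name `refine_features`, the scalar `weights` table, and the iteration count are hypothetical, chosen only for illustration. This is a simplified, mean-field-style update and should not be read as the paper's exact SFEM formulation.

```python
def refine_features(features, weights, n_iters=2):
    """Toy cross-scale message passing (illustrative assumption, not the paper's code).

    features: list of equally sized float lists, one per scale.
    weights[i][j]: hypothetical scalar strength of the message sent
                   from scale j to scale i (the diagonal is ignored).
    """
    # Start from the unrefined features.
    refined = [list(f) for f in features]
    for _ in range(n_iters):
        updated = []
        for i, f_i in enumerate(features):
            # Each scale keeps its own observation ...
            new_i = list(f_i)
            # ... and adds weighted messages from every other scale,
            # computed from the previous iteration's estimates.
            for j, h_j in enumerate(refined):
                if j == i:
                    continue
                for k, value in enumerate(h_j):
                    new_i[k] += weights[i][j] * value
            updated.append(new_i)
        refined = updated  # synchronous update of all scales
    return refined


# Example with three scales and small symmetric message weights.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = [[0.0, 0.1, 0.1],
           [0.1, 0.0, 0.1],
           [0.1, 0.1, 0.0]]
refined = refine_features(features, weights)
```

In a real network the scalar weights would presumably be replaced by learned operators acting on feature maps; scalars are used here only to keep the update rule visible.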

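Proposed Method

Three parallel subnetworks with shared parameters process differently scaled versions of the same image.
A Structured Feature Enhancement Module (SFEM) based on conditional random fields mutually refines the multiscale features through a message passing scheme.
Multiple side-output density maps are generated from the refined features and fused top-down to obtain a high-resolution density map.
A Dilated Multiscale Structural Similarity (DMS-SSIM) loss measures SSIM over regions of different sizes using fixed Gaussian kernels and dilated convolutions.
Training with the DMS-SSIM loss enforces local scale correlation and density map consistency across scales.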
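
To make the loss idea concrete, here is a minimal, dependency-free sketch of measuring SSIM with a fixed Gaussian window applied at several dilation rates, so that each rate compares the two density maps over a differently sized region. It is a simplification under stated assumptions: the dilation rates, the 5x5 window with sigma 1.0, the standard SSIM constants `c1` and `c2`, the zero padding, and the plain averaging across scales are all illustrative choices, and every function name is hypothetical. The paper's exact DMS-SSIM formula may combine the scales differently.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized size x size Gaussian window (fixed, not learned)."""
    half = size // 2
    raw = [[math.exp(-((i - half) ** 2 + (j - half) ** 2) / (2.0 * sigma ** 2))
            for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def dilated_filter(img, kernel, dilation):
    """Filter a 2D list with a dilated window; taps outside the map count as zero."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, krow in enumerate(kernel):
                yy = y + (i - half) * dilation
                if not 0 <= yy < h:
                    continue
                for j, wgt in enumerate(krow):
                    xx = x + (j - half) * dilation
                    if 0 <= xx < w:
                        acc += wgt * img[yy][xx]
            out[y][x] = acc
    return out


def ssim_map(x, y, kernel, dilation, c1=1e-4, c2=9e-4):
    """Per-pixel SSIM of two 2D maps, using a dilated Gaussian window."""
    mu_x = dilated_filter(x, kernel, dilation)
    mu_y = dilated_filter(y, kernel, dilation)
    e_xx = dilated_filter([[a * a for a in r] for r in x], kernel, dilation)
    e_yy = dilated_filter([[b * b for b in r] for r in y], kernel, dilation)
    e_xy = dilated_filter([[a * b for a, b in zip(rx, ry)]
                           for rx, ry in zip(x, y)], kernel, dilation)
    out = []
    for r in range(len(x)):
        row = []
        for c in range(len(x[0])):
            mx, my = mu_x[r][c], mu_y[r][c]
            var_x = e_xx[r][c] - mx * mx   # local variance of x
            var_y = e_yy[r][c] - my * my   # local variance of y
            cov = e_xy[r][c] - mx * my     # local covariance
            num = (2 * mx * my + c1) * (2 * cov + c2)
            den = (mx * mx + my * my + c1) * (var_x + var_y + c2)
            row.append(num / den)
        out.append(row)
    return out


def dilated_multiscale_ssim_loss(pred, target, dilations=(1, 2, 3, 6, 9)):
    """1 minus the mean SSIM averaged over several dilation rates (assumed rates)."""
    kernel = gaussian_kernel()
    scores = []
    for d in dilations:
        m = ssim_map(pred, target, kernel, d)
        scores.append(sum(sum(row) for row in m) / (len(m) * len(m[0])))
    return 1.0 - sum(scores) / len(scores)


# Example: a single-person target compared with an empty prediction.
target = [[0.0] * 8 for _ in range(8)]
target[3][4] = 1.0
empty = [[0.0] * 8 for _ in range(8)]
loss = dilated_multiscale_ssim_loss(empty, target)
```

Five dilation rates are used here only to mirror the m=5 setting mentioned in the findings. In a training framework the same computation would normally be expressed with a fixed-weight dilated convolution so that gradients flow through the predicted map.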

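Experimental Results

Research Questions

RQ1: How can multiscale features be refined to better handle extreme scale variation in crowded scenes?
RQ2: Does CRF-based mutual refinement among scale-specific features improve robustness to scale variation?
RQ3: Does the dilated MS-SSIM loss capture local scale correlation for crowd counting better than conventional losses?
RQ4: How effective and efficient is the proposed DSSINet architecture on standard crowd counting benchmarks?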

Key Findings

Dataset/Scenario	MAE	MSE
Shanghaitech Part A (Ours)	60.63	96.04
Shanghaitech Part B (Ours)	6.85	10.34
UCF-QNRF (Ours)	99.1	159.2
UCF_CC_50 (Ours)	216.9	302.4
WorldExpo’10 Ave (Ours)	6.67	-
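
For readers unfamiliar with the two columns, the sketch below shows how such numbers are typically computed. It assumes the usual crowd-counting convention, in which the predicted count of an image is the sum of its density map and the column labelled "MSE" is the root of the mean squared count error. The function names and the sample counts are hypothetical.

```python
import math


def count_from_density(density_map):
    """Estimated head count: the sum over all pixels of a 2D density map."""
    return sum(sum(row) for row in density_map)


def mae_and_rmse(predicted_counts, true_counts):
    """Mean absolute error and root mean squared error over a test set."""
    if not true_counts or len(predicted_counts) != len(true_counts):
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - t for p, t in zip(predicted_counts, true_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Hypothetical counts for two test images.
mae, rmse = mae_and_rmse([10.0, 20.0], [12.0, 17.0])
```

Because the squared term penalizes large misses more heavily, the second column is at least as large as the first and is often read as a measure of robustness.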

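DSSINet achieves state-of-the-art performance on multiple benchmarks, notably a 9.5% relative MAE reduction on Shanghaitech Part A and a 24.9% relative MAE reduction on UCF-QNRF over previous methods (relative figures taken from the paper).
SFEM (CRF-based feature refinement) markedly improves the robustness of multiscale features compared with simple fusion strategies.
The DMS-SSIM loss with dilation (m=5 was shown to be the best setting) achieves the best MAE/MSE, outperforming Euclidean and SSIM-based losses.
The model has 8.858 million parameters and takes about 450 ms to process a 720x576 frame on a 1080 GPU; the backbone accounts for most of the parameters, giving a favorable accuracy/complexity trade-off.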
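
The relative figures above can be sanity-checked with a little arithmetic. The sketch assumes each percentage is a relative MAE reduction against a single best prior method, and that the 450 ms figure is per frame. The helper names are hypothetical, and the derived values are back-of-the-envelope estimates rather than numbers stated in this review.

```python
def implied_baseline(new_error, relative_reduction):
    """Prior error implied by new_error = prior * (1 - relative_reduction)."""
    return new_error / (1.0 - relative_reduction)


def frames_per_second(ms_per_frame):
    """Throughput implied by a per-frame latency in milliseconds."""
    return 1000.0 / ms_per_frame


part_a_prior = implied_baseline(60.63, 0.095)   # Shanghaitech Part A MAE
qnrf_prior = implied_baseline(99.1, 0.249)      # UCF-QNRF MAE
fps = frames_per_second(450)                    # 720x576 frames
```

Under these assumptions the implied prior MAE is about 67.0 on Shanghaitech Part A and about 132.0 on UCF-QNRF, and 450 ms per frame corresponds to roughly 2.2 frames per second.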

This review was generated by AI and checked by a human editor.