QUICK REVIEW

[Paper Review] Shallow Feature Based Dense Attention Network for Crowd Counting

Yunqi Miao, Zijia Lin|arXiv (Cornell University)|Jun 17, 2020

Video Surveillance and Tracking Methods | Cited by 24

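One-Sentence Summary

The paper proposes SDANet, a shallow-feature-based dense attention network for crowd counting. It uses shallow features to suppress background noise and dense skip connections to preserve multi-scale features of people, which improves counting accuracy. On the UCF_CC_50 dataset it reduces MAE by 11.9%, indicating robustness to scale variation and cluttered backgrounds.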

ABSTRACT

While the performance of crowd counting via deep learning has been improved dramatically in the recent years, it remains an ingrained problem due to cluttered backgrounds and varying scales of people within an image. In this paper, we propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images, which diminishes the impact of backgrounds via involving a shallow feature based attention model, and meanwhile, captures multi-scale information via densely connecting hierarchical image features. Specifically, inspired by the observation that backgrounds and human crowds generally have noticeably different responses in shallow features, we decide to build our attention model upon shallow-feature maps, which results in accurate background-pixel detection. Moreover, considering that the most representative features of people across different scales can appear in different layers of a feature extraction network, to better keep them all, we propose to densely connect hierarchical image features of different layers and subsequently encode them for estimating crowd density. Experimental results on three benchmark datasets clearly demonstrate the superiority of SDANet when dealing with different scenarios. Particularly, on the challenging UCF_CC_50 dataset, our method outperforms other existing methods by a large margin, as is evident from a remarkable 11.9% Mean Absolute Error (MAE) drop of our SDANet.

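Research Motivation and Objectives

Address two long-standing problems in crowd counting from still images: cluttered backgrounds and scale variation.
Reduce false positives in density estimation caused by background elements such as umbrellas, stairs, and buildings.
Preserve multi-scale features of people across the different layers of a deep network to improve density prediction.
Develop a lightweight attention mechanism that avoids a complex, parameter-heavy standalone model.
Strengthen the feature representation by densely connecting hierarchical features from multiple network layers.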

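Proposed Method

The method uses an attention map generator (AMG) that derives attention weights from shallow convolutional feature maps, separating crowd regions from background regions by their distinct activation patterns.
The AMG is integrated into the feature-extraction backbone, so the whole network trains end to end without extra parameters or a separate classifier.
A densely connected structure fuses the features of all preceding layers, so that multi-scale features of people are preserved and encoded.
The network follows a coarse-to-fine refinement strategy combined with a multi-scale loss $L_{map}$ to improve the accuracy of the predicted density map.
The attention mechanism is trained with a binary cross-entropy loss $L_{att}$ to improve the accuracy of background suppression.
Feature maps from multiple layers are concatenated and passed through refinement layers to produce the final density map.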
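The interplay between the attention map and the two losses can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The helper names, the 2x2 sum-pooling used to build coarser targets, the squared-error form of $L_{map}$, and the weighting factor `att_weight` are all assumptions made for illustration; the review only states that $L_{map}$ is multi-scale and that $L_{att}$ is a binary cross-entropy.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def apply_attention(features, attention_logits):
    """Scale each feature value by a per-pixel attention weight in (0, 1)."""
    return [[f * sigmoid(a) for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(features, attention_logits)]

def bce_loss(attention_logits, mask, eps=1e-7):
    """Mean binary cross-entropy between sigmoid(logits) and a 0/1 crowd mask."""
    total, count = 0.0, 0
    for a_row, m_row in zip(attention_logits, mask):
        for a, m in zip(a_row, m_row):
            p = min(max(sigmoid(a), eps), 1.0 - eps)
            total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
            count += 1
    return total / count

def sum_pool_2x(density):
    """Halve the resolution by summing 2x2 blocks (assumes even height and
    width, in which case the total count is preserved)."""
    return [[density[i][j] + density[i][j + 1]
             + density[i + 1][j] + density[i + 1][j + 1]
             for j in range(0, len(density[0]) - 1, 2)]
            for i in range(0, len(density) - 1, 2)]

def squared_error(pred, target):
    """Sum of squared per-pixel differences between two maps of equal size."""
    return sum((p - t) ** 2
               for p_row, t_row in zip(pred, target)
               for p, t in zip(p_row, t_row))

def multi_scale_map_loss(preds_by_scale, target):
    """Assumed form of L_map: preds_by_scale[0] is full resolution and each
    later entry is half the size of the previous one. Every prediction is
    compared with the target pooled to the same size."""
    loss = 0.0
    scaled_target = target
    for index, pred in enumerate(preds_by_scale):
        if index > 0:
            scaled_target = sum_pool_2x(scaled_target)
        loss += squared_error(pred, scaled_target)
    return loss

def total_loss(preds_by_scale, target, attention_logits, mask, att_weight=1.0):
    """Assumed combination: L_map + att_weight * L_att."""
    return (multi_scale_map_loss(preds_by_scale, target)
            + att_weight * bce_loss(attention_logits, mask))
```

In this sketch a strongly negative attention logit drives the weight toward 0 and suppresses a background pixel, while the crowd mask that supervises `bce_loss` could be obtained by thresholding the ground-truth density map, which is again an assumption rather than a detail given in the review.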

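Experimental Results

Research Questions

RQ1: Can shallow feature maps effectively separate crowd regions from cluttered backgrounds in crowd counting?
RQ2: Does a lightweight attention mechanism built on shallow features suppress background noise better than a complex standalone attention model?
RQ3: Do dense skip connections across hierarchical features improve multi-scale representation learning for crowd counting?
RQ4: How does the combination of shallow attention and dense feature fusion perform on challenging datasets with varying crowd densities?
RQ5: To what extent does the coarse-to-fine refinement strategy improve the accuracy of density map estimation?

Key Findings

On the UCF_CC_50 dataset, SDANet reduces the mean absolute error (MAE) by 11.9% compared with previous state-of-the-art methods.
On the WorldExpo'10 dataset, SDANet achieves the best results on Scene 1, Scene 4, Scene 5, and the average metric, showing that it adapts well to diverse real-world scenes.
On the ShanghaiTech Part-B dataset, SDANet reduces MAE by 4.87% and MSE by 20.31% compared with the recent method TEDnet.
In the ablation study, removing the attention module increases MAE by 37%, which shows its key role in background suppression.
Removing the dense connection structure lowers counting accuracy by 20.1%, which confirms its importance for preserving multi-scale features.
Removing the refinement layers increases MAE by 16%, which supports the effectiveness of the coarse-to-fine training strategy.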
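The percentages above are relative changes in count-level error metrics. A small sketch, using made-up counts that are not taken from the paper, shows how such figures are typically computed. Note that many crowd-counting papers report the root of the mean squared error under the name MSE; treating this paper's MSE that way is an assumption here.

```python
import math

def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and true per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)

def rmse(pred_counts, true_counts):
    """Root of the mean squared count error (often labeled MSE in this field)."""
    squared = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(squared / len(true_counts))

def relative_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`; negative if higher."""
    return 100.0 * (baseline - new) / baseline

# Hypothetical per-image counts, for illustration only.
predicted = [105, 48, 210]
actual = [100, 50, 200]

print(mae(predicted, actual))         # (5 + 2 + 10) / 3
print(rmse(predicted, actual))        # sqrt((25 + 4 + 100) / 3)
print(relative_reduction(10.0, 8.0))  # 20.0
```

Under this convention, an ablation that raises MAE shows up as a negative reduction, so the "MAE increases by 37%" style of statement corresponds to `relative_reduction` returning -37 when the full model is the baseline.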

This review was generated by AI and reviewed by a human editor.