QUICK REVIEW

[Paper Review] Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting

Liang Zhu, Zhijian Zhao|arXiv (Cornell University)|Feb 4, 2019

Video Surveillance and Tracking Methods | References: 28 | Cited by: 69

One-Sentence Summary

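SFANet introduces a dual-path multi-scale fusion network with an attention mechanism that produces high-resolution density maps and accurate crowd counts across varying densities. It uses a VGG16-bn backbone with two fusion paths (a density map path and an attention map path) and is trained end to end.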

ABSTRACT

The task of crowd counting in varying density scenes is an extremely difficult challenge due to large scale variations. In this paper, we propose a novel dual path multi-scale fusion network architecture with attention mechanism named SFANet that can perform accurate count estimation as well as present high-resolution density maps for highly congested crowd scenes. The proposed SFANet contains two main components: a VGG backbone convolutional neural network (CNN) as the front-end feature map extractor and a dual path multi-scale fusion networks as the back-end to generate density map. These dual path multi-scale fusion networks have the same structure, one path is responsible for generating attention map by highlighting crowd regions in images, the other path is responsible for fusing multi-scale features as well as attention map to generate the final high-quality high-resolution density maps. SFANet can be easily trained in an end-to-end way by dual path joint training. We have evaluated our method on four crowd counting datasets (ShanghaiTech, UCF_CC_50, UCSD and UCF-QNRF). The results demonstrate that with attention mechanism and multi-scale feature fusion, the proposed SFANet achieves the best performance on all these datasets and generates better quality density maps compared with other state-of-the-art approaches.

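Motivation and Objectives

Address large variations in head scale and background noise in crowd counting.
Use multi-scale feature fusion to generate high-resolution density maps.
Introduce an attention path that highlights head regions and suppresses the background.
Propose a multi-task loss that combines a Euclidean density loss with attention guidance.
Demonstrate superior performance on standard crowd counting benchmarks.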

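Proposed Method

Extract multi-scale features with a VGG16-bn backbone (conv2-2, conv3-3, conv4-3, conv5-3).
Build the density map path (DMP) through feature pyramid fusion to produce high-resolution density maps.
Build the attention map path (AMP) with the same structure to learn the probability of head regions.
Fuse the DMP features with the attention map by element-wise multiplication to refine the density features.
Train with a multi-task loss: L = L_density + alpha * L_attention (alpha = 0.1).
Generate ground-truth density maps by Gaussian-blurring the head annotations; derive the attention ground truth from the density maps.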
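
The last three steps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the fixed Gaussian `sigma`, the `threshold` used to turn the density map into a binary attention target, and the choice of binary cross-entropy for L_attention are assumptions made for illustration, and all function names are hypothetical. Only the form L = L_density + alpha * L_attention with alpha = 0.1 comes from the summary above.

```python
import math

def gaussian_density_map(points, height, width, sigma=1.0):
    """Blur each head point (row, col) with a Gaussian normalized to sum to 1,
    so the whole map sums to the number of annotated heads."""
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in points:
        w = [[math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
              for c in range(width)] for r in range(height)]
        total = sum(map(sum, w))
        for r in range(height):
            for c in range(width):
                density[r][c] += w[r][c] / total
    return density

def attention_target(density, threshold=1e-3):
    """Binary attention target: 1 where density exceeds the (assumed) threshold."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in density]

def apply_attention(features, attention):
    """Element-wise multiplication of a feature map with an attention map."""
    return [[f * a for f, a in zip(fr, ar)] for fr, ar in zip(features, attention)]

def euclidean_loss(pred, target):
    """Sum of squared per-pixel differences between two maps."""
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr))

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy (assumed form of L_attention)."""
    total, n = 0.0, 0
    for pr, tr in zip(pred, target):
        for p, t in zip(pr, tr):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
            n += 1
    return total / n

def multitask_loss(pred_density, gt_density, pred_att, gt_att, alpha=0.1):
    """L = L_density + alpha * L_attention."""
    return euclidean_loss(pred_density, gt_density) + alpha * bce_loss(pred_att, gt_att)

# Two hypothetical head annotations on an 8x8 image.
gt_density = gaussian_density_map([(2, 2), (5, 6)], height=8, width=8)
gt_attention = attention_target(gt_density)
estimated_count = sum(map(sum, gt_density))  # integrates to the head count, 2
```

Because each Gaussian is normalized before it is added, summing the density map recovers the crowd count, which is why a network that regresses the map can also report a count.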

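Experimental Results

Research Questions

RQ1: Does the dual-path multi-scale fusion network improve robustness to scale variation and background noise?
RQ2: Does integrating the attention map path improve head-region localization and density map quality?
RQ3: Does the proposed multi-task loss speed up convergence and improve counting accuracy?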

Main Findings

Dataset	Part	MAE	MSE
ShanghaiTech	Part A	59.8	99.3
ShanghaiTech	Part B	6.9	10.9
UCF_CC_50	Full set	219.6	316.2
UCF-QNRF	Full set	100.8	174.5
UCSD	Full set	0.82	1.07

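SFANet achieves state-of-the-art or competitive MAE/MSE on the ShanghaiTech, UCF_CC_50, UCF-QNRF, and UCSD datasets.
On ShanghaiTech Part A, SFANet obtains 59.8 MAE and 99.3 MSE; on Part B, it obtains 6.9 MAE and 10.9 MSE (see Table 1).
On UCF_CC_50, SFANet achieves 219.6 MAE and 316.2 MSE.
On UCF-QNRF, SFANet reaches 100.8 MAE and 174.5 MSE.
On UCSD, SFANet achieves 0.82 MAE and 1.07 MSE (lower is better).
Ablation experiments show that adding the attention path outperforms the VGG-DMP baseline, confirming its contribution.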
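
For readers unfamiliar with the two metrics, the sketch below shows how they are usually computed from per-image counts. This review does not define them, so the definitions here are an assumption based on the convention in the crowd-counting literature, where "MSE" typically denotes the square root of the mean squared count error. The function names and the sample counts are hypothetical.

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def mse(pred_counts, gt_counts):
    """Square root of the mean squared count error (crowd-counting 'MSE')."""
    squared = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
    return math.sqrt(squared / len(gt_counts))

# Hypothetical predicted and true counts for three test images.
pred = [105, 98, 250]
gt = [100, 100, 240]
print(round(mae(pred, gt), 2))  # 5.67
print(round(mse(pred, gt), 2))  # 6.56
```

Under this convention MSE penalizes large per-image errors more heavily than MAE, so it is commonly read as an indicator of robustness while MAE reflects average accuracy.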


This review was generated by AI and reviewed by human editors.