QUICK REVIEW

[Paper Review] Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting

Liang Zhu, Zhijian Zhao|arXiv (Cornell University)|2019. 02. 04.

Video Surveillance and Tracking Methods | References: 28 | Citations: 69

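One-Line Summary

SFANet introduces a dual-path multi-scale fusion network with an attention mechanism to produce high-resolution density maps and accurate crowd counts across varying densities. It uses a VGG16-bn backbone and trains two fusion paths (a density map path and an attention map path) end to end.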

ABSTRACT

The task of crowd counting in varying density scenes is an extremely difficult challenge due to large scale variations. In this paper, we propose a novel dual path multi-scale fusion network architecture with attention mechanism named SFANet that can perform accurate count estimation as well as present high-resolution density maps for highly congested crowd scenes. The proposed SFANet contains two main components: a VGG backbone convolutional neural network (CNN) as the front-end feature map extractor and a dual path multi-scale fusion networks as the back-end to generate density map. These dual path multi-scale fusion networks have the same structure, one path is responsible for generating attention map by highlighting crowd regions in images, the other path is responsible for fusing multi-scale features as well as attention map to generate the final high-quality high-resolution density maps. SFANet can be easily trained in an end-to-end way by dual path joint training. We have evaluated our method on four crowd counting datasets (ShanghaiTech, UCF CC 50, UCSD and UCF-QRNF). The results demonstrate that with attention mechanism and multi-scale feature fusion, the proposed SFANet achieves the best performance on all these datasets and generates better quality density maps compared with other state-of-the-art approaches.

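Motivation and Objectives

Addresses large head-scale variation and background noise in crowd counting.
Uses multi-scale feature fusion to generate high-resolution density maps.
Introduces an attention path that highlights head regions and suppresses the background.
Proposes a multi-task loss that combines a Euclidean density loss with attention guidance.
Demonstrates strong performance on standard crowd counting benchmarks.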

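Proposed Method

Extracts multi-scale features with a VGG16-bn backbone (conv2-2, conv3-3, conv4-3, conv5-3).
Builds a density map path (DMP) that generates high-resolution density maps through feature pyramid fusion.
Builds an attention map path (AMP) with the same structure to learn head-region probabilities.
Fuses DMP features with the attention map by element-wise multiplication to refine the density features.
Trains with a multi-task loss: L = L_density + alpha * L_attention (alpha = 0.1).
Generates ground-truth density maps by applying a Gaussian blur to the head annotations; the attention ground truth is derived from the density map.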
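
The steps above can be illustrated with a minimal pure-Python sketch on a toy 16x16 grid. It is not the authors' implementation. The kernel size, `sigma`, the `threshold` used to derive the attention target, the use of binary cross-entropy for `L_attention`, and the unnormalized sum-of-squares form of `L_density` are all assumptions made for illustration; the review only specifies a Gaussian blur, element-wise multiplication, and `alpha = 0.1`. All function names are hypothetical.

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    k = [[math.exp(-((y - half) ** 2 + (x - half) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def density_map(heads, height, width, size=5, sigma=1.0):
    """Add one normalized Gaussian per head annotation given as (row, col).

    Kernels that cross the image border are truncated, so the map sums to
    the head count only when every head is at least size // 2 pixels inside.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                if 0 <= y < height and 0 <= x < width:
                    dmap[y][x] += kernel[dy][dx]
    return dmap

def attention_target(dmap, threshold=1e-3):
    """Binary mask: 1 where density exceeds an assumed threshold, else 0."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in dmap]

def gate(features, attention):
    """Element-wise product of a feature map and an attention map."""
    return [[f * a for f, a in zip(frow, arow)]
            for frow, arow in zip(features, attention)]

def euclidean_loss(pred, target):
    """Sum of squared pixel differences (normalization is an assumption)."""
    return sum((p - t) ** 2
               for prow, trow in zip(pred, target)
               for p, t in zip(prow, trow))

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            n += 1
    return total / n

def multitask_loss(pred_den, gt_den, pred_att, gt_att, alpha=0.1):
    """L = L_density + alpha * L_attention, with alpha = 0.1 as in the review."""
    return euclidean_loss(pred_den, gt_den) + alpha * bce_loss(pred_att, gt_att)

# Toy usage: two head annotations placed away from the image border.
gt_den = density_map([(4, 4), (10, 12)], height=16, width=16)
gt_att = attention_target(gt_den)
count = sum(map(sum, gt_den))  # integrates to the number of heads
```

Because each Gaussian is normalized, summing the density map recovers the head count, which is what lets a network trained on such maps report a crowd count by integrating its output. In a real pipeline the gating would be applied to learned feature tensors rather than to the ground-truth map used here.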

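Experimental Results

Research Questions

RQ1: Can a dual-path, multi-scale fusion network improve robustness to scale variation and background noise?
RQ2: Does integrating the attention map path improve head-region localization accuracy and density map quality?
RQ3: Does the proposed multi-task loss speed up convergence and improve crowd count estimation accuracy?

Key Results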

Dataset	Part	MAE	MSE
ShanghaiTech	Part A	59.8	99.3
ShanghaiTech	Part B	6.9	10.9
UCF_CC_50	Full set	219.6	316.2
UCF-QNRF	Full set	100.8	174.5
UCSD	Full set	0.82	1.07

SFANet achieves state-of-the-art or competitive MAE/MSE on the ShanghaiTech, UCF_CC_50, UCF-QNRF, and UCSD datasets.
On ShanghaiTech, SFANet achieves 59.8 MAE and 99.3 MSE on Part A, and 6.9 MAE and 10.9 MSE on Part B (Table 1).
On UCF_CC_50, SFANet achieves 219.6 MAE and 316.2 MSE.
On UCF-QNRF, SFANet attains 100.8 MAE and 174.5 MSE.
On UCSD, SFANet achieves 0.82 MAE and 1.07 MSE (lower is better).
Ablation shows the attention path improves performance beyond VGG-DMP baselines, confirming its contribution.
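
The review does not define the two metrics, so the sketch below shows one plausible reading. It assumes the convention common in crowd counting work, where "MSE" denotes the square root of the mean squared count error over test images; whether the numbers above follow exactly this definition is an assumption. The function names and sample counts are hypothetical.

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def rooted_mse(pred_counts, gt_counts):
    """Square root of the mean squared count error (assumed 'MSE' convention)."""
    squared = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
    return math.sqrt(squared / len(gt_counts))

# Made-up per-image counts, for illustration only.
pred = [102.0, 48.0, 310.0]
gt = [100.0, 50.0, 300.0]
print(mae(pred, gt))
print(rooted_mse(pred, gt))
```

Under this reading, MAE reflects average counting accuracy while the rooted MSE penalizes large per-image errors more heavily, which is why the two are usually reported together.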

This review was generated by AI and reviewed by a human editor.