QUICK REVIEW

[Paper Review] SSA-CNN: Semantic Self-Attention CNN for Pedestrian Detection

Chengju Zhou, Meiqing Wu | arXiv (Cornell University) | 2019-02-25

Advanced Neural Network Applications | References: 42 | Citations: 35

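One-line summary

SSA-CNN fuses multi-scale semantic segmentation maps with CNN features as self-attention cues to improve pedestrian detection, and achieves a top-ranked miss rate (MR) on Caltech with efficient inference.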

ABSTRACT

Pedestrian detection plays an important role in many applications such as autonomous driving. We propose a method that explores semantic segmentation results as self-attention cues to significantly improve the pedestrian detection performance. Specifically, a multi-task network is designed to jointly learn semantic segmentation and pedestrian detection from image datasets with weak box-wise annotations. The semantic segmentation feature maps are concatenated with corresponding convolution features maps to provide more discriminative features for pedestrian detection and pedestrian classification. By jointly learning segmentation and detection, our proposed pedestrian self-attention mechanism can effectively identify pedestrian regions and suppress backgrounds. In addition, we propose to incorporate semantic attention information from multi-scale layers into deep convolution neural network to boost pedestrian detection. Experiment results show that the proposed method achieves the best detection performance with MR of 6.27% on Caltech dataset and obtain competitive performance on CityPersons dataset while maintaining high computational efficiency.

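Motivation and goals

Improve pedestrian detection performance by using semantic segmentation as a self-attention cue.
Propose a multi-scale, multi-task framework that jointly learns pedestrian detection and semantic segmentation from box-wise annotations.
Integrate semantic features into the RPN and R-CNN stages to improve pedestrian classification and localization.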
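
To make the idea of weak box-wise supervision concrete, here is a minimal sketch of how pedestrian boxes could be rasterized into a binary segmentation target. The function name `boxes_to_mask`, the integer box format, and the `stride` parameter are illustrative assumptions; the review does not describe the paper's exact label-generation procedure.

```python
def boxes_to_mask(boxes, height, width, stride=1):
    """Rasterize pedestrian boxes into a binary mask at a feature-map stride.

    boxes: iterable of integer (x1, y1, x2, y2) in image pixels,
           with x2 and y2 exclusive (an assumed convention).
    Returns a list of rows, each a list of 0/1 values.
    """
    out_h = height // stride
    out_w = width // stride
    mask = [[0] * out_w for _ in range(out_h)]
    for x1, y1, x2, y2 in boxes:
        # Map image coordinates to feature-map cells and clip to the map.
        c1 = max(0, x1 // stride)
        r1 = max(0, y1 // stride)
        c2 = min(out_w, -(-x2 // stride))  # ceiling division
        r2 = min(out_h, -(-y2 // stride))
        for r in range(r1, r2):
            for c in range(c1, c2):
                mask[r][c] = 1  # every cell touched by a box counts as pedestrian
    return mask


# Hypothetical example: one box on an 8x8 image, target at stride 2.
target = boxes_to_mask([(2, 2, 6, 6)], height=8, width=8, stride=2)
```

Every pixel inside a box is labeled as pedestrian, so the target is noisier than a pixel-wise mask but needs no annotation beyond the detection boxes.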

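Proposed method

Extends Faster R-CNN with a Semantic Self-Attention RPN (SSA-RPN) and a Semantic Self-Attention R-CNN (SSA-RCNN).
Attaches semantic segmentation branches to conv4_3 and conv5_3 to produce the conv4_3_seg and conv5_3_seg feature maps.
Concatenates each semantic feature map with its corresponding convolution feature map to form enriched features for detection and classification.
For self-attention in the R-CNN stage, pools and combines the segmentation maps from conv4_3 and conv5_3 to exploit multi-scale semantic information.
Trains with a multi-task loss that jointly optimizes the detection and segmentation branches (binary: pedestrian vs. non-pedestrian).
Evaluates on Caltech and CityPersons with single-image inference on a GTX 1080 Ti.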
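
The concatenation and multi-task training steps above can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: the single-channel 1x1 segmentation branch, the function names, and the loss weight `lambda_seg` are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_self_attention(features, branch_weight, branch_bias):
    """Predict a pedestrian score map and concatenate it with the features.

    features:      (C, H, W) convolution feature map (e.g. from conv5_3).
    branch_weight: (C,) weights of a hypothetical 1x1 segmentation convolution.
    branch_bias:   scalar bias of that convolution.
    """
    # A 1x1 convolution is a per-pixel dot product over the channel axis.
    seg_logits = np.tensordot(branch_weight, features, axes=([0], [0])) + branch_bias
    seg_map = sigmoid(seg_logits)[None, :, :]            # (1, H, W)
    fused = np.concatenate([features, seg_map], axis=0)  # (C + 1, H, W)
    return fused, seg_logits


def segmentation_bce(seg_logits, mask):
    """Mean binary cross-entropy against a 0/1 pedestrian mask."""
    p = np.clip(sigmoid(seg_logits), 1e-7, 1 - 1e-7)
    return float(-np.mean(mask * np.log(p) + (1 - mask) * np.log(1 - p)))


def multi_task_loss(det_loss, seg_losses, lambda_seg=1.0):
    """Detection loss plus weighted segmentation losses, one per scale."""
    return det_loss + lambda_seg * sum(seg_losses)
```

The fused tensor has one extra channel that carries the pedestrian score map, so the downstream detection layers see both appearance features and a spatial cue for where pedestrians are likely to be.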

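Experimental results

Research questions

RQ1: Does introducing multi-scale semantic segmentation as self-attention improve pedestrian detection performance?
RQ2: Can joint training of detection and segmentation from box-wise annotations raise accuracy while reducing the annotation burden?
RQ3: How does multi-scale semantic self-attention affect RPN proposals and R-CNN classification in pedestrian detection?
RQ4: How does the runtime efficiency of the method compare with state-of-the-art methods?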

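Key results

SSA-CNN achieves an MR of 6.27% on the Caltech test set under the Reasonable setting, outperforming prior methods.
It achieves competitive results on CityPersons while maintaining high computational efficiency.
Multi-scale semantic self-attention improves both proposal quality (SSA-RPN) and classification (SSA-RCNN) compared with single-scale or attention-free baselines.
Using box-wise annotations for semantic guidance reduces annotation requirements compared with pixel-wise segmentation labels.
The integrated SSA-RPN and SSA-RCNN pipeline offers faster or comparable runtime relative to contemporary methods such as SDS-RCNN and F-DNN2+SS.
Ablation studies show that the deeper conv5_3 semantic map provides a stronger attention cue and that multi-scale fusion yields the best performance.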
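
For readers unfamiliar with the MR figure, the sketch below computes a log-average miss rate in the way it is conventionally reported on Caltech: the miss rate is sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space over [10^-2, 10^0] and averaged geometrically. That the reviewed paper uses exactly this protocol is an assumption, and the curve values in the example are made up for illustration.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at FPPI values in [1e-2, 1e0].

    fppi and miss_rate describe one detector curve and must be sorted by
    ascending FPPI.
    """
    # Reference FPPI values, evenly spaced in log10 space from -2 to 0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Use the last curve point whose FPPI does not exceed the reference;
        # if the curve starts above the reference, count a full miss (1.0).
        value = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                value = m
            else:
                break
        sampled.append(max(value, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(v) for v in sampled) / len(sampled))


# Hypothetical curve: miss rate falls as more false positives are allowed.
example_mr = log_average_miss_rate([0.001, 0.05, 0.5], [0.20, 0.10, 0.05])
```

Because the average is taken in log space, low miss rates at strict FPPI thresholds weigh as heavily as those at permissive ones, which is why small absolute differences in MR can separate methods.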

This review was generated by AI and reviewed by a human editor.