QUICK REVIEW

[Paper Review] Precise Single-stage Detector

Aisha Chandio, Gong Gui | arXiv (Cornell University) | 2022-10-09

Advanced Image and Video Retrieval Techniques | Citations: 27

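One-line Summary

This paper presents PSSD, a modified SSD that improves accuracy while keeping real-time speed by combining extra layers that enrich the features, a receptive-field expansion module, a two-way FPN, and an IOU-guided loss.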

ABSTRACT

There are still two problems in SSD causing some inaccurate results: (1) In the process of feature extraction, with the layer-by-layer acquisition of semantic information, local information is gradually lost, resulting in less representative feature maps; (2) During the Non-Maximum Suppression (NMS) algorithm, due to inconsistency between the classification and regression tasks, the classification confidence and predicted detection position cannot accurately indicate the position of the prediction boxes. Methods: In order to address these issues, we propose a new architecture, a modified version of Single Shot Multibox Detector (SSD), named Precise Single Stage Detector (PSSD). Firstly, we improve the features by adding extra layers to SSD. Secondly, we construct a simple and effective feature enhancement module to expand the receptive field step by step for each layer and enhance its local and semantic information. Finally, we design a more efficient loss function to predict the IOU between the prediction boxes and ground truth boxes, and the threshold IOU guides classification training and attenuates the scores, which are used by the NMS algorithm. Main Results: Benefiting from the above optimization, the proposed model PSSD achieves exciting performance in real-time. Specifically, with the hardware of Titan Xp and the input size of 320 pix, PSSD achieves 33.8 mAP at 45 FPS speed on MS COCO benchmark and 81.28 mAP at 66 FPS speed on Pascal VOC 2007, outperforming state-of-the-art object detection models. Besides, the proposed model performs significantly well with larger input size. Under 512 pix, PSSD can obtain 37.2 mAP with 27 FPS on MS COCO and 82.82 mAP with 40 FPS on Pascal VOC 2007. The experiment results prove that the proposed model has a better trade-off between speed and accuracy.

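Motivation and Objectives

Overcome the limitations of SSD-style single-stage detectors in preserving local detail and in aligning box regression with classification.
Enrich multi-scale feature representations without replacing the backbone with a larger one.
Introduce an IOU-guided loss and prediction structure to improve NMS filtering and localization accuracy.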

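Proposed Method

Add extra layers to SSD to extend the base feature maps used by the predictors.
Introduce a feature enhancement module (FEM), composed of a receptive-field expansion module (RFM) and a two-way FPN, to enrich local and semantic information across scales.
Redesign the backbone so that the receptive-field distribution becomes more uniform without a large parameter overhead.
Propose an IOU-guided prediction structure, including an R_IOU loss and a CEJI loss, that better aligns classification with localization and focuses NMS on high-quality boxes.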
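
The step-by-step receptive-field expansion attributed to the RFM can be illustrated with the standard receptive-field recurrence for stacked convolutions. This is a minimal sketch, not the paper's module: the layer configurations below (kernel size, stride, dilation) are hypothetical and chosen only to show how dilation widens the receptive field without adding parameters.

```python
def receptive_field(layers):
    """Return the receptive-field size after each layer.

    layers: list of (kernel_size, stride, dilation) tuples.
    Recurrence: rf += (kernel - 1) * dilation * jump; jump *= stride,
    where jump is the spacing of adjacent outputs in input pixels.
    """
    rf, jump = 1, 1
    sizes = []
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stacks: three plain 3x3 convs vs. three 3x3 convs
# with dilation rates 1, 2, 3 (same parameter count per layer).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 3)]

print(receptive_field(plain))    # [3, 5, 7]
print(receptive_field(dilated))  # [3, 7, 13]
```

With the same number of 3×3 kernels, the dilated stack covers 13 input pixels per side instead of 7, which is the general reason dilation is a common choice when a module must widen context at low cost.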

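Experimental Results

Research Questions

RQ1: How can an SSD-style single-stage detector achieve a better speed-accuracy trade-off, and can it do so without switching to a deeper backbone?
RQ2: Can an IOU-guided approach improve the alignment between classification scores and localization accuracy in a single-stage detector?
RQ3: Do a two-way feature pyramid and receptive-field expansion improve the detection of small and large objects in a single-stage framework?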

Key Results

Method	Backbone	Input size	FPS	AP	AP50	AP75	AP_small	AP_medium	AP_large
PSSD320	VGG16	320×320	45	33.8	52.2	35.8	14.8	38.5	50.3
PSSD512	VGG16	512×512	27	37.2	55.9	40.3	18.7	41.6	51.4
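
As a reading aid for the table, the following sketch converts FPS into per-frame latency and relates the AP gain of the larger input to its extra cost. The AP and FPS values are taken from the two table rows; the dictionary layout and the `latency_ms` helper are illustrative names only.

```python
# AP and FPS copied from the two table rows (MS COCO, VGG16 backbone).
results = {
    "PSSD320": {"fps": 45, "ap": 33.8},
    "PSSD512": {"fps": 27, "ap": 37.2},
}


def latency_ms(fps):
    """Per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps


small, large = results["PSSD320"], results["PSSD512"]
extra_ms = latency_ms(large["fps"]) - latency_ms(small["fps"])
ap_gain = large["ap"] - small["ap"]

print(f"PSSD320: {latency_ms(small['fps']):.1f} ms/frame")  # 22.2 ms/frame
print(f"PSSD512: {latency_ms(large['fps']):.1f} ms/frame")  # 37.0 ms/frame
print(f"+{ap_gain:.1f} AP for +{extra_ms:.1f} ms per frame")  # +3.4 AP for +14.8 ms per frame
```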

PSSD320, with a VGG16 backbone and 320×320 input, achieved 33.8 mAP at 45 FPS on MS COCO 2017 test-dev.
PSSD512, with a VGG16 backbone and 512×512 input, achieved 37.2 mAP at 27 FPS on MS COCO 2017 test-dev.
On Pascal VOC 2007, PSSD320 achieved 81.28 mAP at 66 FPS and PSSD512 achieved 82.82 mAP at 40 FPS.
In the ablation study, adding the two-way FPN, the RFM, and IOU-guided prediction together raised AP from 25.8 for the SSD baseline to 33.8 (PSSD320).
IOU-guided prediction and the new loss terms (R_IOU loss and CEJI loss) gave a measurable gain over the baseline and reduced the number of high-scoring, low-IOU predictions.
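
The idea of letting a predicted IOU attenuate classification scores before NMS can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the review does not give the exact attenuation rule, so the sketch assumes the simplest form (classification score multiplied by predicted IOU). The helper names `iou` and `rescored_nms` and all sample boxes and scores are hypothetical.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescored_nms(boxes, cls_scores, pred_ious, nms_thresh=0.5):
    """Greedy NMS ranked by cls_score * pred_iou (assumed rule)."""
    scores = [s * q for s, q in zip(cls_scores, pred_ious)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-ranked kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep, scores


# Hypothetical detections: box 0 has the highest class score but a low
# predicted IOU; box 1 overlaps it and is predicted to be better localized.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
cls_scores = [0.95, 0.90, 0.80]
pred_ious = [0.40, 0.90, 0.85]

keep, scores = rescored_nms(boxes, cls_scores, pred_ious)
print(keep)  # [1, 2]

# With no attenuation (all predicted IOUs set to 1), box 0 would win instead.
print(rescored_nms(boxes, cls_scores, [1.0, 1.0, 1.0])[0])  # [0, 2]
```

In this toy case the rescoring changes which of the two overlapping boxes survives, which is the mechanism by which high-score, low-IOU predictions would be filtered out.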


This review was generated by AI and reviewed by a human editor.