QUICK REVIEW

[Paper Review] ThunderNet: Towards Real-time Generic Object Detection

Zheng Qin, Zeming Li | arXiv (Cornell University) | 2019-03-28

Advanced Neural Network Applications | References: 31 | Citations: 43

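One-line Summary

ThunderNet is a lightweight two-stage detector designed for real-time generic object detection on mobile devices; it pairs a purpose-built lightweight backbone (SNet) with an efficient detection head that includes a Context Enhancement Module and a Spatial Attention Module, and it reaches real-time speed on ARM with competitive accuracy at low FLOPs.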

ABSTRACT

Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Our code and models are available at https://github.com/qinzheng93/ThunderNet.

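Motivation and Objectives

Investigate whether two-stage detectors can achieve real-time performance on mobile devices.
Design a lightweight backbone tailored to object detection rather than one transferred from image classification.
Develop efficient detection head components that balance accuracy against computational cost.
Jointly tune input resolution, backbone capacity, and detection head design to reach the best real-time performance.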

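Proposed Method

Proposes SNet, a lightweight backbone that modifies ShuffleNetV2 with 5×5 depthwise convolutions to enlarge the receptive field.
Shrinks the RPN and RoI head to cut computation while preserving accuracy (e.g., a 5×5 depthwise convolution and a 1×1 convolution in the RPN, and a reduced R-CNN fc size).
Introduces the Context Enhancement Module (CEM), which fuses multi-scale local and global context through 1×1 projections and upsampling/broadcasting.
Introduces the Spatial Attention Module (SAM), which reweights CEM features using the foreground signal from the RPN, passed through a 1×1 transformation.
Explores the balance among input resolution, backbone, and detection head to maximize speed and accuracy on mobile hardware.
Trains end-to-end with synchronized SGD, multi-scale training, and Cross-GPU Batch Normalization, and applies Soft-NMS as post-processing.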
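
The SAM step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the gate is a sigmoid applied to a 1×1 transformation of the RPN features and multiplied element-wise into the CEM features. The sigmoid, the helper names (`conv1x1`, `spatial_attention`), and the toy weights are assumptions that do not appear in this review.

```python
import math


def conv1x1(feature, weight):
    """Apply a 1x1 convolution: a per-pixel linear map across channels.

    feature is indexed [C_in][H][W]; weight is indexed [C_out][C_in].
    """
    c_in = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [
                sum(weight[o][i] * feature[i][y][x] for i in range(c_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for o in range(len(weight))
    ]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def spatial_attention(f_cem, f_rpn, weight):
    """Reweight CEM features with a gate computed from RPN features.

    Assumed form: f_sam = f_cem * sigmoid(conv1x1(f_rpn)).
    """
    gate = conv1x1(f_rpn, weight)  # must have the same shape as f_cem
    return [
        [
            [
                f_cem[c][y][x] * sigmoid(gate[c][y][x])
                for x in range(len(f_cem[c][y]))
            ]
            for y in range(len(f_cem[c]))
        ]
        for c in range(len(f_cem))
    ]


# Toy example: 2 CEM channels and 1 RPN channel on a 1x2 feature map.
f_cem = [[[1.0, 1.0]], [[2.0, 2.0]]]
f_rpn = [[[4.0, -4.0]]]   # strong foreground at x=0, background at x=1
weight = [[1.0], [1.0]]   # hypothetical 1x1 weights (C_out=2, C_in=1)
f_sam = spatial_attention(f_cem, f_rpn, weight)
```

In this toy case the foreground position keeps almost all of its CEM activation while the background position is suppressed toward zero, which is the reweighting behavior the method list attributes to SAM.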

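Experimental Results

Research Questions

RQ1: On mobile hardware, do two-stage detectors outperform lightweight one-stage detectors in speed and accuracy?
RQ2: Which backbone and detection head design choices give the best accuracy-efficiency trade-off for real-time mobile detection?
RQ3: How do context and spatial attention mechanisms affect feature representation and localization?
RQ4: What is the best balance among input resolution, backbone capacity, and detection head complexity on ARM platforms?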

Key Results

Model	Backbone	Input	MFLOPs	AP	AP50	AP75
ThunderNet (ours)	SNet49	320×320	262	19.2	33.7	19.7
ThunderNet (ours)	SNet146	320×320	473	23.7	40.3	24.6
ThunderNet (ours)	SNet535	320×320	1300	28.1	46.2	29.6
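
To make the cost/accuracy scaling in the table easier to read, the short script below compares each backbone with the next larger one. It uses only the MFLOPs and AP columns from the table; the function name `scaling_steps` is illustrative, and the comparison is a back-of-the-envelope reading rather than an analysis from the paper.

```python
# (backbone, MFLOPs, AP) copied from the results table.
rows = [
    ("SNet49", 262, 19.2),
    ("SNet146", 473, 23.7),
    ("SNet535", 1300, 28.1),
]


def scaling_steps(table):
    """For each consecutive pair, return (from, to, FLOPs ratio, AP gain)."""
    steps = []
    for (name_a, flops_a, ap_a), (name_b, flops_b, ap_b) in zip(table, table[1:]):
        steps.append((name_a, name_b, flops_b / flops_a, ap_b - ap_a))
    return steps


steps = scaling_steps(rows)
for name_a, name_b, ratio, gain in steps:
    print(f"{name_a} -> {name_b}: {ratio:.2f}x MFLOPs, +{gain:.1f} AP")
```

Going from SNet49 to SNet146 costs roughly 1.8x the MFLOPs for +4.5 AP, while going from SNet146 to SNet535 costs roughly 2.7x for +4.4 AP, so each additional AP point becomes more expensive as the backbone grows.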

ThunderNet with SNet49 reaches MobileNet-SSD-level accuracy at about 22% of the FLOPs.
ThunderNet with SNet146 outperforms previous lightweight detectors with about 40% of the FLOPs.
ThunderNet with SNet535 competes with large detectors at a small fraction of their FLOPs.
On COCO test-dev, ThunderNet with SNet146 achieves 23.7 AP, 40.3 AP50, and 24.6 AP75; with SNet535 it reaches 28.1 AP, 46.2 AP50, and 29.6 AP75.
ThunderNet runs at 24.1 fps with SNet49 and 13.8 fps with SNet146 on ARM, and every variant exceeds 200 fps on GPU.
At similar FLOPs, a large-backbone, small-head design outperforms a small-backbone, large-head design, suggesting that backbone-head compatibility matters.
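
The speed and FLOPs claims above can be turned into concrete numbers with simple arithmetic. The sketch below converts the reported ARM frame rates into per-frame latency and back-computes the baseline cost implied by the "about 22%" and "about 40%" figures. The implied baselines are estimates derived from rounded percentages, not values reported in this review, and the helper names are illustrative.

```python
def frame_time_ms(fps):
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


def implied_baseline_mflops(mflops, fraction):
    """Baseline cost implied by 'our model uses `fraction` of the FLOPs'."""
    return mflops / fraction


# ARM frame rates reported in the results list.
arm_fps = {"SNet49": 24.1, "SNet146": 13.8}
latency_ms = {name: frame_time_ms(fps) for name, fps in arm_fps.items()}

# MFLOPs from the results table, fractions from the results list (rounded).
implied = {
    "SNet49": implied_baseline_mflops(262, 0.22),
    "SNet146": implied_baseline_mflops(473, 0.40),
}

for name in arm_fps:
    print(
        f"{name}: {latency_ms[name]:.1f} ms/frame on ARM, "
        f"implied baseline ~{implied[name]:.0f} MFLOPs"
    )
```

Under these assumptions, SNet49 takes about 41.5 ms per frame and SNet146 about 72.5 ms on ARM, and both percentages point to a comparison baseline of roughly 1,200 MFLOPs.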

This review was generated by AI and reviewed by a human editor.