QUICK REVIEW

[Paper Review] PP-YOLO: An Effective and Efficient Implementation of Object Detector

Xiang Long, Kaipeng Deng|arXiv (Cornell University)|2020. 07. 23.

Advanced Neural Network Applications|References 47|Citations 234

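One-Line Summary

PP-YOLO applies a series of existing tricks to a YOLOv3-based detector, raising mAP substantially without a large increase in model size or FLOPs, and reaches 45.2% COCO AP at 72.9 FPS while keeping inference speed nearly unchanged.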

ABSTRACT

Object detection is one of the most important areas in computer vision, which plays a key role in various practical scenarios. Due to limitation of hardware, it is often necessary to sacrifice accuracy to ensure the infer speed of the detector in practice. Therefore, the balance between effectiveness and efficiency of object detector must be considered. The goal of this paper is to implement an object detector with relatively balanced effectiveness and efficiency that can be directly applied in actual application scenarios, rather than propose a novel detection model. Considering that YOLOv3 has been widely used in practice, we develop a new object detector based on YOLOv3. We mainly try to combine various existing tricks that almost not increase the number of model parameters and FLOPs, to achieve the goal of improving the accuracy of detector as much as possible while ensuring that the speed is almost unchanged. Since all experiments in this paper are conducted based on PaddlePaddle, we call it PP-YOLO. By combining multiple tricks, PP-YOLO can achieve a better balance between effectiveness (45.2% mAP) and efficiency (72.9 FPS), surpassing the existing state-of-the-art detectors such as EfficientDet and YOLOv4. Source code is at https://github.com/PaddlePaddle/PaddleDetection.

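Motivation and Objectives

Build a practical object detector that balances accuracy and speed for real-world deployment.
Improve detection performance with existing tricks that do not significantly increase the parameter count or FLOPs.
Provide a recipe-style guide for building a better YOLOv3-based detector, without exploring alternative backbone networks or relying on NAS.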

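Proposed Method

Replace the YOLOv3 backbone (DarkNet-53) with ResNet50-vd-dcn to build a stronger baseline.
Add existing tricks one at a time (EMA, DropBlock, IoU loss, IoU Aware, Grid Sensitive, Matrix NMS, CoordConv, SPP, better pretraining), integrating each carefully so that efficiency is preserved.
Use a PaddlePaddle implementation for practical deployment and keep a backbone/FPN/head structure similar to YOLOv3.
Add an IoU-aware branch and a basic IoU loss so that training is better aligned with the COCO mAP evaluation.
Apply improved post-processing (Matrix NMS) and coordinate refinements (Grid Sensitive, CoordConv) to improve localization at negligible cost.
Use a larger batch size and EMA to stabilize training and raise final accuracy.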
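
Three of the tricks listed above are small enough to sketch in a few lines. The following is a minimal, framework-free illustration, not the PaddleDetection implementation. The function names are hypothetical, the defaults `decay=0.9998` and `alpha=1.05` are values recalled from the paper and should be treated as assumptions, and the `1 - IoU` form of the loss is one common definition chosen here for simplicity.

```python
import math


def ema_update(shadow, weights, decay=0.9998):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_center(logit, cell_index, stride, alpha=1.05):
    """Grid Sensitive decoding of one box-center coordinate.

    With alpha = 1.0 this is the plain YOLOv3 decoding, where the sigmoid
    never reaches exactly 0 or 1, so a center on a cell border cannot be
    predicted. alpha > 1 stretches the offset slightly beyond [0, 1].
    """
    offset = alpha * sigmoid(logit) - (alpha - 1.0) / 2.0
    return stride * (cell_index + offset)


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Assumed form of a basic IoU loss: 1 - IoU(pred, target)."""
    return 1.0 - box_iou(pred, target)


if __name__ == "__main__":
    # A strongly negative logit lands just outside the cell once alpha > 1.
    print(decode_center(-20.0, cell_index=0, stride=32))
    print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

None of these functions adds work to the network's forward pass in a meaningful way: EMA only affects training, the IoU loss is an extra training term, and Grid Sensitive changes a few scalar operations in box decoding.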

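Experimental Results

Research Questions

RQ1: Can a combination of existing tricks substantially improve mAP on COCO without increasing model size or FLOPs?
RQ2: Which tricks contribute most to accuracy when applied to a YOLOv3-based detector in the PaddlePaddle framework?
RQ3: How does PP-YOLO compare with state-of-the-art detectors (EfficientDet, YOLOv4, and others) in speed and accuracy on COCO?
RQ4: How do different pretraining strategies affect final detection performance?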

Key Results

ID	Configuration	Input size	FPS (V100)	AP (%)	Notes
A	Darknet53 YOLOv3	608	-	38.9	Baseline YOLOv3 with Darknet53
B	ResNet50-vd-dcn YOLOv3	608	79.2	39.1	Baseline with ResNet50-vd-dcn backbone
C	B + LB + EMA + DropBlock	608	79.2	41.4	Larger batch size (LB), EMA, and DropBlock added
D	C + IoU Loss	608	79.2	41.9	IoU loss branch added
E	D + IoU Aware	608	74.9	42.5	IoU-aware branch added
F	E + Grid Sensitive	608	74.8	42.8	Grid center decoding tweak
G	F + Matrix NMS	608	74.8	43.5	Greedy NMS replaced with Matrix NMS
H	G + CoordConv	608	74.1	44.0	CoordConv added to some layers
I	H + SPP	608	72.9	44.3	Spatial Pyramid Pooling added
J	I + Better ImageNet Pretrain	608	72.9	44.6	Distilled ResNet50-vd pretraining
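
The per-step contribution of each trick is easier to read as differences between consecutive rows. The short script below is only a convenience sketch: it copies the FPS and AP values of rows B through J from the table above (row A is omitted because no FPS is listed for it), and the names `ROWS` and `step_gains` are hypothetical.

```python
# (row id, added trick, FPS on V100, AP in %), copied from rows B-J of the table.
ROWS = [
    ("B", "ResNet50-vd-dcn baseline", 79.2, 39.1),
    ("C", "LB + EMA + DropBlock", 79.2, 41.4),
    ("D", "IoU Loss", 79.2, 41.9),
    ("E", "IoU Aware", 74.9, 42.5),
    ("F", "Grid Sensitive", 74.8, 42.8),
    ("G", "Matrix NMS", 74.8, 43.5),
    ("H", "CoordConv", 74.1, 44.0),
    ("I", "SPP", 72.9, 44.3),
    ("J", "Better ImageNet Pretrain", 72.9, 44.6),
]


def step_gains(rows):
    """Return (row id, trick, AP change, FPS change) relative to the previous row."""
    gains = []
    for prev, cur in zip(rows, rows[1:]):
        d_ap = round(cur[3] - prev[3], 1)
        d_fps = round(cur[2] - prev[2], 1)
        gains.append((cur[0], cur[1], d_ap, d_fps))
    return gains


if __name__ == "__main__":
    for row_id, trick, d_ap, d_fps in step_gains(ROWS):
        print(f"{row_id}  {trick:<26} AP {d_ap:+.1f}  FPS {d_fps:+.1f}")
    total_ap = round(ROWS[-1][3] - ROWS[0][3], 1)
    total_fps = round(ROWS[-1][2] - ROWS[0][2], 1)
    print(f"B -> J: AP {total_ap:+.1f}, FPS {total_fps:+.1f}")
```

Read this way, the training-only changes in row C account for the largest single AP step, while the IoU-aware branch in row E accounts for the largest single FPS drop.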

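Replacing the YOLOv3 backbone with ResNet50-vd-dcn gives a stronger baseline: FLOPs are markedly lower and inference is faster than with DarkNet-53, while accuracy stays competitive (38.9% to 39.1% AP).
Adding the tricks one at a time raises mAP from 39.1% (row B) to 44.6% (row J) without a large increase in parameters or FLOPs, and the final model reaches 45.2% AP on COCO test-dev.
IoU loss (+0.5), IoU Aware (+0.6), and Grid Sensitive (+0.3) deliver mAP gains with minimal added inference cost, and Matrix NMS improves AP over greedy NMS (+0.7).
CoordConv and SPP add a further 0.5% and 0.3% mAP respectively, with only a small time overhead (74.8 to 72.9 FPS).
Using a distilled ResNet50-vd as the pretrained backbone gives a further small improvement (about 0.3% AP).
On COCO test-dev, PP-YOLO with a 608 input size achieves 45.2% AP at 72.9 FPS on a V100 (batch size 1, without TensorRT).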
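
To make the difference from greedy NMS concrete, here is a minimal single-class sketch of Matrix NMS with the linear decay kernel. It follows the formulation introduced with SOLOv2 as best recalled, applied to boxes rather than masks. It is not the PaddlePaddle operator: the function names are hypothetical, and per-class handling, score thresholds, and top-k filtering are left out.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matrix_nms(boxes, scores):
    """Return decayed scores in the original box order (linear kernel).

    Instead of deleting boxes one by one, every box's score is scaled by
    a decay factor computed from its IoU with all higher-scored boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    n = len(order)

    # iou[i][j] is filled only for i < j, in score-sorted order.
    iou = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            iou[i][j] = box_iou(boxes[order[i]], boxes[order[j]])

    # compensate[i]: largest IoU between box i and any higher-scored box,
    # a proxy for how likely box i is to be suppressed itself.
    compensate = [max((iou[k][i] for k in range(i)), default=0.0) for i in range(n)]

    decayed = [0.0] * n
    for j in range(n):
        decay = 1.0
        for i in range(j):
            denom = max(1.0 - compensate[i], 1e-9)  # guard against division by zero
            decay = min(decay, (1.0 - iou[i][j]) / denom)
        decayed[order[j]] = scores[order[j]] * decay
    return decayed


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(matrix_nms(boxes, scores))
```

Because every decay factor depends only on the pairwise IoU matrix, the computation has no sequential dependency between boxes, which is what allows it to be expressed as a few matrix operations on a GPU. The nested loops above are written for readability only.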

This review was generated by AI and reviewed by a human editor.