[Paper Explained] PP-YOLOv2: A Practical Object Detector
PP-YOLOv2 strengthens PP-YOLO through a set of incrementally evaluated refinements, reaching 49.5% mAP on COCO test-dev at about 69 FPS; with 640 input, FP16 precision, and TensorRT it reaches 106.5 FPS.
Being effective and efficient is essential for an object detector in practical use. To address both concerns, we comprehensively evaluate a collection of existing refinements that improve the performance of PP-YOLO while keeping the inference time almost unchanged. This paper analyzes these refinements and empirically evaluates their impact on final model performance through an incremental ablation study. Things we tried that didn't work are also discussed. By combining multiple effective refinements, we boost PP-YOLO's performance from 45.9% mAP to 49.5% mAP on COCO2017 test-dev. Because this is a significant margin, we present the result as PP-YOLOv2. In terms of speed, PP-YOLOv2 runs at 68.9 FPS at 640x640 input size. The Paddle inference engine with TensorRT, FP16 precision, and batch size = 1 further improves PP-YOLOv2's inference speed to 106.5 FPS. This performance surpasses existing object detectors with roughly the same number of parameters (i.e., YOLOv4-CSP, YOLOv5l). In addition, PP-YOLOv2 with ResNet101 achieves 50.3% mAP on COCO2017 test-dev. Source code is at https://github.com/PaddlePaddle/PaddleDetection.
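Motivation and Objectives
- Improve object detection accuracy for practical use while preserving inference speed.
- Empirically evaluate a series of refinements within an incremental ablation framework.
- Offer practical guidance on combining tricks without hurting efficiency.
- Demonstrate deployment-friendly performance using PaddlePaddle and TensorRT.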
Proposed Method
- Baseline PP-YOLO with ResNet50-vd-dcn backbone.
- Incremental refinements including a PAN neck, Mish activation in the neck, a larger input size, and an IoU-aware branch.
- Soft-label formulation of the IoU-aware loss to stabilize training.
- Training setup with SGD on COCO train2017 for 500K iterations across 8 GPUs; input size sampling from a broad range.
- Evaluation on COCO minival and comparison to state-of-the-art detectors.
- Reported FPS, parameters, GFLOPs, and mAP for ablations and final model.
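Two of the refinements listed above have compact mathematical forms, and a minimal sketch may make them concrete. Mish is the standard activation `x * tanh(softplus(x))`. For the soft-label IoU-aware loss, this summary does not spell out the formula, so the sketch assumes a binary cross-entropy between the sigmoid of the branch output and a soft target equal to the IoU with the matched ground-truth box. The function names and the scalar inputs are illustrative assumptions, not PaddleDetection APIs.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def iou_aware_soft_label_loss(p: float, t: float) -> float:
    """Binary cross-entropy between sigmoid(p) and a soft target t in [0, 1].

    p: raw (pre-sigmoid) output of the IoU-aware branch (a scalar here).
    t: soft label, assumed to be the IoU between a box and its matched
       ground truth (an assumption of this sketch).
    """
    # Uses log(sigmoid(p)) = -softplus(-p) and log(1 - sigmoid(p)) = -softplus(p).
    return t * softplus(-p) + (1.0 - t) * softplus(p)


if __name__ == "__main__":
    # The loss for a soft target of 0.8 is smallest where sigmoid(p) == 0.8.
    for p in (0.0, math.log(4.0), 3.0):
        print(f"p={p:.3f}  loss={iou_aware_soft_label_loss(p, 0.8):.4f}")
    print(f"mish(-1.0)={mish(-1.0):.4f}")
```

With a soft target, the loss is minimized when the predicted score equals the IoU itself rather than being pushed toward 0 or 1, which is one plausible reading of why a soft label could make training more stable.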
Experimental Results
Research Questions
- RQ1: What refinements can improve PP-YOLO without increasing inference time by a significant margin?
- RQ2: How do changes like PAN, Mish activation in the neck, and a larger input size interact in terms of accuracy vs. speed?
- RQ3: What is the impact of a reworked IoU-aware loss on training stability and mAP?
- RQ4: How does PP-YOLOv2 compare to contemporary detectors (e.g., YOLOv4-CSP, YOLOv5l) in speed-accuracy trade-offs?
Key Findings
- Final PP-YOLOv2 achieves 49.5% mAP on COCO test-dev with ResNet50-vd-dcn at 640 input and 68.9 FPS.
- With the Paddle inference engine, TensorRT, FP16 precision, and batch size 1, PP-YOLOv2 reaches 106.5 FPS.
- Compared to YOLOv4-CSP and YOLOv5l at similar parameter counts, PP-YOLOv2 outperforms them in mAP for similar speeds.
- Replacing the backbone with ResNet101-vd-dcn yields 50.3% mAP on COCO2017 test-dev, a competitive result with faster inference than some baselines (e.g., YOLOv5x).
- An ablation sequence shows PAN + Mish in neck, larger input size, and IoU-aware branch collectively raise mAP from 45.1% to 49.1% before final optimizations.
- PP-YOLOv2 outperforms the original PP-YOLO baseline (45.1% mAP in the ablation; 45.9% mAP on test-dev per the abstract) after applying the refinements, without adding substantial inference cost.
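To put the reported throughput numbers in per-image terms, the arithmetic can be sketched as below. It uses only the figures quoted above and assumes one image per forward pass, so latency is simply the reciprocal of FPS; the helper name `latency_ms` is illustrative.

```python
def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, assuming one image per forward pass."""
    return 1000.0 / fps


base_fps = 68.9   # reported at 640x640 input
trt_fps = 106.5   # reported with TensorRT, FP16, batch size 1

print(f"default:       {latency_ms(base_fps):.1f} ms/image")    # 14.5
print(f"TensorRT FP16: {latency_ms(trt_fps):.1f} ms/image")     # 9.4
print(f"speedup:       {trt_fps / base_fps:.2f}x")              # 1.55x
print(f"test-dev gain: {49.5 - 45.9:.1f} mAP points")           # 3.6
```

In other words, the TensorRT FP16 path saves roughly 5 ms per image on top of a 3.6-point mAP gain over PP-YOLO on test-dev.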
This interpretation was generated by AI and reviewed by human editors.