QUICK REVIEW

[Paper Review] PP-YOLO: An Effective and Efficient Implementation of Object Detector

Xiang Long, Kaipeng Deng|arXiv (Cornell University)|Jul 23, 2020

Advanced Neural Network Applications|References: 47|Citations: 234

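One-line summary

PP-YOLO strengthens a YOLOv3-based detector with a series of existing tricks, raising mAP substantially without a significant increase in model size or FLOPs while keeping inference fast. It reaches 45.2% AP on COCO at 72.9 FPS.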

ABSTRACT

Object detection is one of the most important areas in computer vision, which plays a key role in various practical scenarios. Due to limitation of hardware, it is often necessary to sacrifice accuracy to ensure the infer speed of the detector in practice. Therefore, the balance between effectiveness and efficiency of object detector must be considered. The goal of this paper is to implement an object detector with relatively balanced effectiveness and efficiency that can be directly applied in actual application scenarios, rather than propose a novel detection model. Considering that YOLOv3 has been widely used in practice, we develop a new object detector based on YOLOv3. We mainly try to combine various existing tricks that almost not increase the number of model parameters and FLOPs, to achieve the goal of improving the accuracy of detector as much as possible while ensuring that the speed is almost unchanged. Since all experiments in this paper are conducted based on PaddlePaddle, we call it PP-YOLO. By combining multiple tricks, PP-YOLO can achieve a better balance between effectiveness (45.2% mAP) and efficiency (72.9 FPS), surpassing the existing state-of-the-art detectors such as EfficientDet and YOLOv4. Source code is at https://github.com/PaddlePaddle/PaddleDetection.

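Motivation and objectives

Promote a practical object detector that balances accuracy and speed for real-world deployment.
Use existing tricks that improve detection performance without a significant increase in parameters or FLOPs.
Provide a recipe-style guide to improving a YOLOv3-based detector step by step, without exploring alternative backbones or relying on NAS.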

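Proposed method

Replace the YOLOv3 backbone with ResNet50-vd-dcn to build a stronger baseline.
Integrate existing tricks (EMA, DropBlock, IoU loss, IoU Aware, Grid Sensitive, Matrix NMS, CoordConv, SPP, better pretraining) one at a time, checking that each preserves efficiency.
Implement everything in PaddlePaddle for ease of deployment, keeping the same backbone/FPN/head structure as YOLOv3.
Add an IoU-aware branch and a basic IoU loss so that training aligns better with the COCO mAP metric.
Apply improved post-processing (Matrix NMS) and coordinate refinements (Grid Sensitive, CoordConv) to sharpen localization at low cost.
Use a larger batch size and EMA to stabilize training and improve final accuracy.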
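Two of the tricks listed above change only how raw head outputs are decoded, so they can be illustrated with a few lines of arithmetic. The sketch below assumes the Grid Sensitive formulation x = s · (g + α · σ(p) − (α − 1)/2) with α = 1.05, and assumes the IoU-aware score is used at inference by multiplying the predicted IoU into the usual class-probability × objectness confidence. The function names are illustrative and are not taken from PaddleDetection.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function used to squash raw head outputs into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_center(p: float, g: int, stride: int, alpha: float = 1.05) -> float:
    """Decode one box-center coordinate (Grid Sensitive style).

    p      : raw network output for the offset inside the cell
    g      : integer index of the grid cell along this axis
    stride : size of one grid cell in input pixels
    alpha  : scale factor; alpha = 1.0 gives plain YOLOv3 decoding
             (alpha = 1.05 is an assumed default for this sketch)
    """
    return stride * (g + alpha * sigmoid(p) - (alpha - 1.0) / 2.0)


def detection_confidence(cls_prob: float, objectness: float, pred_iou: float) -> float:
    """Final confidence with an IoU-aware term (assumed: a plain product)."""
    return cls_prob * objectness * pred_iou


if __name__ == "__main__":
    # With alpha = 1.0 the decoded center can only approach the cell border;
    # with alpha > 1.0 it can reach (and slightly pass) the border.
    print(decode_center(-10.0, g=3, stride=32, alpha=1.0))
    print(decode_center(-10.0, g=3, stride=32, alpha=1.05))
    print(detection_confidence(0.9, 0.8, 0.7))
```

The point of the α scaling is that a sigmoid never outputs exactly 0 or 1, so without it a box centered exactly on a grid line is hard to predict; the score product simply lets poorly localized boxes rank lower before NMS.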

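Experimental results

Research questions

RQ1: Can a combination of proven tricks substantially improve COCO mAP without increasing model size or FLOPs?
RQ2: Which tricks contribute most to accuracy when applied to a YOLOv3-based detector in the PaddlePaddle framework?
RQ3: How does PP-YOLO compare with state-of-the-art detectors (e.g., EfficientDet, YOLOv4) on COCO in both speed and accuracy?
RQ4: How much do different pretraining strategies affect final detection performance?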

Key findings

Method	Configuration	Input size	FPS (V100)	AP (%)	AP50	AP75	APs	APm	APl	Notes
A	Darknet53 YOLOv3	608	-	38.9	-	-	-	-	-	YOLOv3 baseline with the Darknet53 backbone
B	ResNet50-vd-dcn YOLOv3	608	79.2	39.1	-	-	-	-	-	Baseline with the ResNet50-vd-dcn backbone
C	B + LB + EMA + DropBlock	608	79.2	41.4	-	-	-	-	-	Training enhancements: larger batch size (LB), EMA, DropBlock
D	C + IoU Loss	608	79.2	41.9	-	-	-	-	-	Adds an IoU loss term
E	D + IoU Aware	608	74.9	42.5	-	-	-	-	-	Adds an IoU-aware prediction branch
F	E + Grid Sensitive	608	74.8	42.8	-	-	-	-	-	Adjusts grid-center decoding
G	F + Matrix NMS	608	74.8	43.5	-	-	-	-	-	Replaces NMS with Matrix NMS
H	G + CoordConv	608	74.1	44.0	-	-	-	-	-	Adds CoordConv to some layers
I	H + SPP	608	72.9	44.3	-	-	-	-	-	Adds Spatial Pyramid Pooling
J	I + Better ImageNet Pretrain	608	72.9	44.6	-	-	-	-	-	Pretrained from a distilled ResNet50-vd
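To see which step buys the most accuracy and what it costs in speed, the per-step differences can be computed directly from the table. The snippet below is only bookkeeping over the FPS and AP values of rows B to J as listed above (row A is left out because it has no FPS entry); the variable and function names are illustrative.

```python
# (label, FPS on V100, AP in %) copied from rows B-J of the table above.
ROWS = [
    ("B: ResNet50-vd-dcn baseline", 79.2, 39.1),
    ("C: + LB + EMA + DropBlock", 79.2, 41.4),
    ("D: + IoU Loss", 79.2, 41.9),
    ("E: + IoU Aware", 74.9, 42.5),
    ("F: + Grid Sensitive", 74.8, 42.8),
    ("G: + Matrix NMS", 74.8, 43.5),
    ("H: + CoordConv", 74.1, 44.0),
    ("I: + SPP", 72.9, 44.3),
    ("J: + Better ImageNet Pretrain", 72.9, 44.6),
]


def step_gains(rows):
    """Return (label, AP change, FPS change) for each row relative to the previous one."""
    gains = []
    for (_, prev_fps, prev_ap), (label, fps, ap) in zip(rows, rows[1:]):
        # Round to one decimal to hide floating-point noise in the subtraction.
        gains.append((label, round(ap - prev_ap, 1), round(fps - prev_fps, 1)))
    return gains


if __name__ == "__main__":
    for label, d_ap, d_fps in step_gains(ROWS):
        print(f"{label:32s} AP {d_ap:+.1f}   FPS {d_fps:+.1f}")
    total_ap = round(ROWS[-1][2] - ROWS[0][2], 1)
    total_fps = round(ROWS[-1][1] - ROWS[0][1], 1)
    print(f"{'B -> J total':32s} AP {total_ap:+.1f}   FPS {total_fps:+.1f}")
```

Read this way, the training-only changes in row C account for the largest single gain at no inference cost, while the IoU-aware branch and SPP are the steps that cost the most FPS.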

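Replacing the YOLOv3 backbone with ResNet50-vd-dcn substantially lowers FLOPs relative to DarkNet-53 and speeds up inference, while accuracy stays competitive (39.1% vs. 38.9% AP).
Adding the tricks one by one raises AP from 39.1% (B) to 44.6% (J) in the table above without a significant increase in parameters or FLOPs; the best reported result is 45.2% AP on COCO test-dev.
IoU Loss (+0.5), IoU Aware (+0.6), and Grid Sensitive (+0.3) deliver AP gains at little inference cost (79.2 to 74.8 FPS across the three). Matrix NMS adds a further 0.7 AP over greedy NMS.
CoordConv and SPP add smaller gains (+0.5 and +0.3 AP, respectively) at a modest time cost.
Using a distilled ResNet50-vd as the pretrained backbone yields a further improvement of about 0.3 AP.
On COCO test-dev, PP-YOLO at an input size of 608 reaches 45.2% AP at 72.9 FPS on a V100 (batch size 1, without TensorRT).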
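Matrix NMS differs from greedy NMS in that it never discards boxes outright; it decays each box's score according to its overlap with higher-scored boxes, and it can do so for all boxes at once. The following is a minimal pure-Python sketch of the linear-decay variant for boxes of a single class, written with explicit loops for readability rather than as the matrix operations a real implementation would use. It is an illustration under those assumptions, not the PaddleDetection code, and all names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matrix_nms_linear(boxes, scores):
    """Return decayed scores, in the same order as the input boxes.

    Each box j is decayed by min over higher-scored boxes i of
    (1 - iou(i, j)) / (1 - c_i), where c_i is the largest IoU between
    box i and any box scored above it (so an already-suppressed box i
    suppresses others less).
    """
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    n = len(order)

    # iou[i][j] for i < j, indexed by rank in the sorted order.
    iou = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            iou[i][j] = box_iou(boxes[order[i]], boxes[order[j]])

    # compensate[i]: how strongly box i itself overlaps a better box.
    compensate = [max((iou[k][i] for k in range(i)), default=0.0) for i in range(n)]

    decayed = [0.0] * len(boxes)
    for j in range(n):
        decay = 1.0
        for i in range(j):
            # The small floor avoids division by zero for exact duplicates.
            denom = max(1.0 - compensate[i], 1e-9)
            decay = min(decay, (1.0 - iou[i][j]) / denom)
        decayed[order[j]] = scores[order[j]] * decay
    return decayed


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    # The second box overlaps the first and is decayed; the third is untouched.
    print(matrix_nms_linear(boxes, scores))
```

In practice the decayed scores would then be filtered with a score threshold; because there is no sequential keep-or-drop loop, the whole computation maps well onto batched tensor operations.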


This review was generated by AI and checked by a human editor.