QUICK REVIEW

[Paper Review] YOLOv3: An Incremental Improvement

Joseph Redmon, Ali Farhadi|arXiv (Cornell University)|Apr 8, 2018

Advanced Image and Video Retrieval Techniques|References 11|Citations 5,881

One-Line Summary

YOLOv3 introduces a set of small design updates and a larger, more capable backbone (Darknet-53), achieving competitive accuracy, particularly on AP50, while remaining fast.

ABSTRACT

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/
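The speed claims in the abstract can be checked with simple arithmetic. The sketch below uses only the latencies quoted in the abstract; the helper name `fps` and the variable names are illustrative, not from the paper.

```python
# Latencies quoted in the abstract (milliseconds per image).
YOLOV3_320_MS = 22.0    # YOLOv3 at 320x320
YOLOV3_AP50_MS = 51.0   # YOLOv3 at 57.9 mAP@50 on a Titan X
RETINANET_MS = 198.0    # RetinaNet at 57.5 mAP@50


def fps(latency_ms: float) -> float:
    """Convert per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# Ratio of latencies: how many times faster YOLOv3 is than RetinaNet.
speedup = RETINANET_MS / YOLOV3_AP50_MS

print(f"speedup over RetinaNet: {speedup:.2f}x")
print(f"YOLOv3 at 320x320: {fps(YOLOV3_320_MS):.1f} FPS")
print(f"YOLOv3 at 51 ms: {fps(YOLOV3_AP50_MS):.1f} FPS")
```

The ratio 198 / 51 is roughly 3.88, which agrees with the abstract's "3.8x" if the figure is truncated to one decimal place.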

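Motivation and Objectives

Summarize the incremental updates to YOLO that improve accuracy and speed.
Describe the new backbone (Darknet-53) and the multi-scale prediction strategy.
Compare YOLOv3 with RetinaNet and SSD on AP50 and COCO-style mAP metrics.
Report lessons learned from experiments that did not improve performance.
Discuss detection metrics and their implications for practical deployment.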

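Proposed Method

Predict each bounding box as four coordinates, expressed as offsets relative to anchor boxes obtained from dimension clusters.
Predict an objectness score for each box with logistic regression, assigning exactly one bounding-box prior to each ground-truth object.
Predict classes with independent logistic classifiers, which allows multi-label outputs.
Make predictions at three scales, using feature-pyramid-style upsampling and concatenation.
Introduce Darknet-53, a backbone with residual (shortcut) connections.
Train in the Darknet framework with multi-scale training and standard data augmentation.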
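The box parameterization and class prediction listed above can be sketched in a few lines. This is a minimal illustration, not the Darknet implementation: it follows the decoding as formulated in the YOLOv3 paper (sigmoid on the center offsets, exponential scaling of the prior's width and height), and the function names and the choice to express every quantity in the same grid-cell units are assumptions made for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Decode raw network outputs into (center_x, center_y, width, height).

    (c_x, c_y) is the top-left corner of the grid cell making the prediction;
    (p_w, p_h) is the width and height of the anchor box (prior).
    Assumption: all values share the same units (grid cells).
    """
    b_x = sigmoid(t_x) + c_x      # center stays inside the predicting cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)     # prior size scaled by a positive factor
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def class_scores(logits):
    """Independent logistic classifiers: one sigmoid per class.

    Unlike a softmax, the scores are not forced to sum to 1, so a box
    can receive several labels at once.
    """
    return [sigmoid(z) for z in logits]


# Zero raw outputs place the center in the middle of cell (3, 4)
# and leave the prior's size unchanged.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0)
scores = class_scores([0.0, 0.0, 0.0])
```

Because each class score comes from its own sigmoid, overlapping labels can both score highly for the same box, which a softmax would not allow.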

Experimental Results

Research Questions

RQ1 How do incremental design changes affect YOLO's speed-accuracy trade-off compared to prior versions and other detectors?
RQ2 What is the impact of a new backbone (Darknet-53) on detection performance and computational efficiency?
RQ3 Does multi-scale prediction improve small-object detection and overall COCO metrics?
RQ4 How do alternative training choices (e.g., focal loss, different anchor offsets) affect YOLOv3 performance?
RQ5 What are the limitations of AP50 vs COCO mean AP metrics for evaluating detectors like YOLOv3?

Key Findings

At 320×320 input, YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. On AP50, it reaches 57.9 in 51 ms on a Titan X versus 57.5 in 198 ms for RetinaNet: similar accuracy, 3.8x faster.
With 608×608 input, YOLOv3 achieves 33.0 AP, 57.9 AP50, 34.4 AP75, 18.3 AP_S, 35.4 AP_M, and 41.9 AP_L on COCO, and is faster than RetinaNet while maintaining competitive accuracy.
The Darknet-53 backbone matches state-of-the-art classifiers in accuracy with fewer FLOPs and higher FPS than comparable ResNets.
YOLOv3 offers a strong AP50-versus-speed trade-off, though its COCO-style AP (averaged over IoU thresholds 0.5:0.95) lags behind some one-stage detectors such as RetinaNet.
Alternative choices, including the standard anchor-box x,y offset prediction and focal loss, did not improve mAP in this study.
Multi-scale predictions help improve small-object detection (AP_S) while maintaining overall speed advantages.
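The gap between AP50 and COCO-style AP comes from how strictly a predicted box must overlap the ground truth. The sketch below is an illustration only: the two boxes are made-up values, and it shows a single match rather than a full AP computation. It counts how many of the ten COCO IoU thresholds (0.50 to 0.95 in steps of 0.05) one prediction would pass.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the prediction is shifted 20 px to the right.
ground_truth = (0.0, 0.0, 100.0, 100.0)
prediction = (20.0, 0.0, 120.0, 100.0)

overlap = iou(ground_truth, prediction)  # 8000 / 12000 = 2/3

# COCO-style AP averages over these ten IoU thresholds.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
passed = [t for t in thresholds if overlap >= t]

print(f"IoU = {overlap:.3f}")
print(f"counts as a match at {len(passed)} of {len(thresholds)} thresholds")
```

A box like this one is a correct detection under AP50 but a miss at AP75 and above, so a detector whose boxes are roughly right yet loosely aligned scores well on AP50 and lower on the averaged metric.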


This review was generated by AI and checked by a human editor.