QUICK REVIEW

[Paper Review] DAMO-YOLO: A Report on Real-Time Object Detection Design

Xianzhe Xu, Yiqi Jiang | arXiv (Cornell University) | Nov 23, 2022

Advanced Neural Network Applications | Citations: 118

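One-sentence summary

DAMO-YOLO combines a NAS-searched backbone, an efficient RepGFPN neck, a compact ZeroHead, AlignOTA label assignment, and distillation to reach state-of-the-art real-time object detection performance on COCO for both general-purpose and lightweight models.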

ABSTRACT

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO is extended from YOLO with some new technologies, including Neural Architecture Search (NAS), efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of "large neck, small head". We import Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. Then we investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer would yield better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment. And a distillation schema is introduced to improve performance to a higher level. Based on these new techs, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L. They can achieve 43.6/47.7/50.2/51.9 mAPs on COCO with the latency of 2.78/3.83/5.62/7.95 ms on T4 GPUs respectively. Additionally, for edge devices with limited computing power, we have also proposed DAMO-YOLO-Ns/Nm/Nl lightweight models. They can achieve 32.3/38.2/40.5 mAPs on COCO with the latency of 4.08/5.05/6.69 ms on X86-CPU. Our proposed general and lightweight models have outperformed other YOLO series models in their respective application scenarios.

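Motivation and Objectives

Improve real-time object detection with industrial deployment in mind.
Develop a YOLO-based detector using latency-aware neural architecture search.
Design an efficient neck and a lightweight head that optimize the latency-accuracy trade-off.
Improve label assignment so that classification and regression are aligned in a dynamic setting.
Incorporate distillation to improve the performance of small models.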

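Proposed Method

Use MAE-NAS to search for backbones under latency constraints, producing ResNet-like and CSP-like structures.
Develop the Efficient RepGFPN neck, which uses flexible channel dimensions across scales and removes upsampling from queen-fusion.
Introduce ZeroHead, which keeps only the task projection layer, realizing the "large neck, small head" design.
Propose AlignOTA, an aligned dynamic label assignment that balances the classification cost and the regression cost.
Apply distillation with two-stage training and a channel-wise dynamic temperature to improve small-model performance.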
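
To make the idea of balancing classification and regression costs in a dynamic label assignment more concrete, here is a minimal sketch. It is not the paper's AlignOTA formulation: the cost form (cross-entropy against an IoU soft target plus a negative-log-IoU term), the `reg_weight` parameter, the fixed top-k selection, and all function names are assumptions chosen for illustration only.

```python
import math

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assignment_cost(cls_score, pred_box, gt_box, reg_weight=1.0, eps=1e-7):
    """Hypothetical combined cost: lower means a better candidate."""
    overlap = iou(pred_box, gt_box)
    s = min(max(cls_score, eps), 1.0 - eps)
    # Classification cost: cross-entropy of the score against the IoU,
    # so a confident score on a poorly localized box is penalized.
    cls_cost = -(overlap * math.log(s) + (1.0 - overlap) * math.log(1.0 - s))
    # Regression cost: negative log IoU (assumed form).
    reg_cost = -math.log(overlap + eps)
    return cls_cost + reg_weight * reg_cost

def pick_positives(candidates, gt_box, k=2):
    """Return indices of the k lowest-cost (score, box) candidates."""
    costs = [assignment_cost(score, box, gt_box) for score, box in candidates]
    order = sorted(range(len(candidates)), key=lambda i: costs[i])
    return order[:k]

gt = (0, 0, 10, 10)
candidates = [
    (0.9, (0, 0, 10, 10)),    # well localized, confident
    (0.2, (0, 0, 10, 10)),    # well localized, low score
    (0.9, (20, 20, 30, 30)),  # confident but no overlap
    (0.8, (1, 1, 10, 10)),    # slightly off, fairly confident
]
positives = pick_positives(candidates, gt, k=2)
```

In this toy setup the candidate that is confident but does not overlap the ground truth gets the highest cost. That is the kind of classification/regression misalignment an aligned assignment is meant to suppress.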

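Experimental Results

Research Questions

RQ1: How can a NAS-designed backbone improve the COCO mAP of a real-time detector under latency constraints?
RQ2: Which neck/head configuration achieves the highest accuracy under real-time constraints?
RQ3: Does AlignOTA improve the alignment of classification and regression in dynamic label assignment for YOLO-style detectors?
RQ4: How does distillation improve small DAMO-YOLO models without sacrificing real-time throughput?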

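Key Findings

DAMO-YOLO-T/S/M/L achieve 43.6/47.7/50.2/51.9 mAP on COCO at latencies of 2.78/3.83/5.62/7.95 ms on a T4 GPU, respectively.
The lightweight DAMO-YOLO-Ns/Nm/Nl models achieve 32.3/38.2/40.5 mAP on COCO at latencies of 4.08/5.05/6.69 ms on an X86 CPU.
MAE-NAS backbones (MAE-Res, MAE-CSP) beat the CSP-Darknet baseline on the accuracy-latency trade-off, most clearly for larger and deeper networks.
The design with a large neck and a single projection head (ZeroHead) delivers high performance while reducing computation.
AlignOTA improves label assignment over ATSS/sOTA/TOOD and achieves higher AP.
Distillation (with CWD performing favorably) improves the performance of small to medium DAMO-YOLO models.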
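
For readers unfamiliar with CWD (channel-wise distillation), the sketch below shows the general idea. Each channel's activation map is turned into a probability distribution over spatial positions with a temperature-scaled softmax, and the student is trained to match the teacher via KL divergence. It is a generic illustration with a single fixed temperature, not the DAMO-YOLO training code: it does not model the channel-wise dynamic temperature mentioned above, normalization conventions vary between implementations, and the function names are assumptions.

```python
import math

def log_softmax(values, temperature):
    """Numerically stable log-softmax over one channel's spatial positions."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_sum for v in scaled]

def cwd_loss(student, teacher, temperature=1.0):
    """Channel-wise KL(teacher || student), averaged over channels.

    student / teacher: list of channels, each a flat list of activations
    over the same spatial positions.
    """
    total = 0.0
    for s_ch, t_ch in zip(student, teacher):
        log_p_t = log_softmax(t_ch, temperature)
        log_p_s = log_softmax(s_ch, temperature)
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * total / len(student)

teacher_feat = [[2.0, 0.5, 0.1, 0.1], [0.0, 0.0, 3.0, 0.0]]
student_feat = [[1.0, 1.0, 0.2, 0.1], [0.5, 0.0, 1.0, 0.2]]

loss_mismatch = cwd_loss(student_feat, teacher_feat, temperature=1.0)
loss_identical = cwd_loss(teacher_feat, teacher_feat, temperature=1.0)
```

The loss is zero when the student reproduces the teacher's per-channel spatial distribution and positive otherwise. Because each channel is normalized on its own, the student is pushed to match where each teacher channel responds most strongly, not the absolute scale of the activations.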


This review was generated by AI and checked by a human editor.