QUICK REVIEW

[Paper Review] Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark

Brian K. S. Isaac-Medina, Matt Poyser|arXiv (Cornell University)|Mar 25, 2021

Video Surveillance and Tracking Methods|References: 51|Citations: 95

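One-Line Summary

This paper presents the first comprehensive benchmark for deep-learning-based detection and tracking of unmanned aerial vehicles (UAVs) in visible-band and infrared imagery. Four object detectors and three tracking frameworks are evaluated on three diverse datasets. The best detector reaches 98.6% mAP and the best tracker 96.3% MOTA, and cross-modality transfer is demonstrated with 82.8% mAP on visible-band images when training on infrared data.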

ABSTRACT

Unmanned Aerial Vehicles (UAV) can pose a major risk for aviation safety, due to both negligent and malicious use. For this reason, the automated detection and tracking of UAV is a fundamental task in aerial security systems. Common technologies for UAV detection include visible-band and thermal infrared imaging, radio frequency and radar. Recent advances in deep neural networks (DNNs) for image-based object detection open the possibility to use visual information for this detection and tracking task. Furthermore, these detection architectures can be implemented as backbones for visual tracking systems, thereby enabling persistent tracking of UAV incursions. To date, no comprehensive performance benchmark exists that applies DNNs to visible-band imagery for UAV detection and tracking. To this end, three datasets with varied environmental conditions for UAV detection and tracking, comprising a total of 241 videos (331,486 images), are assessed using four detection architectures and three tracking frameworks. The best performing detector architecture obtains an mAP of 98.6% and the best performing tracking framework obtains a MOTA of 96.3%. Cross-modality evaluation is carried out between visible and infrared spectrums, achieving a maximal 82.8% mAP on visible images when training in the infrared modality. These results provide the first public multi-approach benchmark for state-of-the-art deep learning-based methods and give insight into which detection and tracking architectures are effective in the UAV domain.

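Motivation and Objectives

Establish a standardized, multi-dataset benchmark for UAV detection and tracking with deep neural networks.
Evaluate state-of-the-art object detection and tracking architectures under diverse environmental and imaging conditions.
Investigate cross-modality transfer between infrared and visible-band UAV imagery.
Identify robust detection and tracking frameworks best suited to counter-UAV applications in real-world scenarios.
Provide a public benchmark toolkit to accelerate research on automated UAV detection and tracking.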

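Proposed Method

The study evaluates four object detection architectures (Faster R-CNN, YOLOv3, SSD512, and DETR) on three UAV datasets (MAV-VID, Drone-vs-Bird, and Anti-UAV).
Three tracking frameworks (SORT, DeepSORT, and Tracktor) take the detected bounding boxes as input and associate them over time.
Cross-modality evaluation trains on infrared data and tests on visible-band images, and vice versa.
The benchmark uses standard metrics: mean average precision (mAP) for detection and multiple object tracking accuracy (MOTA) for tracking.
The datasets include footage from ground-based and UAV-mounted cameras, showing UAVs at varying distances, in dynamic scenes, and under both optical and infrared conditions.
The evaluation also covers multiple environmental conditions, such as complex backgrounds, fast camera motion, and occlusion.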
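The trackers listed above all start from per-frame detector boxes and link them across time. A minimal sketch of that association step follows, assuming boxes in (x1, y1, x2, y2) pixel format and a simple greedy matcher. This is an illustration only and not the paper's implementation: SORT, for instance, combines a Kalman filter with Hungarian assignment, neither of which is reproduced here. The names iou, greedy_associate, and iou_threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(tracks, detections, iou_threshold=0.3):
    """Match existing tracks to new detections, highest IoU first.

    tracks: dict mapping track id -> last known box (assumed format)
    detections: list of boxes produced by the detector for this frame
    Returns (matches, unmatched), where matches maps track id ->
    detection index and unmatched lists detection indices left over.
    """
    # Score every (track, detection) pair and visit the best pairs first.
    candidates = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        key=lambda c: c[0],
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in candidates:
        if score < iou_threshold:
            break  # all remaining pairs overlap too little to link
        if tid in matches or j in used:
            continue  # each track and each detection is used once
        matches[tid] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched


# Illustrative boxes, not taken from any of the datasets.
tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(51, 51, 61, 61), (1, 1, 11, 11), (200, 200, 210, 210)]
matches, unmatched = greedy_associate(tracks, detections)
```

Unmatched detections would typically seed new tracks, while tracks that stay unmatched for several frames would be dropped. Both policies are left out of this sketch.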

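Experimental Results

Research Questions

RQ1: Which deep learning object detection architecture achieves the highest mAP on visible-band UAV imagery under diverse environmental conditions?
RQ2: How does cross-modality training (for example, training on infrared data and testing on visible-band images) affect UAV detection performance?
RQ3: Which tracking framework achieves the highest MOTA when tracking small, fast-moving UAVs under difficult visual conditions?
RQ4: How do camera motion and background complexity affect tracking performance?
RQ5: How effectively can general-purpose object detectors be adapted to UAV-specific detection and tracking tasks?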

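Key Findings

The best detector, YOLOv3, achieved a mean average precision (mAP) of 98.6% on visible-band UAV imagery.
Faster R-CNN achieved the highest mAP on small UAVs (up to 0.770), performing well for early detection.
The Tracktor tracking framework achieved the highest MOTA, 96.3%, and proved effective under fast camera motion and for long-term tracking.
Cross-modality detection, training on infrared data and testing on visible-band images, achieved 82.8% mAP, demonstrating effective transfer between modalities.
DETR-based detection backbones performed well (mAP > 0.94) and are suited to small-object tracking in counter-UAV systems.
The re-identification networks in DeepSORT and Tracktor did not yield consistent gains and in some cases degraded performance, which calls for UAV-specific re-identification models.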
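To make the MOTA figures above easier to interpret, here is a minimal sketch of the standard definition, MOTA = 1 - (FN + FP + IDSW) / GT, with the counts summed over all frames. The function name mota and the per-frame counts are hypothetical illustrations and are not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple object tracking accuracy from per-frame error counts.

    Each argument is a sequence with one integer per frame:
    missed targets, spurious boxes, identity switches, and the number
    of ground-truth objects in that frame.
    """
    total_gt = sum(num_gt)
    if total_gt <= 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt


# Illustrative counts for a four-frame clip with one UAV per frame.
score = mota(
    false_negatives=[0, 1, 0, 0],
    false_positives=[0, 0, 0, 0],
    id_switches=[0, 0, 1, 0],
    num_gt=[1, 1, 1, 1],
)
```

Because the error terms are summed before dividing, MOTA can go negative when a tracker makes more errors than there are ground-truth objects. An identity switch counts as much as a missed detection, which is one reason a weak re-identification model can lower the score.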

This review was generated by AI and checked by a human editor.