QUICK REVIEW

[Paper Review] From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution

Ragib Amin Nihal, Benjamin Yen|arXiv (Cornell University)|Jan 26, 2024

Advanced Neural Network Applications | Cited by 6

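One-line summary

The two-stage B2BDet approach combines a custom SRGAN-based super-resolution step with a modified SR-YOLOv5 detector to improve small-object detection in aerial imagery. It achieves state-of-the-art mAP on VisDrone and strong results on NWPU-VHR10, SeaDroneSee, and VEDAI.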

ABSTRACT

Aerial object detection presents challenges from small object sizes, high density clustering, and image quality degradation from distance and motion blur. These factors create an information bottleneck where limited pixel representation cannot encode sufficient discriminative features. B2BDet addresses this with a two-stage framework that applies domain-specific super-resolution during inference, followed by detection using an enhanced YOLOv5 architecture. Unlike training-time super-resolution approaches that enhance learned representations, our method recovers visual information from each input image. The approach combines aerial-optimized SRGAN fine-tuning with architectural innovations including an Efficient Attention Module (EAM) and Cross-Layer Feature Pyramid Network (CLFPN). Evaluation across four aerial datasets shows performance gains, with VisDrone achieving 52.5% mAP using only 27.7M parameters. Ablation studies show that super-resolution preprocessing contributes +2.6% mAP improvement while architectural enhancements add +2.9%, yielding +5.5% total improvement over baseline YOLOv5. The method achieves computational efficiency with 53.8% parameter reduction compared to recent approaches while achieving strong small object detection performance.
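The ablation and efficiency figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above. The implied baseline mAP and the implied reference model size are derived values, not figures reported by the authors, and the second one assumes the 53.8% reduction is measured against a single reference model.

```python
# Figures quoted in the abstract.
SR_GAIN = 2.6            # mAP points attributed to super-resolution preprocessing
ARCH_GAIN = 2.9          # mAP points attributed to the architectural changes
FINAL_MAP = 52.5         # VisDrone mAP reported for B2BDet
PARAMS_M = 27.7          # B2BDet parameters, in millions
PARAM_REDUCTION = 0.538  # 53.8% fewer parameters than "recent approaches"

# The two ablation gains should add up to the stated total (+5.5).
total_gain = round(SR_GAIN + ARCH_GAIN, 1)

# Derived, not reported: baseline YOLOv5 mAP implied by the total gain.
implied_baseline = round(FINAL_MAP - total_gain, 1)

# Derived, not reported. Assumption: one reference model defines the reduction.
implied_reference_params = round(PARAMS_M / (1 - PARAM_REDUCTION), 1)

print(f"total gain: +{total_gain} mAP")
print(f"implied baseline: {implied_baseline} mAP")
print(f"implied reference size: ~{implied_reference_params}M parameters")
```

The stated gains are internally consistent: 2.6 + 2.9 gives the quoted 5.5 points.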

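Motivation and objectives

Address the challenge of small, densely packed objects in aerial imagery.
Improve image quality with a domain-specific super-resolution preprocessor.
Develop a lightweight SR-YOLOv5 detector suited to aerial scenes.
Evaluate on diverse aerial datasets to demonstrate generalization.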

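Proposed method

Propose B2BDet, a two-stage pipeline in which super-resolution preprocessing is followed by SR-YOLOv5 detection.
Develop a custom SRGAN specialized for aerial imagery to upsample and enhance low-resolution inputs.
Design SR-YOLOv5 for small-object detection, with a reduced CSPDarknet53 backbone, C3STR transformer modules, an FPN neck, and an SPP+BottleneckCSP head.
Train the SRGAN and the detector on aerial-domain data, using data augmentation, anchor tuning, and compound scaling.
Evaluate with mean average precision (mAP) across the datasets and with multi-scale input testing.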
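The two-stage flow described above can be sketched as a simple function composition. This is a minimal illustration, not the authors' implementation: `upscale_nearest` is a nearest-neighbour placeholder standing in for the SRGAN, `dummy_detector` is a hypothetical stand-in for SR-YOLOv5, and mapping boxes back to the original image coordinates is an assumption about how the outputs would be compared with original-resolution annotations.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # grayscale image as rows of pixel values


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float


def upscale_nearest(image: Image, scale: int) -> Image:
    """Placeholder for stage 1: nearest-neighbour upsampling, NOT an SRGAN."""
    out: Image = []
    for row in image:
        wide = [px for px in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def dummy_detector(image: Image) -> List[Box]:
    """Hypothetical stage 2: returns one box covering the whole image."""
    height, width = len(image), len(image[0])
    return [Box(0.0, 0.0, float(width), float(height), "car", 0.9)]


def run_two_stage(
    image: Image,
    scale: int,
    super_resolve: Callable[[Image, int], Image],
    detect: Callable[[Image], List[Box]],
) -> List[Box]:
    """Super-resolve at inference time, detect, then rescale the boxes."""
    sr_image = super_resolve(image, scale)
    boxes = detect(sr_image)
    # Assumption: boxes are reported in original-image coordinates.
    return [
        Box(b.x1 / scale, b.y1 / scale, b.x2 / scale, b.y2 / scale, b.label, b.score)
        for b in boxes
    ]


image = [[1, 2], [3, 4]]
detections = run_two_stage(image, 2, upscale_nearest, dummy_detector)
print(detections)
```

Passing the two stages in as callables keeps the sketch independent of any particular super-resolution or detection library; a real SRGAN and detector would replace the two placeholder functions.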

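Experimental results

Research questions

RQ1: Can a domain-specific super-resolution stage improve small-object detection in aerial imagery?
RQ2: Does a purpose-built, lightweight SR-YOLOv5 detector outperform existing aerial detection architectures on small, densely packed objects?
RQ3: How does the B2BDet pipeline perform on the VisDrone, NWPU-VHR10, SeaDroneSee, and VEDAI datasets?

Key findings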

Method	Backbone	PED	PER	BC	Car	Van	Truck	TRI	ATRI	Bus	MO	mAP50 (%)
Fast R-CNN	VGG-16	21.4	15.6	6.7	51.7	29.5	19.0	13.1	7.7	31.4	20.7	21.7
Faster R-CNN	ResNet101	20.9	14.8	7.3	51.0	29.7	19.5	14.0	8.8	30.5	21.2	21.8
Cascade R-CNN	ResNet50	22.2	14.8	7.6	54.6	31.5	21.6	14.8	8.6	34.9	21.4	23.2
RetinaNet	ResNet50	13.0	7.9	1.4	45.5	19.9	11.5	6.3	4.2	17.8	11.8	13.9
CenterNet	ResNet50	22.6	20.6	14.6	59.7	24.0	21.3	20.1	17.4	37.9	23.7	26.2
YOLOv4	CSPDarknet53	24.8	12.6	8.6	64.3	22.4	22.7	11.4	7.6	44.3	21.7	30.7
DMNet	ResNet101	28.5	20.4	15.9	56.8	37.9	30.1	22.6	14.0	47.1	29.2	30.3
HRDet+	HRNetV2-W48	28.6	14.5	11.7	49.4	37.1	35.2	28.8	21.9	43.3	23.5	28.0
CDNet	CSPDarknet53	35.6	19.2	13.8	55.8	42.1	38.2	33.0	25.4	49.5	29.3	34.2
HR-Cascade++	ResNet101	32.6	17.3	11.1	54.7	42.4	35.3	32.7	24.1	46.5	28.2	32.5
MSC-CenterNet	ResNet50	33.7	15.2	12.1	55.2	40.5	34.1	29.2	21.6	42.2	27.5	31.1
YOLOv3-LITE	DarkNet-53	34.5	23.4	7.9	70.8	31.3	21.9	15.3	6.2	40.9	32.7	28.5
MSA-YOLO	CSPDarknet53	33.4	17.3	11.2	76.8	41.5	41.4	14.8	18.4	60.9	31.0	34.7
SCA-YOLO	CSPDarknet53	57.3	43.9	23.7	85.5	49.2	45.4	35.0	18.3	61.8	54.2	47.4
B2BDet(Proposed)	CSPDarknet53	55.3	36.6	27.5	87.8	57.0	64.1	37.7	31.7	73.0	54.1	52.5
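A quick way to read the table is to recompute mAP50 as the mean of the ten per-class values and to compare the two strongest rows class by class. The sketch below copies the SCA-YOLO and B2BDet rows from the table; treating mAP50 as the unweighted mean of the per-class columns is an assumption, although it reproduces the reported figures for these two rows.

```python
CLASSES = ["PED", "PER", "BC", "Car", "Van", "Truck", "TRI", "ATRI", "Bus", "MO"]

# Per-class AP50 values copied from the table above.
SCA_YOLO = [57.3, 43.9, 23.7, 85.5, 49.2, 45.4, 35.0, 18.3, 61.8, 54.2]
B2BDET = [55.3, 36.6, 27.5, 87.8, 57.0, 64.1, 37.7, 31.7, 73.0, 54.1]


def mean_ap(per_class):
    """Unweighted mean over classes, rounded to one decimal like the table."""
    return round(sum(per_class) / len(per_class), 1)


# Per-class difference in AP50 points (B2BDet minus SCA-YOLO).
deltas = {c: round(b - s, 1) for c, b, s in zip(CLASSES, B2BDET, SCA_YOLO)}

# Classes where B2BDet scores below SCA-YOLO.
behind = [c for c, d in deltas.items() if d < 0]

print("B2BDet mAP50:", mean_ap(B2BDET))
print("SCA-YOLO mAP50:", mean_ap(SCA_YOLO))
print("Per-class delta:", deltas)
print("B2BDet behind on:", behind)
```

On these rows the overall gain comes mainly from Truck, ATRI, and Bus, while B2BDet trails SCA-YOLO on PED, PER, and (marginally) MO.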

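On VisDrone, B2BDet reaches 52.5% mAP, ahead of SCA-YOLO (47.4%) and YOLOv4 (30.7%).
On NWPU-VHR10, mAP is 90.5% across all 10 classes (airplane 99.5%, vehicle 96.9%).
On SeaDroneSee, overall mAP is 76%, with high scores for boat (96.3%) and jetski (93.4%).
On VEDAI, overall mAP is 77.5%, with cars at 89.3%.
SR-YOLOv5 remains lightweight despite the performance gains (270 layers, 27.7M parameters, 109.5 GFLOPs).
The approach improves detection of small, densely packed objects and highlights the importance of dataset selection and architectural adaptation.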

This review was generated by AI and checked by a human editor.