[Paper Review] From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution
The two-stage B2BDet approach combines a custom SRGAN-based super-resolution stage with a modified SR-YOLOv5 detector to improve small-object detection in aerial imagery, achieving state-of-the-art mAP on VisDrone and strong results on NWPU-VHR10, SeaDroneSee, and VEDAI.
Aerial object detection is challenging because objects are small and densely clustered, and because distance and motion blur degrade image quality. These factors create an information bottleneck in which the limited pixel representation cannot encode sufficient discriminative features. B2BDet addresses this with a two-stage framework that applies domain-specific super-resolution during inference, followed by detection using an enhanced YOLOv5 architecture. Unlike training-time super-resolution approaches that enhance learned representations, our method recovers visual information from each input image. The approach combines aerial-optimized SRGAN fine-tuning with architectural innovations including an Efficient Attention Module (EAM) and a Cross-Layer Feature Pyramid Network (CLFPN). Evaluation across four aerial datasets shows performance gains, with B2BDet reaching 52.5% mAP on VisDrone using only 27.7M parameters. Ablation studies show that super-resolution preprocessing contributes a +2.6% mAP improvement and the architectural enhancements add +2.9%, yielding a +5.5% total improvement over baseline YOLOv5. The method uses 53.8% fewer parameters than recent approaches while maintaining strong small-object detection performance.
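The ablation figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes the reported gains are absolute mAP points on VisDrone and that they add linearly. The implied baseline score and the implied size of the comparison model are derived here for illustration only and are not stated in the review.

```python
# Figures quoted in the summary above.
FINAL_MAP = 52.5         # B2BDet mAP on VisDrone (%)
SR_GAIN = 2.6            # gain attributed to super-resolution preprocessing
ARCH_GAIN = 2.9          # gain attributed to the architectural changes
PARAMS_M = 27.7          # B2BDet parameters, in millions
PARAM_REDUCTION = 0.538  # reported parameter reduction vs. recent approaches


def ablation_summary() -> dict:
    """Derive the totals implied by the quoted numbers.

    Assumption: gains are absolute mAP points and combine additively.
    """
    total_gain = round(SR_GAIN + ARCH_GAIN, 1)
    return {
        "total_gain": total_gain,
        # Implied, not reported: baseline YOLOv5 mAP on VisDrone.
        "implied_baseline_map": round(FINAL_MAP - total_gain, 1),
        # Implied, not reported: size of the model the 53.8% refers to.
        "implied_reference_params_m": round(PARAMS_M / (1 - PARAM_REDUCTION), 1),
    }


if __name__ == "__main__":
    for name, value in ablation_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the two component gains do sum to the reported +5.5 points, which is a quick consistency check on the ablation claim.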
Research Motivation and Objectives
- Address the challenge of small, densely clustered objects in aerial imagery.
- Improve image quality via a domain-specific super-resolution preprocessor.
- Develop a lightweight SR-YOLOv5 detector tailored for aerial scenes.
- Evaluate on diverse aerial datasets to demonstrate generalization.
Proposed Method
- Propose a two-stage B2BDet pipeline: SR pre-processing followed by SR-YOLOv5 detection.
- Develop a custom SRGAN tailored for aerial imagery to upsample and enhance low-resolution inputs.
- Design SR-YOLOv5 with a reduced CSPDarknet53 backbone, C3STR transformer modules, FPN neck, and SPP+BottleneckCSP heads for small-object detection.
- Train SRGAN and detector with aerial-domain data, using data augmentation, anchor tuning, and compound scaling.
- Evaluate with mean average precision (mAP) and multi-scale input testing across datasets.
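The two-stage flow listed above (super-resolve first, then detect) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `nearest_upscale` is a stand-in for the SRGAN stage, `toy_detector` is a stand-in for SR-YOLOv5, and the scale factor of 4 is an assumption. The one point the sketch is meant to show is that boxes predicted on the upscaled image must be divided by the scale factor to land back in original-image coordinates.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # grayscale image as rows of pixel values


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: str


def nearest_upscale(img: Image, scale: int) -> Image:
    """Placeholder for the SR stage: nearest-neighbour upsampling."""
    return [[px for px in row for _ in range(scale)]
            for row in img for _ in range(scale)]


def toy_detector(img: Image, threshold: int = 128) -> List[Box]:
    """Placeholder detector: one box around all bright pixels."""
    ys = [y for y, row in enumerate(img) if any(p >= threshold for p in row)]
    xs = [x for row in img for x, p in enumerate(row) if p >= threshold]
    if not xs:
        return []
    return [Box(min(xs), min(ys), max(xs) + 1, max(ys) + 1, 1.0, "object")]


def detect_with_sr(img: Image,
                   upscale: Callable[[Image, int], Image],
                   detect: Callable[[Image], List[Box]],
                   scale: int = 4) -> List[Box]:
    """Stage 1: upscale. Stage 2: detect. Then map boxes back."""
    sr_img = upscale(img, scale)
    boxes = detect(sr_img)
    # Boxes are in upscaled coordinates; rescale to the original image.
    return [Box(b.x1 / scale, b.y1 / scale, b.x2 / scale, b.y2 / scale,
                b.score, b.label) for b in boxes]


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]
    image[1][2] = 255  # a single bright "object" pixel
    print(detect_with_sr(image, nearest_upscale, toy_detector))
```

Passing the two stages in as callables keeps the wrapper independent of any particular SR model or detector, so either stage can be swapped without touching the coordinate handling.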

Experimental Results
Research Questions
- RQ1: Can a domain-specific super-resolution stage improve small-object detection in aerial images?
- RQ2: Does a tailored, lightweight SR-YOLOv5 detector outperform existing aerial detection architectures on small, dense objects?
- RQ3: How does the B2BDet pipeline perform across the VisDrone, NWPU-VHR10, SeaDroneSee, and VEDAI datasets?
Key Results
| Method | Backbone | PED | PER | BC | Car | Van | Truck | TRI | ATRI | Bus | MO | mAP50 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fast R-CNN | VGG-16 | 21.4 | 15.6 | 6.7 | 51.7 | 29.5 | 19.0 | 13.1 | 7.7 | 31.4 | 20.7 | 21.7 |
| Faster R-CNN | ResNet101 | 20.9 | 14.8 | 7.3 | 51.0 | 29.7 | 19.5 | 14.0 | 8.8 | 30.5 | 21.2 | 21.8 |
| Cascade R-CNN | ResNet50 | 22.2 | 14.8 | 7.6 | 54.6 | 31.5 | 21.6 | 14.8 | 8.6 | 34.9 | 21.4 | 23.2 |
| RetinaNet | ResNet50 | 13.0 | 7.9 | 1.4 | 45.5 | 19.9 | 11.5 | 6.3 | 4.2 | 17.8 | 11.8 | 13.9 |
| CenterNet | ResNet50 | 22.6 | 20.6 | 14.6 | 59.7 | 24.0 | 21.3 | 20.1 | 17.4 | 37.9 | 23.7 | 26.2 |
| YOLOv4 | CSPDarknet53 | 24.8 | 12.6 | 8.6 | 64.3 | 22.4 | 22.7 | 11.4 | 7.6 | 44.3 | 21.7 | 30.7 |
| DMNet | ResNet101 | 28.5 | 20.4 | 15.9 | 56.8 | 37.9 | 30.1 | 22.6 | 14.0 | 47.1 | 29.2 | 30.3 |
| HRDet+ | HRNetV2-W48 | 28.6 | 14.5 | 11.7 | 49.4 | 37.1 | 35.2 | 28.8 | 21.9 | 43.3 | 23.5 | 28.0 |
| CDNet | CSPDarknet53 | 35.6 | 19.2 | 13.8 | 55.8 | 42.1 | 38.2 | 33.0 | 25.4 | 49.5 | 29.3 | 34.2 |
| HR-Cascade++ | ResNet101 | 32.6 | 17.3 | 11.1 | 54.7 | 42.4 | 35.3 | 32.7 | 24.1 | 46.5 | 28.2 | 32.5 |
| MSC-CenterNet | ResNet50 | 33.7 | 15.2 | 12.1 | 55.2 | 40.5 | 34.1 | 29.2 | 21.6 | 42.2 | 27.5 | 31.1 |
| YOLOv3-LITE | DarkNet-53 | 34.5 | 23.4 | 7.9 | 70.8 | 31.3 | 21.9 | 15.3 | 6.2 | 40.9 | 32.7 | 28.5 |
| MSA-YOLO | CSPDarknet53 | 33.4 | 17.3 | 11.2 | 76.8 | 41.5 | 41.4 | 14.8 | 18.4 | 60.9 | 31.0 | 34.7 |
| SCA-YOLO | CSPDarknet53 | 57.3 | 43.9 | 23.7 | 85.5 | 49.2 | 45.4 | 35.0 | 18.3 | 61.8 | 54.2 | 47.4 |
| B2BDet (Proposed) | CSPDarknet53 | 55.3 | 36.6 | 27.5 | 87.8 | 57.0 | 64.1 | 37.7 | 31.7 | 73.0 | 54.1 | 52.5 |
- On VisDrone, B2BDet achieves 52.5% mAP, surpassing SCA-YOLO (47.4%) and YOLOv4 (30.7%).
- NWPU-VHR10 results reach 90.5% overall mAP across 10 classes (airplane 99.5%, vehicle 96.9%).
- SeaDroneSee achieves 76% overall mAP with high scores for boat (96.3%) and jetski (93.4%).
- VEDAI achieves 77.5% overall mAP, with cars at 89.3%.
- SR-YOLOv5 footprint remains lightweight despite performance gains (270 layers, 27.7M parameters, 109.5 GFLOPs).
- The approach improves detection of small, densely clustered objects and demonstrates the importance of dataset choice and architectural adaptation.
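To make the VisDrone comparison concrete, the sketch below recomputes mAP50 as the unweighted mean of the per-class scores for the B2BDet and SCA-YOLO rows of the table, and lists the per-class differences. The values are copied from those two rows. The reading of the column abbreviations (PED pedestrian, PER person, BC bicycle, TRI tricycle, ATRI awning-tricycle, MO motor) is an assumption based on the standard VisDrone class list.

```python
CLASSES = ["PED", "PER", "BC", "Car", "Van", "Truck", "TRI", "ATRI", "Bus", "MO"]

# Per-class AP50 (%) copied from the results table.
B2BDET = [55.3, 36.6, 27.5, 87.8, 57.0, 64.1, 37.7, 31.7, 73.0, 54.1]
SCA_YOLO = [57.3, 43.9, 23.7, 85.5, 49.2, 45.4, 35.0, 18.3, 61.8, 54.2]


def mean_ap(per_class: list) -> float:
    """mAP as the unweighted mean of per-class AP, rounded to one decimal."""
    return round(sum(per_class) / len(per_class), 1)


def per_class_delta(ours: list, theirs: list) -> dict:
    """Difference (ours - theirs) per class, in AP points."""
    return {c: round(a - b, 1) for c, a, b in zip(CLASSES, ours, theirs)}


DELTAS = per_class_delta(B2BDET, SCA_YOLO)
# Classes where B2BDet scores lower than SCA-YOLO.
TRAILING = [c for c, d in DELTAS.items() if d < 0]

if __name__ == "__main__":
    print("B2BDet mAP50:", mean_ap(B2BDET))
    print("SCA-YOLO mAP50:", mean_ap(SCA_YOLO))
    for cls, delta in DELTAS.items():
        print(f"{cls:>5}: {delta:+.1f}")
    print("B2BDet trails on:", TRAILING)
```

For these two rows the class mean reproduces the reported 52.5 and 47.4. The breakdown also shows that the overall gain is uneven: B2BDet trails SCA-YOLO on PED, PER, and MO, while its largest margins are on Truck (+18.7) and ATRI (+13.4).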

This review was generated by AI and reviewed by a human editor.