QUICK REVIEW

[Paper Review] Adaptive Image Zoom-in with Bounding Box Transformation for UAV Object Detection

Tao Wang, Chenyu Lin|arXiv (Cornell University)|Feb 7, 2026

Advanced Neural Network Applications | Citations: 0

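One-line summary

ZoomDet introduces a non-uniform image zoom-in framework with a corner-aligned bounding box transformation. It improves UAV object detection with minimal added latency and applies to both Faster R-CNN and YOLO architectures.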

ABSTRACT

Detecting objects from UAV-captured images is challenging due to the small object size. In this work, a simple and efficient adaptive zoom-in framework is explored for object detection on UAV images. The main motivation is that the foreground objects are generally smaller and sparser than those in common scene images, which hinders the optimization of effective object detectors. We thus aim to zoom in adaptively on the objects to better capture object features for the detection task. To achieve the goal, two core designs are required: i) How to conduct non-uniform zooming on each image efficiently? ii) How to enable object detection training and inference with the zoomed image space? Correspondingly, a lightweight offset prediction scheme coupled with a novel box-based zooming objective is introduced to learn non-uniform zooming on the input image. Based on the learned zooming transformation, a corner-aligned bounding box transformation method is proposed. The method warps the ground-truth bounding boxes to the zoomed space to learn object detection, and warps the predicted bounding boxes back to the original space during inference. We conduct extensive experiments on three representative UAV object detection datasets, including VisDrone, UAVDT, and SeaDronesSee. The proposed ZoomDet is architecture-independent and can be applied to an arbitrary object detection architecture. Remarkably, on the SeaDronesSee dataset, ZoomDet offers more than 8.4 absolute gain of mAP with a Faster R-CNN model, with only about 3 ms additional latency. The code is available at https://github.com/twangnh/zoomdet_code.

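Motivation and objectives

Address the challenge of small, sparse objects in UAV imagery by enabling adaptive zooming on the input image.
Develop a lightweight, architecture-independent zooming framework that leaves detector training and inference intact.
Propose an offset-based zooming model and a corner-aligned box transformation that align ground truth and predictions in the zoomed space.
Introduce an object zooming loss that supervises the offset predictor to magnify objects appropriately.
Demonstrate gains on multiple UAV datasets with common detectors while keeping computational overhead minimal.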

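Proposed method

A lightweight OffsetNet predicts per-pixel spatial offsets Δx, Δy, which define the non-uniform zooming map T(x,y) = (x+Δx, y+Δy).
The offsets are predicted on a lower-resolution grid and interpolated up to the full image size.
An object zooming loss that maximizes the zooming ratio m_i of each bounding box is introduced as a log-based objective with parameters α and β.
Ground-truth boxes are transformed into the zoomed space with a corner-aligned approach: corners are mapped by a nearest-neighbor search over the forward mapping, and the inverse mapping returns them to the original space.
At inference, predicted boxes are warped back to the original image space for evaluation.
The overall training objective combines the standard detection loss with the proposed zooming loss: L = L_detection + L_zoom.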
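
The offset map and the corner-aligned box transformation described above can be illustrated with a minimal NumPy sketch. It is not taken from the ZoomDet code. All function names are hypothetical, and it assumes that T is stored as a sampling grid, meaning each pixel (u, v) of the zoomed image reads from location T(u, v) in the original image. Under the opposite convention the two box functions simply swap roles. The learned OffsetNet is replaced here by a hand-written coarse offset grid that encodes a uniform 2x zoom.

```python
import numpy as np

def upsample_offsets(coarse, height, width):
    """Bilinearly interpolate a coarse (h, w, 2) offset grid to (height, width, 2)."""
    h, w, _ = coarse.shape
    ys = np.linspace(0, h - 1, height)
    xs = np.linspace(0, w - 1, width)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = coarse[y0][:, x0] * (1 - wx) + coarse[y0][:, x1] * wx
    bottom = coarse[y1][:, x0] * (1 - wx) + coarse[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def sampling_grid(offsets):
    """T(x, y) = (x + dx, y + dy) at every pixel, returned as (H, W, 2) in (x, y) order."""
    H, W, _ = offsets.shape
    ys, xs = np.mgrid[0:H, 0:W]
    return np.stack([xs, ys], axis=-1).astype(float) + offsets

def box_to_zoomed(box, grid):
    """Original -> zoomed: nearest-neighbor search for each corner in the grid (assumed convention)."""
    def corner(x, y):
        d2 = ((grid - np.array([x, y])) ** 2).sum(axis=-1)
        v, u = np.unravel_index(np.argmin(d2), d2.shape)
        return float(u), float(v)
    x1, y1, x2, y2 = box
    return (*corner(x1, y1), *corner(x2, y2))

def box_to_original(box, grid):
    """Zoomed -> original: look up T at each (rounded, clipped) corner."""
    H, W, _ = grid.shape
    def corner(u, v):
        ui = int(np.clip(round(u), 0, W - 1))
        vi = int(np.clip(round(v), 0, H - 1))
        x, y = grid[vi, ui]
        return float(x), float(y)
    u1, v1, u2, v2 = box
    return (*corner(u1, v1), *corner(u2, v2))

# Hypothetical stand-in for the OffsetNet output: a 4x4 coarse grid whose
# upsampled offsets are (-u/2, -v/2) on a 16x16 image, i.e. a uniform 2x zoom.
cy, cx = np.mgrid[0:4, 0:4]
coarse = np.stack([-2.5 * cx, -2.5 * cy], axis=-1).astype(float)
grid = sampling_grid(upsample_offsets(coarse, 16, 16))

gt_box = (2.0, 2.0, 6.0, 6.0)                     # box in the original image
zoomed_box = box_to_zoomed(gt_box, grid)          # used as the training target
restored_box = box_to_original(zoomed_box, grid)  # applied to predictions at inference
print(zoomed_box, restored_box)
```

In this toy case the box doubles in size in the zoomed space and returns to its original coordinates after the round trip. The exhaustive nearest-neighbor search is written for readability and costs O(HW) per corner. The review does not say how the actual implementation performs this search.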

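Experimental results

Research questions

RQ1: Can adaptive non-uniform zooming improve detection of small UAV objects without incurring large latency?
RQ2: How can bounding box annotations be transformed effectively between the original image space and the zoomed space so that training and evaluation remain possible?
RQ3: Do the offset-based zooming mechanism and the bounding-box-guided objective outperform saliency-based zooming in UAV scenarios?
RQ4: Is ZoomDet architecture-independent and compatible with detectors such as Faster R-CNN and YOLO?
RQ5: What empirical gains are obtained on the VisDrone, UAVDT, and SeaDronesSee datasets when ZoomDet is combined with standard detectors?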

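Key findings

ZoomDet delivers notable mAP gains across the UAV datasets; on SeaDronesSee it yields an absolute mAP gain of more than 8.4 with Faster R-CNN at about 3 ms of additional latency.
On VisDrone and UAVDT, it provides an improvement of about 2.0 mAP without excessive overhead.
ZoomDet is architecture-independent and can improve both two-stage and one-stage detectors.
Offset-based non-uniform zooming combined with the bounding-box-guided transformation makes training and inference more effective than saliency-based methods that rely on coordinate mapping.
It is compatible with patch-based and implicit zoom methods and provides orthogonal improvements.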

This review was generated by AI and checked by a human editor.