QUICK REVIEW

[Paper Review] Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence

Xue Yang, Xiaojiang Yang|arXiv (Cornell University)|Jun 3, 2021

Advanced Neural Network Applications|References: 61|Citations: 232

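One-sentence summary

This paper proposes a regression loss for rotated object detection based on the Kullback-Leibler Divergence between Gaussian representations of bounding boxes. The loss is self-modulating and scale invariant, enables high-precision detection, and degenerates to a standard regression loss in the horizontal case.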

ABSTRACT

Existing rotated object detectors are mostly inherited from the horizontal detection paradigm, as the latter has evolved into a well-developed area. However, these detectors are difficult to perform prominently in high-precision detection due to the limitation of current regression loss design, especially for objects with large aspect ratios. Taking the perspective that horizontal detection is a special case for rotated object detection, in this paper, we are motivated to change the design of rotation regression loss from induction paradigm to deduction methodology, in terms of the relation between rotation and horizontal detection. We show that one essential challenge is how to modulate the coupled parameters in the rotation regression loss, as such the estimated parameters can influence to each other during the dynamic joint optimization, in an adaptive and synergetic way. Specifically, we first convert the rotated bounding box into a 2-D Gaussian distribution, and then calculate the Kullback-Leibler Divergence (KLD) between the Gaussian distributions as the regression loss. By analyzing the gradient of each parameter, we show that KLD (and its derivatives) can dynamically adjust the parameter gradients according to the characteristics of the object. It will adjust the importance (gradient weight) of the angle parameter according to the aspect ratio. This mechanism can be vital for high-precision detection as a slight angle error would cause a serious accuracy drop for large aspect ratios objects. More importantly, we have proved that KLD is scale invariant. We further show that the KLD loss can be degenerated into the popular $l_{n}$-norm loss for horizontal detection. Experimental results on seven datasets using different detectors show its consistent superiority, and codes are available at https://github.com/yangxue0827/RotationDetection and https://github.com/open-mmlab/mmrotate.

Motivation and objectives

Motivate moving from induction-based rotation regression (building on horizontal detectors) to a deductive, general rotation regression framework.
Introduce a regression loss that treats rotated boxes as 2-D Gaussians and measures distance with KLD.
Show that KLD provides dynamic, parameter-coupled gradient behavior that adapts to object geometry and scale.
Demonstrate that KLD is scale-invariant and degenerates to common horizontal regression losses when theta = 0.
Validate the approach across multiple datasets and detectors, achieving state-of-the-art rotation detection results.

Proposed method

Convert each rotated bounding box B(x, y, w, h, theta) into a 2-D Gaussian N(mu, Sigma).
Compute the regression loss as the Kullback-Leibler Divergence between predicted and ground-truth Gaussians; analyze parameter gradients to show self-modulation.
Establish that the KLD is affine invariant (and hence scale invariant), consider variants that account for its asymmetry, and derive gradients with respect to (x, y, w, h, theta).
Normalize the distance with a non-linear function f(D) and a scale parameter tau to obtain the final L_reg loss (Eq. 18).
Provide a multi-task loss combining L_reg with a focal classification loss for end-to-end training.
Demonstrate that the horizontal case is a degenerate special case of the proposed loss.
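
The steps above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation (see the repositories linked in the abstract for that). It assumes the conversion Sigma = R diag(w^2/4, h^2/4) R^T with theta in radians, the standard closed-form KLD between two Gaussians, and a final loss of the form 1 - 1/(tau + f(D)) with f(D) = log(D + 1); that last form is an assumption about Eq. 18 and should be checked against the paper. The function names `box_to_gaussian`, `kld`, and `kld_loss` are hypothetical.

```python
import math


def box_to_gaussian(x, y, w, h, theta):
    """Map a rotated box to (mu, Sigma).

    Assumes Sigma = R diag(w^2/4, h^2/4) R^T with theta in radians.
    Sigma is returned as (a, b, d), standing for [[a, b], [b, d]].
    """
    c, s = math.cos(theta), math.sin(theta)
    lw, lh = w * w / 4.0, h * h / 4.0
    a = lw * c * c + lh * s * s
    b = (lw - lh) * c * s
    d = lw * s * s + lh * c * c
    return (x, y), (a, b, d)


def kld(pred, target):
    """Closed-form D_KL(N_pred || N_target) for two 2-D Gaussians."""
    (xp, yp), (ap, bp, dp) = box_to_gaussian(*pred)
    (xt, yt), (at, bt, dt) = box_to_gaussian(*target)
    det_p = ap * dp - bp * bp
    det_t = at * dt - bt * bt
    # Inverse of the target covariance: (1 / det) * [[dt, -bt], [-bt, at]].
    ia, ib, id_ = dt / det_t, -bt / det_t, at / det_t
    dx, dy = xp - xt, yp - yt
    center_term = ia * dx * dx + 2.0 * ib * dx * dy + id_ * dy * dy
    trace_term = ia * ap + 2.0 * ib * bp + id_ * dp
    log_term = math.log(det_t / det_p)
    return 0.5 * (center_term + trace_term + log_term) - 1.0


def kld_loss(pred, target, tau=1.0):
    """Assumed normalization: 1 - 1 / (tau + log(D + 1))."""
    d = max(kld(pred, target), 0.0)  # guard against tiny negative round-off
    return 1.0 - 1.0 / (tau + math.log(d + 1.0))


if __name__ == "__main__":
    target = (50.0, 50.0, 100.0, 10.0, 0.0)  # elongated box
    pred = (50.0, 50.0, 100.0, 10.0, math.radians(5.0))  # 5 degree angle error
    print("KLD:", kld(pred, target), "loss:", kld_loss(pred, target))
```

Under these assumptions, scaling x, y, w, and h of both boxes by the same factor leaves `kld` unchanged, and a fixed angle error costs more for an elongated box than for a nearly square one, which is the behavior the review attributes to the loss.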

Experimental results

Research questions

RQ1: Can a regression loss for rotated object detection be designed from a deductive perspective that treats rotation as a general case and horizontal detection as a special case?
RQ2: Does modeling rotated boxes as 2-D Gaussians and using KLD as a loss lead to self-modulated, scale-invariant gradients that improve high-precision detection?
RQ3: How does KLD-based regression compare to Gaussian Wasserstein Distance and traditional L_n losses across diverse datasets and detectors?
RQ4: Is the KLD-based loss capable of degenerating to standard horizontal detection losses when theta approaches zero?
RQ5: What empirical gains (on which datasets and scenarios) does the proposed method achieve for high-precision detection?

Key findings

KLD-based regression provides self-modulated gradients that adjust the importance of the center, size (w, h), and angle parameters based on object aspect ratio and scale.
KLD is scale invariant, and its regression loss can degenerate to L2-like horizontal losses when theta = 0.
Nonlinear normalization (e.g., log(D+1) with tau = 1) yields optimal performance in ablations on HRSC2016.
Across seven datasets, KLD-based regression shows consistent superiority over Smooth L1 and Gaussian Wasserstein Distance in high-precision detection.
On HRSC2016, KLD with RetinaNet achieves up to 23.97% AP75 improvement over Smooth L1 in high-precision metrics, and similar gains are reported on MSRA-TD500 and ICDAR2015 with stronger detectors.
Ablation studies indicate the asymmetry of KLD does not significantly affect performance.
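
The scale-invariance and horizontal-degeneration findings can be checked by hand from the standard closed form of the KLD between two Gaussians. The derivation below is a sketch, assuming the covariance Sigma = R diag(w^2/4, h^2/4) R^T and writing Delta x = x_p - x_t, Delta y = y_p - y_t. It is a worked check, not a reproduction of the paper's own equations.

```latex
\begin{aligned}
D_{kl}(\mathcal{N}_p \,\|\, \mathcal{N}_t)
 &= \tfrac{1}{2}(\mu_p-\mu_t)^{\top}\Sigma_t^{-1}(\mu_p-\mu_t)
  + \tfrac{1}{2}\operatorname{Tr}\!\left(\Sigma_t^{-1}\Sigma_p\right)
  + \tfrac{1}{2}\ln\frac{|\Sigma_t|}{|\Sigma_p|} - 1 \\[4pt]
\text{with } \theta_p=\theta_t=0:\quad
 &= \frac{2\,\Delta x^{2}}{w_t^{2}} + \frac{2\,\Delta y^{2}}{h_t^{2}}
  + \frac{1}{2}\left(\frac{w_p^{2}}{w_t^{2}} + \frac{h_p^{2}}{h_t^{2}}\right)
  + \ln\frac{w_t}{w_p} + \ln\frac{h_t}{h_p} - 1
\end{aligned}
```

The first two terms are a squared penalty on the center offset normalized by the target size, which is the L2-like behavior noted in the findings. Every term is a ratio of lengths, so multiplying x, y, w, and h of both boxes by the same factor leaves the value unchanged.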


This review was generated by AI and checked by a human editor.