QUICK REVIEW

[Paper Review] H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection

Xue Yang, Gefan Zhang|arXiv (Cornell University)|Oct 13, 2022

Advanced Neural Network Applications | Cited by 24

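One-sentence summary

H2RBox trains on horizontal box (HBox) annotations only, combining weakly supervised and self-supervised learning to predict oriented bounding boxes (RBoxes). It achieves results competitive with HBox-supervised methods while using less memory and running faster, and it comes close to RBox-supervised detectors.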

ABSTRACT

Oriented object detection emerges in many applications from aerial images to autonomous driving, while many existing detection benchmarks are annotated with horizontal bounding box only which is also less costive than fine-grained rotated box, leading to a gap between the readily available training corpus and the rising demand for oriented object detection. This paper proposes a simple yet effective oriented object detection approach called H2RBox merely using horizontal box annotation for weakly-supervised training, which closes the above gap and shows competitive performance even against those trained with rotated boxes. The cores of our method are weakly- and self-supervised learning, which predicts the angle of the object by learning the consistency of two different views. To our best knowledge, H2RBox is the first horizontal box annotation-based oriented object detector. Compared to an alternative i.e. horizontal box-supervised instance segmentation with our post adaption to oriented object detection, our approach is not susceptible to the prediction quality of mask and can perform more robustly in complex scenes containing a large number of dense objects and outliers. Experimental results show that H2RBox has significant performance and speed advantages over horizontal box-supervised instance segmentation methods, as well as lower memory requirements. While compared to rotated box-supervised oriented object detectors, our method shows very close performance and speed. The source code is available at PyTorch-based MMRotate (https://github.com/yangxue0827/h2rbox-mmrotate) and Jittor-based JDet (https://github.com/yangxue0827/h2rbox-jittor).

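Research motivation and goals

Close the gap between readily available horizontal box (HBox) annotations and the demand for oriented object detection.
Propose H2RBox, a two-branch framework that learns object angles without RBox labels.
Show that H2RBox outperforms HBox-supervised instance segmentation baselines in accuracy and efficiency and approaches RBox-supervised detectors.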

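Proposed method

A two-branch architecture whose weakly supervised (WS) branch is an FCOS-based rotated detector, supervised by the GT HBox through the horizontal circumscribed rectangle of each predicted RBox.
A self-supervised (SS) branch that rotates the input view and enforces consistency between the RBox predictions of the two views.
Padding/cropping strategies (zero padding, center cropping, symmetric padding) that avoid leaking the ground-truth angle during view generation.
Label re-assignment strategies (one-to-one, one-to-many) that align the SS branch targets with the WS predictions.
A total loss L_total = L_ws + lambda L_ss, with detailed terms for classification, centerness, regression, and angle/scale consistency.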
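The interplay of the two branches can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`rbox_to_hbox`, `rotate_rbox`, `ws_loss`, `ss_loss`, `total_loss`), the `(cx, cy, w, h, theta)` box layout with `theta` in radians, the L1 distances used in place of the real regression and consistency terms, and the `lam` values are all hypothetical choices made for the example. Label re-assignment and the padding strategies are not modeled.

```python
import math

def rbox_to_hbox(cx, cy, w, h, theta):
    """Horizontal circumscribed rectangle (x1, y1, x2, y2) of a rotated box."""
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    half_w = 0.5 * (w * cos_t + h * sin_t)
    half_h = 0.5 * (w * sin_t + h * cos_t)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def rotate_rbox(rbox, delta, center=(0.0, 0.0)):
    """Rotate an RBox by delta radians about `center` (same axis convention as theta)."""
    cx, cy, w, h, theta = rbox
    ox, oy = center
    dx, dy = cx - ox, cy - oy
    c, s = math.cos(delta), math.sin(delta)
    return (ox + c * dx - s * dy, oy + s * dx + c * dy, w, h, theta + delta)

def angle_diff(a, b, period=math.pi):
    """Smallest absolute difference between two box angles, modulo `period`."""
    d = (a - b) % period
    return min(d, period - d)

def ws_loss(pred_rbox, gt_hbox):
    """WS term: compare the GT HBox with the circumscribed HBox of the prediction.
    L1 distance is an assumed stand-in for the detector's real regression loss."""
    pred_hbox = rbox_to_hbox(*pred_rbox)
    return sum(abs(p - g) for p, g in zip(pred_hbox, gt_hbox))

def ss_loss(pred_view1, pred_view2, delta, center=(0.0, 0.0)):
    """SS term: the prediction on the rotated view should match the view-1
    prediction rotated by the same delta (center, scale, and angle terms)."""
    target = rotate_rbox(pred_view1, delta, center)
    center_term = abs(target[0] - pred_view2[0]) + abs(target[1] - pred_view2[1])
    scale_term = abs(target[2] - pred_view2[2]) + abs(target[3] - pred_view2[3])
    return center_term + scale_term + angle_diff(target[4], pred_view2[4])

def total_loss(l_ws, l_ss, lam):
    """L_total = L_ws + lambda * L_ss; `lam` is a placeholder weight."""
    return l_ws + lam * l_ss

# Two candidate predictions with mirrored angles share one circumscribed HBox,
# so the WS term alone cannot tell them apart.
true_rbox = (0.0, 0.0, 4.0, 2.0, math.pi / 6)
mirror_rbox = (0.0, 0.0, 4.0, 2.0, -math.pi / 6)
gt_hbox = rbox_to_hbox(*true_rbox)

# A second view rotated by 45 degrees, with a prediction that tracks the object.
delta = math.pi / 4
view2_pred = rotate_rbox(true_rbox, delta)

for name, cand in (("true", true_rbox), ("mirror", mirror_rbox)):
    l_ws = ws_loss(cand, gt_hbox)
    l_ss = ss_loss(cand, view2_pred, delta)
    print(name, round(l_ws, 6), round(l_ss, 6), round(total_loss(l_ws, l_ss, lam=0.5), 6))
```

In this toy setup both candidates fit the HBox equally well, and only the consistency term separates them, which is the intuition behind pairing a WS branch with an SS branch.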

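Experimental results

Research questions

RQ1: Can horizontal box annotations effectively train an oriented object detector without RBox labels?
RQ2: How can weakly supervised and self-supervised learning be combined to recover accurate rotation predictions from HBox data?
RQ3: How do view generation strategies and label re-assignment affect orientation accuracy and overall detection performance?
RQ4: How does H2RBox compare with HBox-supervised instance segmentation baselines and RBox-supervised detectors in accuracy, memory, and speed?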

Key findings

On DOTA-v1.0, H2RBox outperforms BoxInst-RBox and BoxLevelSet-RBox in AP50 by 14.31 and 11.46 percentage points, respectively (67.90% vs. 53.59% and 56.44%).
H2RBox uses 6.25 GB of memory and runs at 31.6 FPS, roughly one third of the memory of BoxInst-RBox and about 12 times faster than BoxLevelSet-RBox.
With multi-scale training and testing, the gap to the fully RBox-supervised FCOS on DOTA-v1.0 narrows to 0.91 percentage points (AP50: 74.40 vs. 75.31).
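On DIOR-R, H2RBox achieves AP 33.15, AP50 57.00, and AP75 32.60, close to the RBox-supervised FCOS (AP 34.16, AP50 58.60, AP75 31.90).
Ablation studies show that the self-supervised loss (L_ss) is essential: adding it raises AP from a much lower value to 35.92% (DOTA-v1.0) and 33.15% (DIOR-R).
H2RBox outperforms the HBox-Mask-RBox baselines (BoxInst-RBox, BoxLevelSet-RBox) in both accuracy and efficiency on the datasets tested.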
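The reported gaps can be recomputed directly from the figures quoted in the findings above. The short sketch below only repeats that arithmetic; the dictionary keys and the `gap` helper are hypothetical names for this example, and every number is copied from this summary rather than taken from the paper's tables.

```python
# (reference score, compared score) pairs copied from the findings above.
reported = {
    "DOTA-v1.0 AP50: H2RBox vs BoxInst-RBox": (67.90, 53.59),
    "DOTA-v1.0 AP50: H2RBox vs BoxLevelSet-RBox": (67.90, 56.44),
    "DOTA-v1.0 multi-scale: RBox-supervised FCOS vs H2RBox": (75.31, 74.40),
    "DIOR-R AP: RBox-supervised FCOS vs H2RBox": (34.16, 33.15),
    "DIOR-R AP50: RBox-supervised FCOS vs H2RBox": (58.60, 57.00),
    "DIOR-R AP75: RBox-supervised FCOS vs H2RBox": (31.90, 32.60),
}

def gap(first, second):
    """Difference in percentage points, rounded to two decimals."""
    return round(first - second, 2)

gaps = {name: gap(a, b) for name, (a, b) in reported.items()}

for name, value in gaps.items():
    print(f"{name}: {value:+.2f} points")
```

A negative value on the DIOR-R AP75 row means H2RBox scores higher than the RBox-supervised FCOS on that metric, even though it trails on AP and AP50.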

This review was generated by AI and reviewed by a human editor.