QUICK REVIEW

[Paper Review] Oriented Object Detection with Transformer

Teli Ma, Mingyuan Mao | arXiv (Cornell University) | Jun 6, 2021

Advanced Neural Network Applications | Cited by 31

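One-Sentence Summary

This paper proposes O2DETR, an end-to-end Transformer-based detector for arbitrarily oriented objects that replaces self-attention in the encoder with depthwise separable convolution, achieves competitive mAP on the DOTA dataset, and gains further accuracy from a simple fine-tuning head.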

ABSTRACT

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection problem. We provide the first attempt and implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on an end-to-end network. The contributions of $\rm O^2DETR$ include: 1) we provide a new insight into oriented object detection, by applying Transformer to directly and efficiently localize objects without a tedious process of rotated anchors as in conventional detectors; 2) we design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution, which can significantly reduce the memory and computational cost of using multi-scale features in the original Transformer; 3) our $\rm O^2DETR$ can be another new benchmark in the field of oriented object detection, which achieves up to 3.85 mAP improvement over Faster R-CNN and RetinaNet. We simply fine-tune the head mounted on $\rm O^2DETR$ in a cascaded architecture and achieve a competitive performance over SOTA in the DOTA dataset.

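Motivation and Goals

Advance oriented object detection without rotated anchors or post-processing refinements.
Propose an end-to-end Transformer detector that predicts rotated bounding boxes with an angle.
Improve efficiency by replacing attention in the encoder with depthwise separable convolution.
Demonstrate competitive performance on the DOTA dataset and explore a fine-tuning head to improve results further.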

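Proposed Method

Extend DETR to oriented boxes by adding an angle dimension to the object queries.
Replace self-attention in the Transformer encoder with depthwise separable convolution to reduce memory and computation.
Combine multi-scale feature maps with cross-attention between object queries and encoder memory to predict (x_c, y_c, w, h, α).
Use a 3-layer MLP plus a linear layer in the detection head to output (x_c, y_c, w, h, α) and class scores.
Optionally fine-tune O2DETR's predictions on ROIAlign-based features to improve the final bounding boxes and confidence scores.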
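
To make the five-parameter output concrete, the following minimal sketch converts one (x_c, y_c, w, h, α) prediction into its four corner points. It assumes α is in radians and rotates the box counter-clockwise about its center; this summary does not state the paper's angle convention, so that choice and the function name `rotated_box_corners` are illustrative assumptions, not the authors' code.

```python
import math


def rotated_box_corners(xc, yc, w, h, alpha):
    """Return the four corners of a rotated box as (x, y) tuples.

    Assumption: alpha is in radians, counter-clockwise, applied about the
    box center. The paper's exact angle convention may differ.
    """
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    # Corner offsets of the unrotated box, relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    corners = []
    for dx, dy in offsets:
        # Rotate the offset by alpha, then translate to the center.
        x = xc + dx * cos_a - dy * sin_a
        y = yc + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners


if __name__ == "__main__":
    # A 4 x 2 box centered at the origin, rotated by 30 degrees.
    for corner in rotated_box_corners(0.0, 0.0, 4.0, 2.0, math.radians(30)):
        print(corner)
```

With α = 0 the function returns the ordinary axis-aligned corners, which is a quick sanity check for whichever angle convention is adopted.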

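Experimental Results

Research Questions

RQ1: Can a Transformer-based detector be applied directly to arbitrarily oriented object detection without rotated anchors?
RQ2: Does replacing self-attention in the encoder with depthwise separable convolution improve efficiency while maintaining or improving accuracy in scenes with dense, small, oriented objects?
RQ3: How does multi-scale feature fusion affect oriented object detection performance in a Transformer framework?
RQ4: When O2DETR serves as the region proposal network, does a lightweight ROIAlign-based fine-tuning head further improve detection accuracy?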
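
The efficiency side of RQ2 rests on a simple counting argument, which the sketch below illustrates. It compares the weight count of a standard convolution with that of a depthwise separable one, and contrasts how convolution cost and the size of a self-attention score matrix grow with feature-map size. The channel count, kernel size, and map sizes are illustrative assumptions rather than values from the paper, all function names are hypothetical, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def depthwise_separable_macs(height, width, c_in, c_out, k):
    """Multiply-accumulates on a height x width map (stride 1, same padding assumed)."""
    return height * width * depthwise_separable_params(c_in, c_out, k)


def attention_score_entries(height, width):
    """Entries in one head's self-attention score matrix over all positions."""
    n_tokens = height * width
    return n_tokens * n_tokens


if __name__ == "__main__":
    # Illustrative sizes only; not taken from the paper.
    c, k = 256, 3
    print(standard_conv_params(c, c, k), depthwise_separable_params(c, c, k))
    for side in (25, 50, 100):
        print(side,
              depthwise_separable_macs(side, side, c, c, k),
              attention_score_entries(side, side))
```

With 256 channels and a 3×3 kernel, the standard convolution has 589,824 weights against 67,840 for the depthwise separable form. Doubling the side of the feature map multiplies the convolution cost by 4 but the attention score matrix by 16, which is why full attention becomes expensive on large multi-scale feature maps.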

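Key Findings

Without refinement, O2DETR achieves higher mAP on DOTA than several rotated detectors, with gains of up to 3.85 mAP over the Faster R-CNN and RetinaNet baselines.
The DSConv encoder outperforms self-attention in scenes with dense, small objects (for example, 66.10 mAP for DSConv with ResNet-50 versus 65.33 mAP for Attn).
Fine-tuning the O2DETR head on ROIAlign-based features yields a substantial gain (for example, 74.47 mAP with single-scale input and 79.66 mAP with multi-scale input on ResNet-50), which shows the effectiveness of post-processing fine-tuning.
O2DETR, combining multi-scale features with angle-aware object queries, delivers competitive results across multiple categories of the DOTA dataset.
Recall analysis shows that O2DETR's proposals achieve higher recall than a conventional RPN across IoU thresholds, which supports its use as a strong region proposal backbone.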
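
For readers unfamiliar with the recall comparison, here is a minimal sketch of how proposal recall at several IoU thresholds is commonly computed. It assumes the rotated IoU between every ground-truth box and every proposal has already been calculated elsewhere; the function name `recall_at_thresholds` and the IoU values are made up for illustration and are not the paper's evaluation code or data.

```python
def recall_at_thresholds(iou_matrix, thresholds):
    """Fraction of ground-truth boxes matched by at least one proposal.

    iou_matrix[g][p] is the (precomputed) IoU between ground-truth box g
    and proposal p. A ground-truth box counts as recalled at threshold t
    when its best-matching proposal has IoU >= t.
    """
    best = [max(row) if row else 0.0 for row in iou_matrix]
    n_gt = len(best)
    return {
        t: (sum(1 for b in best if b >= t) / n_gt if n_gt else 0.0)
        for t in thresholds
    }


if __name__ == "__main__":
    # Hypothetical IoUs: 3 ground-truth boxes, 2 proposals each.
    ious = [[0.9, 0.2], [0.55, 0.6], [0.3, 0.1]]
    print(recall_at_thresholds(ious, [0.5, 0.75]))
```

Running the same function on proposals from two different sources, over the same ground truth and thresholds, gives the kind of side-by-side recall curve the finding above describes.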

This review was generated by AI and reviewed by a human editor.