QUICK REVIEW

[Paper Review] Large Selective Kernel Network for Remote Sensing Object Detection

Yuxuan Li, Qibin Hou|arXiv (Cornell University)|Mar 16, 2023

Remote-Sensing Image Classification|Cited by 46

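One-Sentence Summary

The paper proposes LSKNet, which dynamically enlarges and selects large receptive fields through a sequence of decomposed depth-wise convolutions and a spatial kernel selection mechanism, achieving state-of-the-art results on HRSC2016, DOTA-v1.0, and FAIR1M-v1.0.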

ABSTRACT

Recent research on remote sensing object detection has largely focused on improving the representation of oriented bounding boxes but has overlooked the unique prior knowledge presented in remote sensing scenarios. Such prior knowledge can be useful because tiny remote sensing objects may be mistakenly detected without referencing a sufficiently long-range context, and the long-range context required by different types of objects can vary. In this paper, we take these priors into account and propose the Large Selective Kernel Network (LSKNet). LSKNet can dynamically adjust its large spatial receptive field to better model the ranging context of various objects in remote sensing scenarios. To the best of our knowledge, this is the first time that large and selective kernel mechanisms have been explored in the field of remote sensing object detection. Without bells and whistles, LSKNet sets new state-of-the-art scores on standard benchmarks, i.e., HRSC2016 (98.46% mAP), DOTA-v1.0 (81.85% mAP) and FAIR1M-v1.0 (47.87% mAP). Based on a similar technique, we rank 2nd place in 2022 the Greater Bay Area International Algorithm Competition. Code is available at https://github.com/zcablii/Large-Selective-Kernel-Network.

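Research Motivation and Goals

Exploit prior knowledge specific to remote sensing: different objects require different amounts of context to be detected accurately.
Develop a backbone mechanism that dynamically expands its receptive field using large, selective convolution kernels.
Efficiently fuse multi-scale contextual features to improve detection of tiny or context-dependent objects in aerial images.
Demonstrate state-of-the-art performance on standard remote sensing benchmarks with a lightweight, scalable backbone.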

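Proposed Method

Decompose a large convolution kernel into a sequence of depth-wise convolutions with progressively larger kernel sizes and dilation rates to expand the receptive field.
Build a large kernel selection (LK) module that processes multi-scale features and applies a spatial-attention-based selection mask to the features from each decomposed kernel.
Concatenate the outputs from the different receptive fields, pool them, and generate spatial attention maps that weight and fuse the large-kernel features.
Compute the final LSK feature as the element-wise product of the input feature and the learned attention feature (Y = X · S).
Integrate the LSK module into the residual blocks of the backbone (LK Selection block + FFN) for detectors such as Oriented RCNN.
Provide variants (LSKNet-T, LSKNet-S) with different channel dimensions and block counts to balance accuracy and efficiency.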
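
The selection steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `spatial_kernel_selection`, the per-pixel mixing weights `w` and biases `b`, and the toy inputs are hypothetical; the pooling is assumed to be a per-pixel average and max over channels; and any convolution applied to the fused feature before the final product is omitted for brevity.

```python
import math


def spatial_kernel_selection(x, branches, w, b):
    """Sketch of spatial kernel selection followed by Y = X * S.

    x        : input feature, nested list of shape C x H x W
    branches : list of N feature maps (one per decomposed kernel), each C x H x W
    w, b     : hypothetical weights (N x 2) and biases (N) that map the two
               pooled descriptors to one mask value per branch (a 1x1 mixing,
               chosen here only to keep the sketch short)
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    N = len(branches)
    y = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            # Pool over the channels of the concatenated branch features.
            vals = [u[c][i][j] for u in branches for c in range(C)]
            avg = sum(vals) / len(vals)
            mx = max(vals)
            # One sigmoid selection mask value per branch at this pixel.
            masks = [
                1.0 / (1.0 + math.exp(-(w[n][0] * avg + w[n][1] * mx + b[n])))
                for n in range(N)
            ]
            for c in range(C):
                # Attention feature S: mask-weighted sum of the branch features.
                s = sum(masks[n] * branches[n][c][i][j] for n in range(N))
                # Output: element-wise product of input and attention feature.
                y[c][i][j] = x[c][i][j] * s
    return y


# Toy example: one channel, 2 x 2 spatial grid, two branches.
x = [[[1.0, 2.0], [3.0, 4.0]]]
u_small = [[[1.0, 1.0], [1.0, 1.0]]]   # stands in for the smaller receptive field
u_large = [[[3.0, 3.0], [3.0, 3.0]]]   # stands in for the larger receptive field

# Zero weights give every mask the value 0.5, so S = 0.5 * 1 + 0.5 * 3 = 2.
y_equal = spatial_kernel_selection(
    x, [u_small, u_large], w=[[0.0, 0.0], [0.0, 0.0]], b=[0.0, 0.0]
)

# Strong biases push the masks toward 0 and 1, selecting the large branch.
y_large = spatial_kernel_selection(
    x, [u_small, u_large], w=[[0.0, 0.0], [0.0, 0.0]], b=[-50.0, 50.0]
)
```

Because the masks are computed per pixel, different image locations can favor different receptive fields, which is the property that distinguishes spatial selection from a single per-channel weighting.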

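Experimental Results

Research Questions

RQ1: Can a backbone with large kernels and selective fusion improve remote sensing object detection across datasets with varying object scales and contexts?
RQ2: How should large receptive fields be decomposed and combined for aerial images to get the best speed-accuracy trade-off?
RQ3: Does spatial (as opposed to channel) kernel selection better capture the spatial context variation inherent in remote sensing data?
RQ4: How does LSKNet perform on standard benchmarks when integrated into various detection frameworks (two-stage and one-stage)?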

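Main Findings

LSKNet achieves state-of-the-art mAP on HRSC2016 (98.46%), DOTA-v1.0 (81.85% as reported in the abstract; 81.64% in the results table), and FAIR1M-v1.0 (47.87%).
LSKNet-S runs at 18.1 FPS on 1024×1024 images using a single RTX3090 while maintaining strong accuracy.
Decomposing the large kernel into two kernels gives a favorable speed-accuracy trade-off on DOTA-v1.0 (mAP of 80.91–81.31 across configurations).
Spatial selection outperforms channel attention on remote sensing tasks, and the model tends to use larger kernels in shallower layers and smaller kernels in deeper layers.
The LSKNet-T/S backbones improve results across multiple detection frameworks (two-stage and one-stage) and are competitive with a ResNet-18 baseline in parameters and FLOPs.
Visualization analysis confirms that different object categories require different context ranges, consistent with the prior that motivates LSKNet.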
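
To make the decomposition trade-off concrete, the sketch below compares one large depth-wise kernel with a two-kernel sequence that reaches the same receptive field. It relies on the standard relation for stride-1 convolutions applied in sequence, RF_i = RF_{i-1} + d_i (k_i - 1), where k_i is the kernel size and d_i the dilation. The specific (kernel size, dilation) pairs and the helper names are illustrative assumptions, not values quoted from this summary.

```python
def receptive_field(kernels):
    """Receptive field of stride-1 convolutions applied in sequence.

    kernels: list of (kernel_size, dilation) pairs, in application order.
    """
    rf = 1
    for k, d in kernels:
        rf += d * (k - 1)
    return rf


def depthwise_weights_per_channel(kernels):
    """Weights per channel for depth-wise convolutions (bias excluded)."""
    return sum(k * k for k, _ in kernels)


# Assumed configurations for illustration only.
single_kernel = [(23, 1)]          # one dense 23 x 23 depth-wise kernel
two_kernels = [(5, 1), (7, 3)]     # 5 x 5, then 7 x 7 with dilation 3

rf_single = receptive_field(single_kernel)                    # 23
rf_two = receptive_field(two_kernels)                         # 23
cost_single = depthwise_weights_per_channel(single_kernel)    # 529
cost_two = depthwise_weights_per_channel(two_kernels)         # 74
```

Under these assumptions both options cover a 23 × 23 region, but the two-kernel sequence needs 74 weights per channel instead of 529, which suggests why decomposition can be cheaper at equal context range. It also yields an intermediate feature with a smaller receptive field for the selection step to choose from.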

This review was generated by AI and reviewed by human editors.