QUICK REVIEW

[Paper Review] Edge-guided Representation Learning for Underwater Object Detection

Linhui Dai, Hong Liu|arXiv (Cornell University)|Jun 1, 2023

Underwater Acoustics Research | Cited by 4

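One-sentence summary

The paper proposes ERL-Net, an edge-guided representation learning framework for underwater object detection that improves feature discriminability through edge-aware attention, multi-scale feature aggregation, and a wide, asymmetric receptive field. By explicitly exploiting edge cues, it improves detection in low-contrast, small-object, and camouflage scenes, and it reports state-of-the-art (SOTA) results on three challenging underwater datasets.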

ABSTRACT

Underwater object detection (UOD) is crucial for marine economic development, environmental protection, and the planet's sustainable development. The main challenges of this task arise from low-contrast, small objects, and mimicry of aquatic organisms. The key to addressing these challenges is to focus the model on obtaining more discriminative information. We observe that the edges of underwater objects are highly unique and can be distinguished from low-contrast or mimicry environments based on their edges. Motivated by this observation, we propose an Edge-guided Representation Learning Network, termed ERL-Net, that aims to achieve discriminative representation learning and aggregation under the guidance of edge cues. Firstly, we introduce an edge-guided attention module to model the explicit boundary information, which generates more discriminative features. Secondly, a feature aggregation module is proposed to aggregate the multi-scale discriminative features by regrouping them into three levels, effectively aggregating global and local information for locating and recognizing underwater objects. Finally, we propose a wide and asymmetric receptive field block to enable features to have a wider receptive field, allowing the model to focus on more small object information. Comprehensive experiments on three challenging underwater datasets show that our method achieves superior performance on the UOD task.

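Research motivation and objectives

Address the challenges posed by low-contrast underwater images, small and densely packed objects, and the camouflage of aquatic organisms.
Improve feature discriminability in complex underwater environments where cluttered backgrounds and color camouflage severely interfere with detection.
Use edge information as a strong inductive bias to guide representation learning and improve localization accuracy.
Design a unified framework that combines edge guidance with multi-scale feature learning and context modeling for robust detection.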

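Proposed method

Proposes an Edge-Guided Attention (EGA) module that explicitly models boundary information through edge maps to refine the feature representation.
Introduces a Feature Aggregation (FA) module that regroups multi-scale features into low-, mid-, and high-level representations to fuse global and local context.
Designs a Wide and Asymmetric Receptive Field Block (WA-RFB) that enlarges the receptive field asymmetrically to improve sensitivity to small objects.
Integrates the EGA, FA, and WA-RFB modules into a single network architecture that is compatible with one-stage detectors (such as RetinaNet) and two-stage detectors (such as Faster R-CNN and Cascade R-CNN).
Uses edge maps as an auxiliary supervision signal during training to guide attention and feature learning, without requiring additional annotation.
Adopts a multi-task learning strategy that combines detection-head prediction with edge-aware feature refinement, so the whole network is optimized jointly end to end.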
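To make the idea of an edge map guiding features concrete, here is a minimal pure-Python sketch. It is not the paper's EGA module: the Sobel operator, the peak normalization, and the residual gating `feature * (1 + attention)` are all assumptions chosen for illustration, and the function names are hypothetical. It only shows how an edge map computed from the image itself (so no extra annotation is needed) could amplify feature responses near object boundaries.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edge_map(img):
    """Gradient-magnitude edge map of a 2D grayscale image (list of rows).

    Border pixels are left at 0 to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    edges = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    pixel = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            edges[y][x] = math.hypot(gx, gy)
    return edges

def normalize(edge_map):
    """Scale an edge map to [0, 1] by its peak value (assumed normalization)."""
    peak = max(max(row) for row in edge_map)
    if peak == 0:
        return [[0.0] * len(row) for row in edge_map]
    return [[v / peak for v in row] for row in edge_map]

def edge_gated(feature, attention):
    """Residual gating (an assumption): keep the feature, boost it near edges."""
    return [[f * (1.0 + a) for f, a in zip(frow, arow)]
            for frow, arow in zip(feature, attention)]

# Toy 5x5 image: dark on the left, bright on the right (one vertical edge).
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
feature = [[1.0] * 5 for _ in range(5)]  # hypothetical single-channel feature map

attention = normalize(sobel_edge_map(image))
gated = edge_gated(feature, attention)
print(gated[2])  # prints [1.0, 2.0, 2.0, 1.0, 1.0]
```

In the toy output, only the two columns that straddle the brightness step are amplified, while flat regions pass through unchanged. A learned module would replace the fixed Sobel filter and the fixed gating with trainable layers.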

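Experimental results

Research questions

RQ1: Does explicit edge supervision improve feature discriminability in low-contrast underwater images?
RQ2: How does edge-guided attention improve the localization and recognition of small or camouflaged objects?
RQ3: To what extent does multi-scale feature aggregation guided by edge cues improve detection performance across diverse underwater scenes?
RQ4: Compared with a standard convolutional receptive field, does a wide, asymmetric receptive field capture context better and improve small-object detection?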
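The comparison behind RQ4 can be made concrete with receptive-field arithmetic. The sketch below uses the standard formula for stride-1 convolutions, where a kernel of size k with dilation d spans d(k-1)+1 pixels, and tracks height and width separately. The layer stacks are hypothetical examples, not the paper's WA-RFB configuration.

```python
def effective_kernel(k, dilation=1):
    """Pixels spanned along one axis by a size-k kernel with the given dilation."""
    return dilation * (k - 1) + 1

def receptive_field(layers):
    """Receptive field (height, width) of stride-1 convs applied in sequence.

    layers: list of (kernel_h, kernel_w, dilation) tuples.
    """
    rf_h = rf_w = 1
    for kh, kw, d in layers:
        rf_h += effective_kernel(kh, d) - 1
        rf_w += effective_kernel(kw, d) - 1
    return rf_h, rf_w

def weights_per_channel_pair(layers):
    """Spatial weights per input/output channel pair, ignoring biases."""
    return sum(kh * kw for kh, kw, _ in layers)

# Hypothetical layer stacks, chosen only to illustrate the arithmetic.
standard = [(3, 3, 1)]                   # one plain 3x3 convolution
factorized = [(1, 7, 1), (7, 1, 1)]      # 1x7 followed by 7x1
wide_branch = [(1, 7, 1), (3, 3, 3)]     # 1x7, then a 3x3 with dilation 3

print(receptive_field(standard))         # prints (3, 3)
print(receptive_field(factorized))       # prints (7, 7)
print(receptive_field(wide_branch))      # prints (7, 13)
print(weights_per_channel_pair(factorized))  # prints 14
```

The factorized pair covers the same 7x7 area as a full 7x7 kernel with 14 spatial weights instead of 49, and the last stack yields a field that is wider than it is tall, which is the kind of asymmetry the question refers to.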

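Key findings

On the UTDAC2020 dataset, ERL-Net reaches a mean average precision (mAP) of 0.484 under the COCO-style AP@[0.5:0.05:0.95] metric, outperforming SOTA methods such as SABL and NAS-FCOS.
On small objects, ERL-Net reaches an AP of 0.128 at IoU=0.75 (AP75), compared with 0.085 for SABL and 0.091 for NAS-FCOS, a gain of 3.7 to 4.3 points on tiny, hard-to-detect targets.
AP50 rises to 0.836, above SABL (0.815) and NAS-FCOS (0.423), indicating stronger detection at the looser IoU threshold.
Qualitative results show that ERL-Net reduces false positives (for example, diving equipment misclassified as starfish) by exploiting precise edge features.
Ablation experiments show that combining edge-guided attention (EGA) with channel attention (CA) yields a higher mAP (0.484) than CA alone (0.477), which supports the added value of edge supervision.
Attention-map visualizations indicate that ERL-Net attends to complete object boundaries rather than only object centers, which improves shape-aware detection.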
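For readers less familiar with the metrics quoted above, the following sketch shows how the IoU thresholds behind AP50, AP75, and AP@[0.5:0.05:0.95] relate to each other, and why small objects are harder under them. The boxes are made-up toy values, not data from the paper, and the helper names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten COCO-style thresholds: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(pred, gt):
    """Thresholds at which this prediction would count as a match."""
    overlap = iou(pred, gt)
    return [t for t in COCO_THRESHOLDS if overlap >= t]

def coco_style_ap(ap_by_threshold):
    """AP@[0.5:0.05:0.95] is the plain mean of AP over the ten thresholds."""
    return sum(ap_by_threshold[t] for t in COCO_THRESHOLDS) / len(COCO_THRESHOLDS)

# Toy boxes: the same 2-pixel horizontal shift on a large and a small object.
large_gt, large_pred = (0, 0, 10, 10), (2, 0, 12, 10)
small_gt, small_pred = (0, 0, 4, 4), (2, 0, 6, 4)

print(len(thresholds_passed(large_pred, large_gt)))  # prints 4 (0.50 to 0.65)
print(len(thresholds_passed(small_pred, small_gt)))  # prints 0
```

The same 2-pixel localization error leaves the large box matched at four thresholds (IoU of about 0.67) but drops the small box to an IoU of about 0.33, below even the AP50 cutoff. This is why precise boundary localization matters most for small objects.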


This review was generated by AI and reviewed by a human editor.