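[Paper Summary] Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
This paper proposes Frequency-Spatial Entanglement Learning (FSEL) for camouflaged object detection (COD). It fuses global frequency features with local spatial features through Entanglement Transformer Blocks and dual-domain parsing, and it outperforms 21 state-of-the-art methods on three COD benchmarks.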
Camouflaged object detection has attracted a lot of attention in computer vision. The main challenge lies in the high degree of similarity between camouflaged objects and their surroundings in the spatial domain, making identification difficult. Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design, but often ignore the sensitivity and locality of features in the spatial domain, leading to sub-optimal results. In this paper, we propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method. This method consists of a series of well-designed Entanglement Transformer Blocks (ETB) for representation learning, a Joint Domain Perception Module for semantic enhancement, and a Dual-domain Reverse Parser for feature integration in the frequency and spatial domains. Specifically, the ETB utilizes frequency self-attention to effectively characterize the relationship between different frequency bands, while the entanglement feed-forward network facilitates information interaction between features of different domains through entanglement learning. Our extensive experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets. The source code is available at: https://github.com/CSYSI/FSEL.
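Research Motivation and Objectives
- Advance robust camouflaged object detection by addressing the limits of spatial-only features, which stem from the similarity between objects and their backgrounds.
- Propose a framework that combines global frequency features with local spatial features to improve discriminative power.
- Develop mechanisms (ETB, JDPM, DRP) that enable entanglement learning and cross-domain feature refinement.
- Demonstrate superior performance over 21 state-of-the-art COD methods on the CAMO, COD10K, and NC4K datasets.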
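Proposed Method
- Propose the Frequency-Spatial Entanglement Learning (FSEL) architecture, which includes Entanglement Transformer Blocks (ETB) for frequency-spatial entanglement.
- Use the Joint Domain Perception Module (JDPM) to reconstruct multi-receptive-field information through frequency transformation.
- Adopt the Dual-domain Reverse Parser (DRP) to refine the feature flow in the frequency and spatial domains, enabling multi-level fusion.
- Implement frequency self-attention (FSA) to model correlations across frequency bands, and an entanglement feed-forward network (EFFN) to fuse features from the two domains.
- Train with a loss over multiple prediction levels (N1–N5) that combines weighted BCE with weighted IoU.
- Backbone encoders include PVTv2, ResNet, and Res2Net; FFT/IFFT-based operations extract global frequency cues and let them interact with spatial cues.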
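As a rough illustration of why FFT/IFFT-based operations give access to global cues, the following NumPy sketch transforms a feature map to the frequency domain, rescales each frequency bin, and transforms back. It is not the authors' FSA or ETB implementation: the function names `frequency_branch` and `fuse`, the filter `freq_weight`, and the residual addition are assumptions made for this example.

```python
import numpy as np


def frequency_branch(feat: np.ndarray, freq_weight: np.ndarray) -> np.ndarray:
    """Filter a (C, H, W) feature map in the frequency domain.

    freq_weight is a hypothetical per-bin filter (it would be learned in a real model).
    """
    spec = np.fft.fft2(feat, axes=(-2, -1))   # spatial -> frequency
    spec = spec * freq_weight                 # reweight every frequency bin
    out = np.fft.ifft2(spec, axes=(-2, -1))   # frequency -> spatial
    # Simplification: keep only the real part. A filter that is not
    # conjugate-symmetric leaves an imaginary residue that is dropped here.
    return out.real


def fuse(feat: np.ndarray, freq_weight: np.ndarray) -> np.ndarray:
    """Assumed fusion: add the global frequency response to the local spatial feature."""
    return feat + frequency_branch(feat, freq_weight)


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))

# An all-ones filter changes nothing: FFT followed by IFFT is the identity.
identity = np.ones_like(feat)
roundtrip = frequency_branch(feat, identity)

# Keeping only the zero-frequency bin returns the per-channel mean at every pixel.
dc_only = np.zeros_like(feat)
dc_only[:, 0, 0] = 1.0
global_mean = frequency_branch(feat, dc_only)
```

Because a single frequency bin depends on every pixel, even the one-bin filter above produces an output in which each location reflects the whole feature map. That is the kind of global context that purely local spatial operations do not provide on their own.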
![Figure 1: The visual comparison results of the proposed FSEL and current COD methods (i.e., FPNet [4], EVP [27], and FEDER [13]) in the spatial and frequency domain.](https://ar5iv.labs.arxiv.org/html/2409.01686/assets/x1.png)
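Experimental Results
Research Questions
- RQ1: Can a joint frequency-spatial representation improve camouflaged object detection more than frequency-only or spatial-only approaches?
- RQ2: How can frequency cues and spatial cues be entangled effectively to improve global context and local detail in COD?
- RQ3: Do the ETB, JDPM, and DRP components jointly improve COD accuracy across multiple backbone architectures and datasets?
- RQ4: How does frequency-spatial entanglement affect robustness to background noise and to object scale variation in COD?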
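Key Findings
- FSEL consistently outperforms 21 state-of-the-art COD methods on CAMO, COD10K, and NC4K across multiple backbones.
- Frequency self-attention models the relationships between frequency bands, capturing global cues beyond high/low-frequency pairs.
- Entanglement Transformer Blocks enable cross-domain interaction between frequency and spatial features through FSA, spatial self-attention (SSA), and EFFN.
- The Joint Domain Perception Module and the Dual-domain Reverse Parser extend global frequency cues across the frequency and spatial domains for better feature refinement.
- A loss that combines weighted BCE with IoU over five prediction levels provides effective supervision and improves multi-level prediction performance.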
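The multi-level supervision can be sketched as follows. The summary only states that weighted BCE and weighted IoU are combined over five prediction levels. The boundary-emphasizing pixel weights (window size `k`, factor `lam`), the unweighted sum over levels, and all function names below are assumptions borrowed from common practice in segmentation work, not details confirmed by the source. Predictions are assumed to be logits already resized to the mask resolution.

```python
import numpy as np


def box_mean(mask: np.ndarray, k: int = 31) -> np.ndarray:
    """Mean over a k x k window (odd k, zero padding, stride 1) via an integral image."""
    h, w = mask.shape
    padded = np.pad(mask, k // 2, mode="constant")
    ii = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    window_sum = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return window_sum / (k * k)


def level_loss(logits: np.ndarray, mask: np.ndarray, k: int = 31, lam: float = 5.0) -> float:
    """Weighted BCE + weighted IoU for one prediction level (assumed weighting scheme)."""
    # Assumed pixel weights: larger where the mask differs from its local mean,
    # which happens near object boundaries.
    weight = 1.0 + lam * np.abs(box_mean(mask, k) - mask)

    # Numerically stable binary cross-entropy computed from logits.
    bce = np.maximum(logits, 0) - logits * mask + np.log1p(np.exp(-np.abs(logits)))
    wbce = (weight * bce).sum() / weight.sum()

    # Weighted soft IoU with +1 smoothing in numerator and denominator.
    prob = 1.0 / (1.0 + np.exp(-logits))
    inter = (weight * prob * mask).sum()
    union = (weight * (prob + mask)).sum()
    wiou = 1.0 - (inter + 1.0) / (union - inter + 1.0)
    return float(wbce + wiou)


def multi_level_loss(level_logits, mask: np.ndarray) -> float:
    """Assumed aggregation: plain sum of the per-level losses (e.g. five levels)."""
    return sum(level_loss(logits, mask) for logits in level_logits)


# Toy example: a square object and five identical, confident predictions.
mask = np.zeros((16, 16))
mask[4:12, 4:12] = 1.0
confident = (2.0 * mask - 1.0) * 20.0
total = multi_level_loss([confident] * 5, mask)
```

In this sketch a prediction that matches the mask drives both terms toward zero, while an inverted prediction is penalized by both. A real training setup might weight the levels differently.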

This summary was generated by AI and reviewed by human editors.