QUICK REVIEW

[Paper Review] Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Yanguang Sun, Chunyan Xu|arXiv (Cornell University)|Sep 3, 2024

Visual Attention and Saliency Detection | Cited by 5

One-line summary

This paper proposes Frequency-Spatial Entanglement Learning (FSEL) for camouflaged object detection (COD). It integrates global frequency features with local spatial features through Entanglement Transformer Blocks and dual-domain parsing, and outperforms 21 state-of-the-art methods on three COD benchmarks.

ABSTRACT

Camouflaged object detection has attracted a lot of attention in computer vision. The main challenge lies in the high degree of similarity between camouflaged objects and their surroundings in the spatial domain, making identification difficult. Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design, but often ignore the sensitivity and locality of features in the spatial domain, leading to sub-optimal results. In this paper, we propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method. This method consists of a series of well-designed Entanglement Transformer Blocks (ETB) for representation learning, a Joint Domain Perception Module for semantic enhancement, and a Dual-domain Reverse Parser for feature integration in the frequency and spatial domains. Specifically, the ETB utilizes frequency self-attention to effectively characterize the relationship between different frequency bands, while the entanglement feed-forward network facilitates information interaction between features of different domains through entanglement learning. Our extensive experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets. The source code is available at: https://github.com/CSYSI/FSEL.

Motivation and objectives

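Motivate robust camouflaged object detection by addressing the limits of purely spatial features, which stem from the similarity between objects and their background.
Propose a framework that combines global frequency features with local spatial features to improve discriminability.
Develop mechanisms (ETB, JDPM, DRP) that enable entanglement learning and cross-domain feature optimization.
Demonstrate performance superior to 21 state-of-the-art COD methods on the CAMO, COD10K, and NC4K datasets.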

Proposed method

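Introduce the Frequency-Spatial Entanglement Learning (FSEL) architecture, which includes Entanglement Transformer Blocks (ETB) for frequency-spatial entanglement.
Use the Joint Domain Perception Module (JDPM) to reconstruct diverse receptive-field information via frequency transforms.
Use the Dual-domain Reverse Parser (DRP) to optimize feature flow in both the frequency and spatial domains, achieving multi-stage fusion.
Implement frequency self-attention (FSA) to model correlations between frequency bands, and use the entanglement feed-forward network (EFFN) to fuse features across domains.
Train with a loss that combines weighted BCE and weighted IoU across multiple prediction levels (N1–N5).
Use PVTv2, ResNet, and Res2Net as backbone encoders, and use FFT/IFFT-based operations to derive global frequency cues that interact with spatial cues.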
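The FFT/IFFT step in the list above can be illustrated with a small sketch. This is not the FSEL implementation: the function names, the `freq_weight` filter (a stand-in for whatever the model learns), and the additive fusion are all assumptions made for illustration. It only shows why a frequency transform yields global cues: every frequency coefficient depends on every pixel.

```python
import numpy as np


def frequency_branch(feat, freq_weight):
    """Filter a real feature map of shape (C, H, W) in the frequency domain.

    freq_weight is a hypothetical per-frequency filter of shape
    (C, H, W // 2 + 1); it is an assumption, not a detail from the paper.
    """
    h, w = feat.shape[-2:]
    spec = np.fft.rfft2(feat, axes=(-2, -1))   # spatial -> frequency
    spec = spec * freq_weight                  # reweight each frequency
    return np.fft.irfft2(spec, s=(h, w), axes=(-2, -1))  # back to spatial


def fuse_spatial_and_frequency(feat, freq_weight):
    """Assumed fusion: add the frequency-filtered map to the spatial map."""
    return feat + frequency_branch(feat, freq_weight)


def spectrum_change_from_one_pixel(feat, row, col, delta=1.0):
    """Magnitude of the change in every FFT coefficient after one pixel moves."""
    bumped = feat.copy()
    bumped[..., row, col] += delta
    before = np.fft.fft2(feat, axes=(-2, -1))
    after = np.fft.fft2(bumped, axes=(-2, -1))
    return np.abs(after - before)
```

With an all-ones `freq_weight` the branch is an identity round trip, and `spectrum_change_from_one_pixel` returns a nonzero value at every frequency, which is the sense in which frequency features are global while a convolution over the same map stays local.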

Figure 1: The visual comparison results of the proposed FSEL and current COD methods (i.e., FPNet [4], EVP [27], and FEDER [13]) in the spatial and frequency domain.

Experimental results

Research questions

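RQ1: Can joint frequency-spatial representations improve camouflaged object detection compared with purely spatial or purely frequency-based methods?
RQ2: How can frequency-domain cues be effectively entangled with spatial cues to enhance global context and local detail in COD?
RQ3: Do the ETB, JDPM, and DRP components together improve COD accuracy across multiple backbone architectures and datasets?
RQ4: How much does frequency-spatial entanglement affect robustness to background noise and to object scale variation in COD?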

Key findings

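FSEL consistently outperforms 21 state-of-the-art COD methods on CAMO, COD10K, and NC4K across multiple backbones.
Frequency self-attention models relationships between frequency bands and captures global cues beyond high/low frequency pairs.
Entanglement Transformer Blocks enable cross-domain interaction between frequency and spatial features through FSA, SSA, and EFFN.
The Joint Domain Perception Module and Dual-domain Reverse Parser extend global frequency cues into both the frequency and spatial domains, promoting feature optimization.
A loss combining weighted BCE and IoU over five prediction levels provides an effective supervision signal and improves multi-level predictions.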
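A multi-level loss of the kind described above might look like the following minimal sketch. It makes several assumptions that the review does not confirm: predictions are flattened probability lists, the per-pixel `weight` values are supplied by the caller (the paper's actual weighting scheme is not given here), and the five levels are summed with equal weight. All function names are hypothetical.

```python
import math


def weighted_bce(pred, target, weight, eps=1e-7):
    """Weighted binary cross-entropy over flattened probability maps."""
    total = 0.0
    for p, t, w in zip(pred, target, weight):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += w * -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / sum(weight)


def weighted_iou_loss(pred, target, weight, smooth=1.0):
    """One minus a weighted soft IoU; smooth guards against empty masks."""
    inter = sum(w * p * t for p, t, w in zip(pred, target, weight))
    union = sum(w * (p + t) for p, t, w in zip(pred, target, weight)) - inter
    return 1.0 - (inter + smooth) / (union + smooth)


def multi_level_loss(preds, target, weight):
    """Sum BCE + IoU over all prediction levels (equal weighting assumed)."""
    return sum(
        weighted_bce(p, target, weight) + weighted_iou_loss(p, target, weight)
        for p in preds
    )
```

Calling `multi_level_loss` with a list of five prediction maps mirrors the five supervised outputs; a prediction close to the mask should score lower than one that hovers near 0.5.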

Figure 2: Overview of the proposed FSEL model framework for camouflaged object detection. The proposed FSEL method generates predicted results through a Joint Domain Perception Module (JDPM), a series of stacked Entanglement Transformer Blocks (ETB), and a Dual-domain Reverse Parser (DRP).


This review was generated by AI and checked by a human editor.