QUICK REVIEW

[Paper Review] Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection

Tianyi Zhao, Maoxun Yuan|arXiv (Cornell University)|Jan 19, 2024
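Advanced Image Fusion Techniques | Cited by 5
One-Sentence Summary

RSDet introduces a coarse-to-fine fusion framework, comprising a Redundant Spectrum Removal module and a Dynamic Feature Selection module, that fuses RGB and IR features to improve RGB-IR object detection. It achieves state-of-the-art results on multiple RGB-IR datasets.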

ABSTRACT

In recent years, object detection utilizing both visible (RGB) and thermal infrared (IR) imagery has garnered extensive attention and has been widely implemented across a diverse array of fields. By leveraging the complementary properties between RGB and IR images, the object detection task can achieve reliable and robust object localization across a variety of lighting conditions, from daytime to nighttime environments. Most existing multi-modal object detection methods directly input the RGB and IR images into deep neural networks, resulting in inferior detection performance. We believe that this issue arises not only from the challenges associated with effectively integrating multimodal information but also from the presence of redundant features in both the RGB and IR modalities. The redundant information of each modality will exacerbate the fusion imprecision problems during propagation. To address this issue, we draw inspiration from the human brain's mechanism for processing multimodal information and propose a novel coarse-to-fine perspective to purify and fuse features from both modalities. Specifically, following this perspective, we design a Redundant Spectrum Removal module to remove interfering information within each modality coarsely and a Dynamic Feature Selection module to finely select the desired features for feature fusion. To verify the effectiveness of the coarse-to-fine fusion strategy, we construct a new object detector called the Removal then Selection Detector (RSDet). Extensive experiments on three RGB-IR object detection datasets verify the superior performance of our method.

Motivation and Objectives

  • Motivate robust RGB-IR object detection by mitigating modality-specific noise during fusion.
  • Propose a coarse-to-fine fusion paradigm inspired by human multisensory processing.
  • Develop RSDet, featuring Redundant Spectrum Removal and Dynamic Feature Selection modules.
  • Demonstrate superior performance on multiple RGB-IR datasets with extensive experiments.

Proposed Method

  • Introduce Coarse-to-Fine Fusion to purify and fuse RGB and IR features.
  • Redundant Spectrum Removal (RSR): dynamic frequency-domain filtering to remove irrelevant spectrum per modality.
  • Dynamic Feature Selection (DFS): mixture-of-scale-aware-experts to gate and fuse multi-scale modality features.
  • Shared-specific representation learning to disentangle shared vs. modality-specific features.
  • Integration into Faster R-CNN-based RSDet with a mutual information-based supervision for shared/specific features.
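
The two modules listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `remove_redundant_spectrum` and `gated_fusion`, the sigmoid soft mask, and the single global gate per expert are all assumptions chosen for illustration. In RSDet the frequency mask and the gate weights would be predicted by learned layers, whereas here the caller supplies them directly.

```python
import numpy as np


def remove_redundant_spectrum(feat, mask_logits):
    """Coarse step: filter a (C, H, W) feature map in the frequency domain.

    mask_logits is an (H, W) array in unshifted FFT layout. It stands in for
    the output of a learned mask predictor (hypothetical; supplied by caller).
    """
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    # Soft mask in (0, 1): values near 1 keep a frequency, near 0 suppress it.
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    filtered = spectrum * mask  # mask broadcasts over the channel axis
    # Keep the real part; an asymmetric mask can leave a small imaginary part.
    return np.fft.ifft2(filtered, axes=(-2, -1)).real


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion(expert_outputs, gate_logits):
    """Fine step: weight E expert feature maps by a softmax gate and sum them.

    expert_outputs: (E, C, H, W), e.g. features from experts at different scales.
    gate_logits:    (E,), assumed to come from a learned gating network.
    Returns a fused (C, H, W) feature map.
    """
    weights = softmax(gate_logits)
    return np.tensordot(weights, expert_outputs, axes=(0, 0))
```

As a usage example, one might filter the RGB and IR feature maps separately with `remove_redundant_spectrum`, pass each through a few per-scale experts, stack the results along a new first axis, and call `gated_fusion` to obtain the fused features. The soft mask keeps the filtering differentiable, which is the usual reason to prefer it over a hard binary mask when the mask is learned.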

Experimental Results

Research Questions

  • RQ1: Can coarse-to-fine fusion improve RGB-IR object detection by reducing modality noise before fusion?
  • RQ2: Do the RSR and DFS modules contribute individually and jointly to performance gains in RGB-IR detectors?
  • RQ3: How does RSDet perform compared to state-of-the-art multispectral detectors on the KAIST, FLIR, and LLVIP datasets?

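Key Findings

  • RSDet achieves state-of-the-art performance on KAIST under the All-day/night settings and performs well across the Near, Medium, and Far scales.
  • Ablation studies show that RSR provides considerable gains; DFS brings significant improvements, especially on the FLIR and LLVIP datasets.
  • On the FLIR and LLVIP datasets, RSDet surpasses several baselines on both mAP@50 and mAP.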

This review was generated by AI and reviewed by human editors.