QUICK REVIEW

[Paper Review] Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection

Jinyuan Liu, Xin Fan|arXiv (Cornell University)|Mar 30, 2022

Infrared Target Detection Methodologies|Cited by 31

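One-sentence summary

The paper proposes TarDAL, a target-aware dual adversarial fusion network that is guided by a bilevel optimization formulation and coupled with a detection network, and it introduces the M3FD benchmark for multi-scenario infrared-visible object detection; the method reports strong detection performance while keeping fusion efficient.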

ABSTRACT

This study addresses the issue of fusing infrared and visible images that appear differently for object detection. Aiming at generating an image of high visual quality, previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks. These approaches neglect that modality differences implying the complementary information are extremely important for both fusion and subsequent detection task. This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network. The fusion network with one generator and dual discriminators seeks commons while learning from differences, which preserves structural information of targets from the infrared and textural details from the visible. Furthermore, we build a synchronized imaging system with calibrated infrared and optical sensors, and collect currently the most comprehensive benchmark covering a wide range of scenarios. Extensive experiments on several public datasets and our benchmark demonstrate that our method outputs not only visually appealing fusion but also higher detection mAP than the state-of-the-art approaches.

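Research motivation and goals

Exploit the complementary information in the infrared and visible modalities to drive detection-oriented fusion.
Formulate fusion and detection as a bilevel optimization problem and unroll it into a trainable network.
Develop a target-aware dual adversarial fusion network that preserves target structure and texture details.
Build a synchronized infrared-visible imaging system and a comprehensive multi-scenario benchmark (M3FD) for evaluation.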
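
The bilevel formulation named in the goals above can be written schematically as follows. This is a sketch under assumed notation, not necessarily the paper's exact equations: x and y stand for the infrared and visible inputs, u for the fused image, \Psi for a detection network with parameters \omega_d, \mathcal{L}^d for a detection loss, f for a fidelity term, and g_T and g_D for regularizers tied to infrared targets and visible details. The second line shows one common way to relax such a problem into a single level, assuming a fusion network \Phi with parameters \omega_f, a fusion loss \mathcal{L}^f, and a trade-off weight \lambda; all of these symbols are introduced here for illustration.

```latex
% Bilevel form: the detector is trained on the solution of the fusion problem.
\begin{align}
\min_{\omega_d}\; & \mathcal{L}^{d}\bigl(\Psi(u^{*};\,\omega_d)\bigr) \\
\text{s.t.}\;     & u^{*} \in \arg\min_{u}\; f(u;\,x,\,y) + g_T(u;\,x) + g_D(u;\,y)
\end{align}

% Single-level relaxation: the inner solver is replaced by a fusion network,
% and the fusion loss enters the objective as a weighted regularizer.
\begin{equation}
\min_{\omega_d,\,\omega_f}\;
  \mathcal{L}^{d}\bigl(\Psi(u;\,\omega_d)\bigr)
  + \lambda\,\mathcal{L}^{f}\bigl(\Phi(x,\,y;\,\omega_f)\bigr),
\qquad u = \Phi(x,\,y;\,\omega_f)
\end{equation}
```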

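Proposed method

Relax the bilevel optimization of fusion and detection into a single-level joint learning problem.
Design TarDAL with one generator and two discriminators (a target discriminator and a detail discriminator), so that it learns features common to both modalities while exploiting their differences.
Improve fusion quality with an SSIM-based structural loss and a pixel loss weighted by saliency degree.
Apply adversarial losses of a Wasserstein-inspired form to target regions (infrared) and to background texture (gradients of the visible image).
Adopt a cooperative training strategy in which the fusion loss term acts as a regularizer while the fusion network is optimized to improve detection performance.
Provide a synchronized imaging system and a multi-scenario multi-modality dataset (M3FD) of aligned infrared-visible pairs with annotations.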
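
The saliency-weighted pixel loss and the Wasserstein-style adversarial terms listed above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a histogram-contrast saliency score, images given as flat lists of integer intensities in 0..255, and critic outputs given as plain lists of scores. The function names (`saliency`, `saliency_weights`, `weighted_pixel_loss`, `critic_loss`, `generator_adv_loss`) are hypothetical.

```python
from statistics import fmean


def saliency(img):
    """Histogram-contrast saliency (assumed form): S(k) = sum_i H(i) * |img[k] - i|.

    `img` is a flat list of integer intensities in 0..255.
    """
    hist = [0] * 256
    for v in img:
        hist[v] += 1
    # Score each intensity level that actually occurs in the image.
    score = {
        v: sum(h * abs(v - i) for i, h in enumerate(hist) if h)
        for v in set(img)
    }
    return [score[v] for v in img]


def saliency_weights(ir, vis):
    """Per-pixel weights (w_ir, w_vis) that sum to 1 at every pixel."""
    w_ir = []
    for a, b in zip(saliency(ir), saliency(vis)):
        # Fall back to equal weights when neither input is salient.
        w_ir.append(a / (a + b) if a + b > 0 else 0.5)
    return w_ir, [1.0 - w for w in w_ir]


def weighted_pixel_loss(fused, ir, vis):
    """Mean saliency-weighted L1 distance from the fused image to both inputs."""
    w_ir, w_vis = saliency_weights(ir, vis)
    return fmean(
        wi * abs(f - a) + wv * abs(f - b)
        for f, a, b, wi, wv in zip(fused, ir, vis, w_ir, w_vis)
    )


def critic_loss(scores_real, scores_fake):
    """Wasserstein-style critic objective: mean(fake) - mean(real), minimized by the critic."""
    return fmean(scores_fake) - fmean(scores_real)


def generator_adv_loss(target_scores_fake, detail_scores_fake):
    """The generator tries to raise both critics' scores on the fused image."""
    return -fmean(target_scores_fake) - fmean(detail_scores_fake)


# Toy 4-pixel example: one bright infrared target pixel on a dark background.
ir = [200, 10, 10, 10]
vis = [50, 60, 70, 80]
w_ir, w_vis = saliency_weights(ir, vis)
loss = weighted_pixel_loss(ir, ir, vis)  # fused image set equal to the infrared input
```

In a full pipeline the critic scores would come from the target discriminator, applied to masked target regions of the infrared and fused images, and from the detail discriminator, applied to gradients of the visible and fused backgrounds. The sketch leaves out the SSIM term and any Lipschitz constraint on the critics, such as a gradient penalty.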

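Experimental results

Research questions

RQ1: Can a bilevel optimization jointly optimize image fusion and object detection, improving detection performance while maintaining high fusion quality?
RQ2: Does the target-aware dual adversarial fusion network preserve target structure and texture details better than previous IVIF methods?
RQ3: Does cooperative training between the fusion and detection networks yield faster inference and higher detection accuracy?
RQ4: How does the comprehensive multi-scenario M3FD benchmark support learning and evaluating detection on fused infrared-visible data?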

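Key findings

TarDAL achieves higher detection mAP on multiple datasets than other fusion-based detection approaches.
The target-aware dual discriminators help preserve distinct infrared targets and visible texture details in the fused image.
Cooperative training balances fusion quality and detection performance more effectively than task-only or independent training.
The M3FD benchmark covers diverse scenarios (Day, Overcast, Night, Challenge), with 4,200 aligned infrared-visible pairs and 33,603 annotated objects across six categories.
TarDAL delivers efficient inference, with fewer parameters and lower computational complexity than competing methods.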

This review was generated by AI and checked by a human editor.