QUICK REVIEW

[Paper Review] Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection

Jinyuan Liu, Xin Fan | arXiv (Cornell University) | Mar 30, 2022

Infrared Target Detection Methodologies | Citations: 31

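One-Line Summary

This paper proposes TarDAL, a target-aware dual adversarial fusion network guided by the detection task through bilevel optimization, together with M3FD, a multi-scenario benchmark for infrared-visible object detection. It reports higher detection performance than competing methods along with efficient fusion.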

ABSTRACT

This study addresses the issue of fusing infrared and visible images that appear differently for object detection. Aiming at generating an image of high visual quality, previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks. These approaches neglect that modality differences implying the complementary information are extremely important for both fusion and subsequent detection task. This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network. The fusion network with one generator and dual discriminators seeks commons while learning from differences, which preserves structural information of targets from the infrared and textural details from the visible. Furthermore, we build a synchronized imaging system with calibrated infrared and optical sensors, and collect currently the most comprehensive benchmark covering a wide range of scenarios. Extensive experiments on several public datasets and our benchmark demonstrate that our method outputs not only visually appealing fusion but also higher detection mAP than the state-of-the-art approaches.

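Motivation and Objectives

Exploit the complementary information in the infrared and visible modalities to drive fusion aimed at detection.
Formulate fusion and detection as a bilevel optimization problem and unroll it into trainable networks.
Develop a target-aware dual adversarial fusion network that preserves target structure and texture details.
Build a synchronized infrared-visible imaging system and a comprehensive multi-scenario benchmark (M3FD) for evaluation.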

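Proposed Method

Fusion and detection are formulated as a bilevel optimization problem, which is then converted into a single-level joint learning problem.
TarDAL is designed with one generator and two discriminators (a target discriminator and a detail discriminator), so that it learns what the modalities share while exploiting their differences.
An SSIM-based structural loss and a pixel loss with saliency-degree weighting are used for fusion quality.
Wasserstein-inspired adversarial losses are applied to target regions (infrared) and to background texture (gradients of the visible image).
A cooperative training scheme is adopted so that the fusion loss terms also improve detection performance.
A synchronized imaging system and a multi-scenario, multi-modality dataset (M3FD) are provided, with aligned infrared-visible pairs and annotations.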
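
The saliency-weighted pixel loss mentioned above can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that saliency is computed from a histogram-contrast measure, that the infrared weight is 0.5 + 0.5 * (S_ir - S_vis) with the visible weight as its complement, and that the loss is a mean absolute difference. The function names `histogram_contrast_saliency` and `saliency_weighted_pixel_loss` are hypothetical.

```python
import numpy as np


def histogram_contrast_saliency(img: np.ndarray) -> np.ndarray:
    """Per-pixel saliency for a uint8 grayscale image, scaled to [0, 1].

    Assumed definition: S(p) = sum_i H(i) * |I(p) - i|, where H is the
    intensity histogram of the image.
    """
    levels = np.arange(256)
    hist = np.bincount(img.ravel(), minlength=256)  # H(i)
    # Contrast of every possible intensity against the whole histogram.
    contrast = np.abs(levels[:, None] - levels[None, :]) @ hist
    sal = contrast[img].astype(np.float64)
    peak = sal.max()
    return sal / peak if peak > 0 else sal


def saliency_weighted_pixel_loss(fused: np.ndarray,
                                 ir: np.ndarray,
                                 vis: np.ndarray) -> float:
    """Mean absolute error to each source, weighted by relative saliency.

    fused: float image in [0, 1]; ir, vis: uint8 images of the same shape.
    """
    s_ir = histogram_contrast_saliency(ir)
    s_vis = histogram_contrast_saliency(vis)
    w_ir = 0.5 + 0.5 * (s_ir - s_vis)  # assumed weighting, lies in [0, 1]
    w_vis = 1.0 - w_ir                 # the two weights sum to 1 per pixel
    ir_f = ir.astype(np.float64) / 255.0
    vis_f = vis.astype(np.float64) / 255.0
    loss_ir = np.mean(w_ir * np.abs(fused - ir_f))
    loss_vis = np.mean(w_vis * np.abs(fused - vis_f))
    return float(loss_ir + loss_vis)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
    vis = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
    fused = 0.5 * (ir / 255.0 + vis / 255.0)  # naive average as a stand-in
    print(saliency_weighted_pixel_loss(fused, ir, vis))
```

With this weighting, pixels that are more salient in the infrared image pull the fused output toward the infrared intensity, and the remaining pixels pull it toward the visible image. In a real training setup the same computation would be written with a differentiable tensor library instead of NumPy.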

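Experimental Results

Research Questions

RQ1: Can bilevel optimization jointly optimize image fusion and object detection, improving detection performance while maintaining high-quality fusion?
RQ2: Does the target-aware dual adversarial fusion network preserve target structure and texture details better than conventional infrared and visible image fusion (IVIF) methods?
RQ3: Does cooperative training of the fusion and detection networks speed up inference and improve detection accuracy?
RQ4: How does the comprehensive multi-scenario M3FD benchmark support training and evaluating detection on fused infrared-visible data?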

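Key Findings

TarDAL achieves higher detection mAP than state-of-the-art fusion-based detectors on multiple datasets.
The target-aware dual discriminators help the fused image retain distinguishable infrared targets and visible texture details.
Cooperative training balances fusion quality and detection performance more effectively than task-only training or independent training.
The M3FD benchmark covers diverse scenarios (Day, Overcast, Night, Challenge) and contains 4,200 aligned infrared-visible pairs and 33,603 annotated objects across 6 classes.
TarDAL has fewer parameters and a lower computational cost than competing methods, which makes inference efficient.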


This review was generated by AI and checked by a human editor.