QUICK REVIEW

[Paper Review] ATFusion: An Alternate Cross-Attention Transformer Network for Infrared and Visible Image Fusion

Han Yan, Songlei Xiong | arXiv (Cornell University) | Jan 22, 2024

Advanced Image Fusion Techniques | Cited by: 8

One-Line Summary

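ATFusion is a Transformer-based infrared and visible (IV) image fusion framework that builds a discrepancy information injection module and common information injection modules on top of cross-attention, and combines them with a segmented pixel loss that balances texture and salient structure, achieving superior fusion performance.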

ABSTRACT

The fusion of infrared and visible images is essential in remote sensing applications, as it combines the thermal information of infrared images with the detailed texture of visible images for more accurate analysis in tasks like environmental monitoring, target detection, and disaster management. The current fusion methods based on Transformer techniques for infrared and visible (IV) images have exhibited promising performance. However, the attention mechanism of the previous Transformer-based methods was prone to extract common information from source images without considering the discrepancy information, which limited fusion performance. In this paper, by reevaluating the cross-attention mechanism, we propose an alternate Transformer fusion network (ATFusion) to fuse IV images. Our ATFusion consists of one discrepancy information injection module (DIIM) and two alternate common information injection modules (ACIIM). The DIIM is designed by modifying the vanilla cross-attention mechanism, which can promote the extraction of the discrepancy information of the source images. Meanwhile, the ACIIM is devised by alternately using the vanilla cross-attention mechanism, which can fully mine common information and integrate long dependencies. Moreover, the successful training of ATFusion is facilitated by a proposed segmented pixel loss function, which provides a good trade-off for texture detail and salient structure preservation. The qualitative and quantitative results on public datasets indicate our ATFusion is effective and superior compared to other state-of-the-art methods.
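The abstract contrasts DIIM's modified attention with the vanilla cross-attention mechanism that ACIIM reuses. As a reference point, here is a minimal NumPy sketch of single-head scaled dot-product cross-attention between infrared and visible feature tokens. The token counts, random weights, and single-head setup are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(x_q, x_kv, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    x_q:  (N, C) tokens of the querying modality (e.g. infrared features)
    x_kv: (M, C) tokens of the other modality (e.g. visible features)
    w_q, w_k, w_v: (C, D) projection matrices
    Returns (N, D): each query token becomes a weighted average of the
    other modality's value vectors.
    """
    q = x_q @ w_q
    k = x_kv @ w_k
    v = x_kv @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) query-key similarities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v


rng = np.random.default_rng(0)
C, D = 8, 8
ir_tokens = rng.standard_normal((16, C))   # e.g. a flattened 4x4 infrared feature map
vis_tokens = rng.standard_normal((16, C))  # e.g. a flattened 4x4 visible feature map
w_q, w_k, w_v = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))
out = cross_attention(ir_tokens, vis_tokens, w_q, w_k, w_v)
print(out.shape)  # (16, 8)
```

Because each attention row sums to 1, an infrared token is rebuilt from visible content that resembles it. This weighting toward similar content is consistent with the abstract's observation that earlier attention-based fusion tended to extract common information.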

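Motivation and Objectives

Improve IV image fusion by explicitly handling the discrepancy information between the infrared and visible modalities.
Propose an alternate Transformer fusion network (ATFusion) with dedicated modules for extracting discrepancy information and common information.
Develop a segmented pixel loss that balances the preservation of texture detail and salient structure.
Demonstrate superior fusion performance, both qualitatively and quantitatively, on public IV datasets.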

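Proposed Method

Introduces a feature extraction, fusion, and reconstruction pipeline for IV image fusion.
Develops the discrepancy information injection module (DIIM) by modifying the cross-attention mechanism so that it captures discrepancy information.
Develops the alternate common information injection module (ACIIM), which alternately fuses and reinforces common information across the modalities.
Uses a two-stage DIIM + ACIIM fusion scheme (the abstract specifies one DIIM and two ACIIMs) to capture both long-range dependencies and modality-specific details.
Adopts a segmented pixel loss that applies different constraints to the most salient pixels and to the remaining regions, preserving both texture and intensity.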
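The review does not give DIIM's equations. The sketch below illustrates one hypothetical way to steer vanilla cross-attention toward discrepancy information: treat what cross-attention can reconstruct of the other modality from this one as common information, and inject the residual. The function name `diim_sketch`, the subtraction, and the additive injection are all assumptions for illustration, not the paper's formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(x_q, x_kv, w_q, w_k, w_v):
    # Vanilla single-head scaled dot-product cross-attention.
    q, k, v = x_q @ w_q, x_kv @ w_k, x_kv @ w_v
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) @ v


def diim_sketch(x_self, x_other, w_q, w_k, w_v):
    """HYPOTHETICAL discrepancy injection (not ATFusion's actual equations).

    x_self, x_other: (N, C) token sets of the two modalities (same N).
    w_q, w_k, w_v:   (C, C) projections, so outputs keep C channels.
    """
    # 1. x_other queries x_self: the result is the part of x_other that
    #    x_self can account for, treated here as common information.
    common = cross_attention(x_other, x_self, w_q, w_k, w_v)
    # 2. The residual is what x_other carries that x_self does not.
    discrepancy = x_other - common
    # 3. Inject that discrepancy into x_self's stream by addition.
    return x_self + discrepancy, discrepancy


rng = np.random.default_rng(0)
C = 8
ir = rng.standard_normal((16, C))
vis = rng.standard_normal((16, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
ir_injected, vis_discrepancy = diim_sketch(ir, vis, w_q, w_k, w_v)
print(ir_injected.shape)  # (16, 8)
```

A sanity check on this design: if both modalities carry identical, uniform tokens and the value projection is the identity, every row of the attention output equals the input row, so the discrepancy is zero and nothing is injected.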
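The alternating idea behind ACIIM can be sketched with plain cross-attention: one direction updates the infrared stream, and the other direction then queries the already-updated infrared features. The residual additions, the name `aciim_sketch`, and the per-direction weight tuples are hypothetical choices for illustration; the review does not specify ACIIM's internal layout.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(x_q, x_kv, w_q, w_k, w_v):
    # Vanilla single-head scaled dot-product cross-attention.
    q, k, v = x_q @ w_q, x_kv @ w_k, x_kv @ w_v
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) @ v


def aciim_sketch(x_ir, x_vis, w_ir, w_vis):
    """HYPOTHETICAL alternate common-information injection (not the paper's code).

    Step 1: infrared tokens query the visible tokens; the attended (common)
            information is added to the infrared stream as a residual.
    Step 2: visible tokens query the *updated* infrared tokens, so the second
            direction already sees what step 1 injected.
    w_ir, w_vis: (w_q, w_k, w_v) tuples of (C, C) matrices, one per direction.
    """
    x_ir = x_ir + cross_attention(x_ir, x_vis, *w_ir)
    x_vis = x_vis + cross_attention(x_vis, x_ir, *w_vis)
    return x_ir, x_vis


rng = np.random.default_rng(0)
C = 8


def make_weights():
    # Random (w_q, w_k, w_v) projections, scaled to keep activations moderate.
    return tuple(rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))


ir = rng.standard_normal((16, C))
vis = rng.standard_normal((16, C))
# The abstract describes two ACIIMs; here they are simply applied in sequence.
for _ in range(2):
    ir, vis = aciim_sketch(ir, vis, make_weights(), make_weights())
print(ir.shape, vis.shape)  # (16, 8) (16, 8)
```

Alternating the query direction, rather than running both directions in parallel on the original features, lets common information found in one pass feed into the next, which is one reading of "alternately" in the method description.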
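The review says only that the segmented pixel loss constrains the most salient pixels differently from the rest. A minimal sketch under stated assumptions: salient pixels are the top `salient_ratio` fraction by max(infrared, visible) intensity and are pulled toward that maximum, while the remaining pixels are pulled toward the visible image as a texture proxy. The split rule, both targets, and the names `salient_mask`, `segmented_pixel_loss`, `alpha`, and `beta` are hypothetical, not the published loss.

```python
import numpy as np


def salient_mask(ir, vis, salient_ratio=0.1):
    """Boolean mask of the top `salient_ratio` fraction of pixels by max(ir, vis).

    HYPOTHETICAL segmentation rule chosen for illustration.
    """
    intensity = np.maximum(ir, vis)
    threshold = np.quantile(intensity, 1.0 - salient_ratio)
    return intensity >= threshold


def segmented_pixel_loss(fused, ir, vis, salient_ratio=0.1, alpha=1.0, beta=1.0):
    """HYPOTHETICAL segmented L1 pixel loss (not ATFusion's published loss).

    Salient segment: pull the fused image toward max(ir, vis) to keep bright
    thermal targets. Remaining pixels: pull it toward the visible image,
    used here as a proxy for texture detail. alpha/beta weight the two terms.
    """
    mask = salient_mask(ir, vis, salient_ratio)
    l_salient = np.abs(fused - np.maximum(ir, vis))[mask].mean()
    rest = ~mask
    l_rest = np.abs(fused - vis)[rest].mean() if rest.any() else 0.0
    return float(alpha * l_salient + beta * l_rest)


rng = np.random.default_rng(0)
ir = rng.random((64, 64))   # images scaled to [0, 1)
vis = rng.random((64, 64))
naive = 0.5 * (ir + vis)    # simple averaging baseline
m = salient_mask(ir, vis)
ideal = np.where(m, np.maximum(ir, vis), vis)  # satisfies both segments exactly
print(segmented_pixel_loss(naive, ir, vis) > segmented_pixel_loss(ideal, ir, vis))  # True
```

In this sketch `salient_ratio` is the main knob: a larger fraction moves more pixels under the intensity constraint and away from the visible-texture constraint, which is one concrete way to read the texture/salience trade-off described above.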

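Experimental Results

Research Questions

RQ1: How can cross-attention be adapted to extract the discrepancy information between infrared and visible images for fusion?
RQ2: Is an alternate information injection strategy effective at preserving common information and long-range dependencies across modalities?
RQ3: Does the segmented pixel loss improve the preservation of salient details and texture in fused IV images?
RQ4: How does ATFusion perform against state-of-the-art Transformer- and CNN-based IV fusion methods on public datasets?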

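Key Findings

Equipped with DIIM and ACIIM, ATFusion preserves salient infrared information and texture detail in the fused images better than several state-of-the-art methods.
The segmented pixel loss provides a balanced trade-off between salient-information preservation and texture preservation across the datasets.
Ablation studies show that DIIM and ACIIM both contribute to the performance gain, and that the full ATFusion architecture outperforms variants lacking either module.
Quantitative results on the RoadScene, MSRS, and TNO datasets show superior objective scores across multiple metrics, including gradient-based and information-theoretic criteria.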
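The review does not list the specific metrics. As an illustration only, here are two metrics commonly reported in IV fusion papers: entropy (EN, information-theoretic) and average gradient (AG, gradient-based). Whether ATFusion's evaluation uses exactly these is an assumption, and AG definitions vary slightly across papers.

```python
import numpy as np


def entropy(img):
    """Shannon entropy (bits) of an 8-bit grayscale image's intensity histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())


def average_gradient(img):
    """Mean magnitude of forward-difference gradients; higher suggests sharper texture."""
    f = img.astype(float)
    gx = f[:-1, 1:] - f[:-1, :-1]   # horizontal differences
    gy = f[1:, :-1] - f[:-1, :-1]   # vertical differences
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


img = np.zeros((4, 4), dtype=np.uint8)
img[:, 2:] = 255                   # left half black, right half white
print(entropy(img))                # 1.0 (two equally likely intensity levels)
print(average_gradient(img) > 0)   # True: the image contains one vertical edge
```

Both functions expect a single-channel `uint8` image; a fused output in [0, 1] would need to be rescaled and cast first.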

This review was generated by AI and checked by a human editor.