QUICK REVIEW

[Paper Review] MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion

Zhe Li, Haiwei Pan|arXiv (Cornell University)|Apr 12, 2024

Advanced Image Fusion Techniques|Citations: 17

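One-line summary

MambaDFuse introduces a Mamba-based dual-phase framework for multi-modality image fusion (MMIF) that combines dual-level feature extraction with dual-phase fusion. It reports state-of-the-art results on infrared-visible fusion (IVF) and medical image fusion (MIF) and improves downstream object detection.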

ABSTRACT

Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image to represent the imaging scene and facilitate downstream visual tasks comprehensively. In recent years, significant progress has been made in MMIF tasks due to advances in deep neural networks. However, existing methods cannot effectively and efficiently extract modality-specific and modality-fused features constrained by the inherent local reductive bias (CNN) or quadratic computational complexity (Transformers). To overcome this issue, we propose a Mamba-based Dual-phase Fusion (MambaDFuse) model. Firstly, a dual-level feature extractor is designed to capture long-range features from single-modality images by extracting low and high-level features from CNN and Mamba blocks. Then, a dual-phase feature fusion module is proposed to obtain fusion features that combine complementary information from different modalities. It uses the channel exchange method for shallow fusion and the enhanced Multi-modal Mamba (M3) blocks for deep fusion. Finally, the fused image reconstruction module utilizes the inverse transformation of the feature extraction to generate the fused result. Through extensive experiments, our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion. Additionally, in a unified benchmark, MambaDFuse has also demonstrated improved performance in downstream tasks such as object detection. Code with checkpoints will be available after the peer-review process.

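Motivation and objectives

Balance fusion quality against computational efficiency in MMIF.
Propose a Mamba-based backbone to overcome the limitations of CNNs and Transformers in MMIF.
Design dual-level feature extraction that captures both local information and long-range, modality-specific information.
Develop a dual-phase fusion mechanism that integrates the global overview and the local detail from multiple modalities.
Demonstrate improvements on IVF (infrared-visible) and MIF (medical) fusion tasks and on downstream detection.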

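Proposed method

Uses a dual-level feature extractor that combines CNN layers for low-level features with Mamba blocks for high-level, long-range features.
Implements a shallow fusion module based on channel exchange to fuse global information quickly.
Develops a deep fusion module built on Multi-modal Mamba (M3) blocks, which use cross-modal information to guide the modality-fused features.
Reconstructs the fused image by inverting the feature-extraction pipeline.
Trains with a loss that combines SSIM, texture, and intensity terms, following the earlier SwinFusion paper.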
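
As a rough illustration of the shallow fusion step listed above, the following NumPy sketch swaps a fixed subset of channels between two modality feature maps. The function name `channel_exchange`, the `period` parameter, and the every-other-channel rule are assumptions made for illustration; this review does not state which channels MambaDFuse exchanges, and the paper's module may differ.

```python
import numpy as np


def channel_exchange(feat_a, feat_b, period=2):
    """Swap every `period`-th channel between two (C, H, W) feature maps.

    Hypothetical helper: the exchange rule (channel index % period == 0)
    is an assumption, not the rule used in the paper.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    # Boolean mask over the channel axis marking which channels are swapped.
    swap = (np.arange(feat_a.shape[0]) % period) == 0
    mask = swap[:, None, None]  # broadcast over height and width
    out_a = np.where(mask, feat_b, feat_a)
    out_b = np.where(mask, feat_a, feat_b)
    return out_a, out_b


# Toy example: "infrared" features are all zeros, "visible" features all ones.
infrared = np.zeros((4, 2, 2))
visible = np.ones((4, 2, 2))
mixed_ir, mixed_vis = channel_exchange(infrared, visible)
```

Because the operation only re-indexes existing activations, it introduces no learnable parameters.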

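Experimental results

Research questions

RQ1: Can a Mamba-based architecture achieve efficient and effective MMIF compared with CNN- or Transformer-based backbones?
RQ2: Does dual-level feature extraction improve the capture of modality-specific features in MMIF?
RQ3: Does dual-phase fusion, combining shallow channel exchange with deep M3-based fusion, produce better fused features for IVF and MIF?
RQ4: Do the fused images generated by MambaDFuse improve downstream tasks such as object detection?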

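Key findings

MambaDFuse achieves top performance on IVF and MIF benchmarks across multiple datasets (IVF: MSRS, RoadScene, M3FD; MIF: MRI-CT, MRI-PET, MRI-SPECT).
The shallow fusion phase, based on channel exchange, integrates cross-modality information effectively without additional parameters.
The deep fusion phase, built on M3 blocks, improves detail-oriented fusion guided by modality-specific features.
Fused images improve on metrics such as MI, VIF, SSIM, and Qabf, and qualitative comparisons show clearer object boundaries.
On a unified benchmark, downstream object detection performance improves when MambaDFuse's fused images are used.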
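
For readers unfamiliar with the MI metric mentioned above, a common way to score a fused image is to sum the mutual information it shares with each source image, estimated from joint intensity histograms. The sketch below follows that general definition; the function names, the `bins` default, and the histogram estimator are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def mutual_information(x, y, bins=256):
    """Mutual information (in bits) between two equally sized grayscale images."""
    # Joint histogram of paired pixel intensities, normalised to a probability table.
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = pxy > 0  # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / (px @ py)[nonzero])))


def fusion_mi(fused, src_a, src_b, bins=256):
    """Fusion MI score: information the fused image shares with each source."""
    return mutual_information(src_a, fused, bins) + mutual_information(src_b, fused, bins)


# Toy example with random "images"; real use would pass the fused output
# together with its infrared/visible (or MRI/CT) source images.
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(64, 64)).astype(float)
img_b = rng.integers(0, 256, size=(64, 64)).astype(float)
score = fusion_mi(img_a, img_a, img_b, bins=16)
```

Higher values indicate that more source information is preserved in the fused image; the estimate depends on the bin count, so scores are only comparable when computed with the same settings.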

This review was generated by AI and checked by a human editor.