QUICK REVIEW

[Paper Review] MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion

Zhe Li, Haiwei Pan | arXiv (Cornell University) | 2024. 04. 12.

Advanced Image Fusion Techniques | Citations: 17

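One-line summary

MambaDFuse introduces a Mamba-based dual-phase framework for multi-modality image fusion (MMIF), combining a dual-level feature extractor with dual-phase fusion. It reports state-of-the-art results on infrared-visible fusion (IVF) and medical image fusion (MIF) and improves downstream object detection.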

ABSTRACT

Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image to represent the imaging scene and facilitate downstream visual tasks comprehensively. In recent years, significant progress has been made in MMIF tasks due to advances in deep neural networks. However, existing methods cannot effectively and efficiently extract modality-specific and modality-fused features constrained by the inherent local reductive bias (CNN) or quadratic computational complexity (Transformers). To overcome this issue, we propose a Mamba-based Dual-phase Fusion (MambaDFuse) model. Firstly, a dual-level feature extractor is designed to capture long-range features from single-modality images by extracting low and high-level features from CNN and Mamba blocks. Then, a dual-phase feature fusion module is proposed to obtain fusion features that combine complementary information from different modalities. It uses the channel exchange method for shallow fusion and the enhanced Multi-modal Mamba (M3) blocks for deep fusion. Finally, the fused image reconstruction module utilizes the inverse transformation of the feature extraction to generate the fused result. Through extensive experiments, our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion. Additionally, in a unified benchmark, MambaDFuse has also demonstrated improved performance in downstream tasks such as object detection. Code with checkpoints will be available after the peer-review process.
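
To make the complexity contrast in the abstract concrete, here is a minimal sketch of a scalar, non-selective linear state-space recurrence. It is an illustration only, not the paper's Mamba block: the names `ssm_scan` and `attention_pair_count` and the fixed parameters `a`, `b`, `c` are assumptions, and Mamba's actual selective scan makes its parameters input-dependent. The point is that one pass over a length-L sequence costs O(L) state updates, while full self-attention scores L x L token pairs.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence.

    a, b, c are hypothetical fixed scalars chosen for illustration.
    There is exactly one state update per token, so the cost is O(L).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pair_count(length):
    """Number of token pairs that full self-attention scores: L * L."""
    return length * length


tokens = [1.0, 0.0, 0.0, 2.0]
outputs = ssm_scan(tokens)            # 4 state updates for 4 tokens
pairs = attention_pair_count(len(tokens))  # 16 pairwise scores for 4 tokens
```

The hidden state `h` carries information from all earlier tokens forward, which is how a recurrence of this kind can model long-range dependencies without comparing every pair of positions.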

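Motivation and objectives

Balance fusion quality and computational efficiency in MMIF.
Propose a Mamba-based backbone that overcomes the limitations of CNNs and Transformers in MMIF.
Design a dual-level feature extractor that captures both local and long-range modality-specific information.
Develop a dual-phase fusion mechanism that integrates the global overview and local details across modalities.
Demonstrate improvements on IVF (infrared-visible) and MIF (medical) fusion tasks and on downstream detection.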

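Proposed method

A dual-level feature extractor obtains low-level features with a CNN and high-level, long-range features with Mamba blocks.
A shallow fusion module based on channel exchange quickly fuses global information.
A deep fusion module uses Multi-modal Mamba (M3) blocks, in which cross-modal information guides the modality-fused features.
The fused image is reconstructed through the inverse transformation of the feature extraction pipeline.
Training uses a loss that combines SSIM, texture, and intensity terms (following the same configuration as the earlier SwinFusion work).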
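
The channel-exchange idea behind the shallow fusion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `channel_exchange`, the `period` argument, and the fixed alternating swap rule are all hypothetical, and the paper's actual exchange rule may differ. Channels are represented by placeholder strings here; in practice each would be a 2D feature map.

```python
def channel_exchange(feat_a, feat_b, period=2):
    """Swap every `period`-th channel between two feature stacks.

    feat_a, feat_b: lists of channels from two modalities (same length).
    The alternating rule and `period` are assumptions for illustration.
    Returns two new lists; no learnable parameters are involved.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("both modalities need the same number of channels")
    out_a, out_b = [], []
    for idx, (ch_a, ch_b) in enumerate(zip(feat_a, feat_b)):
        if idx % period == 0:
            # Exchanged position: each stream receives the other modality.
            out_a.append(ch_b)
            out_b.append(ch_a)
        else:
            # Untouched position: each stream keeps its own channel.
            out_a.append(ch_a)
            out_b.append(ch_b)
    return out_a, out_b


infrared = ["ir0", "ir1", "ir2", "ir3"]
visible = ["vis0", "vis1", "vis2", "vis3"]
mixed_ir, mixed_vis = channel_exchange(infrared, visible)
```

Because the operation only reorders existing channels, each stream sees information from the other modality before any deeper fusion, at no parameter cost.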

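Experimental results

Research questions

RQ1: Can a Mamba-based architecture be more efficient and effective for MMIF than CNN- or Transformer-based backbones?
RQ2: Does the dual-level feature extractor improve the capture of modality-specific features in MMIF?
RQ3: Does dual-phase fusion, comprising shallow channel exchange and deep M3-based fusion, yield better fusion features for IVF and MIF?
RQ4: Do fused images produced by MambaDFuse improve downstream tasks such as object detection?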

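Key findings

MambaDFuse achieves leading performance on IVF and MIF benchmarks across multiple datasets (IVF: MSRS, RoadScene, M3FD; MIF: MRI-CT, MRI-PET, MRI-SPECT).
The shallow fusion phase based on channel exchange effectively integrates cross-modal information without additional parameters.
The deep fusion phase with M3 blocks improves detail-oriented fusion guided by modality-specific features.
Fused images improve on quantitative metrics (MI, VIF, SSIM, Qabf) and show clearer object distinction in qualitative comparisons.
A unified benchmark suggests that downstream object detection performance improves when MambaDFuse fused images are used.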
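
As a rough illustration of one of the quantitative metrics listed above, the sketch below computes mutual information (MI) from gray-level histograms. It assumes the common fusion-metric definition, MI(fused, source A) + MI(fused, source B); whether the paper uses exactly this variant or any normalization is not stated in this review. The function names `mutual_information` and `fusion_mi` are hypothetical, and images are simplified to flat lists of integer gray levels.

```python
import math
from collections import Counter


def mutual_information(img_x, img_y):
    """Mutual information in bits between two equally sized images.

    Images are flat sequences of integer gray levels.
    """
    if len(img_x) != len(img_y) or not img_x:
        raise ValueError("images must be non-empty and the same size")
    n = len(img_x)
    joint = Counter(zip(img_x, img_y))   # joint gray-level histogram
    marg_x = Counter(img_x)              # marginal histogram of img_x
    marg_y = Counter(img_y)              # marginal histogram of img_y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        ratio = count * n / (marg_x[x] * marg_y[y])
        mi += p_xy * math.log2(ratio)
    return mi


def fusion_mi(fused, src_a, src_b):
    """Assumed fusion MI score: MI(fused, src_a) + MI(fused, src_b)."""
    return mutual_information(fused, src_a) + mutual_information(fused, src_b)


fused = [0, 0, 1, 1]
source_a = [0, 0, 1, 1]   # identical to the fused image
source_b = [0, 1, 0, 1]   # statistically independent of the fused image
score = fusion_mi(fused, source_a, source_b)
```

A higher score means the fused image shares more information with its sources. In the toy example, all of the score comes from `source_a`, because `source_b` is independent of the fused image.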


This review was generated by AI and reviewed by a human editor.