QUICK REVIEW

[Paper Review] ATFusion: An Alternate Cross-Attention Transformer Network for Infrared and Visible Image Fusion

Han Yan, Songlei Xiong|arXiv (Cornell University)|2024. 01. 22.

Advanced Image Fusion Techniques | Citations: 8

One-Line Summary

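ATFusion brings cross-attention-based discrepancy and common information injection modules into a Transformer-based IV image fusion framework, and adds a segmented pixel loss that balances texture against salient structure, achieving superior fusion performance.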

ABSTRACT

The fusion of infrared and visible images is essential in remote sensing applications, as it combines the thermal information of infrared images with the detailed texture of visible images for more accurate analysis in tasks like environmental monitoring, target detection, and disaster management. The current fusion methods based on Transformer techniques for infrared and visible (IV) images have exhibited promising performance. However, the attention mechanism of the previous Transformer-based methods was prone to extract common information from source images without considering the discrepancy information, which limited fusion performance. In this paper, by reevaluating the cross-attention mechanism, we propose an alternate Transformer fusion network (ATFusion) to fuse IV images. Our ATFusion consists of one discrepancy information injection module (DIIM) and two alternate common information injection modules (ACIIM). The DIIM is designed by modifying the vanilla cross-attention mechanism, which can promote the extraction of the discrepancy information of the source images. Meanwhile, the ACIIM is devised by alternately using the vanilla cross-attention mechanism, which can fully mine common information and integrate long dependencies. Moreover, the successful training of ATFusion is facilitated by a proposed segmented pixel loss function, which provides a good trade-off for texture detail and salient structure preservation. The qualitative and quantitative results on public datasets indicate our ATFusion is effective and superior compared to other state-of-the-art methods.

Motivation and Objectives

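Motivates improving IV image fusion by explicitly handling the discrepancy information between the infrared and visible modalities.
Proposes an alternate Transformer fusion network (ATFusion) with dedicated modules for extracting common information.
Develops a segmented pixel loss that balances texture detail preservation against salient structure retention.
Demonstrates superior qualitative and quantitative fusion performance on public IV datasets.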

Proposed Method

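Introduces a feature extraction, fusion, and reconstruction pipeline for IV image fusion.
Develops a discrepancy information injection module (DIIM) that modifies the cross-attention mechanism to capture discrepancy information.
Develops alternate common information injection modules (ACIIM) that alternately fuse and reinforce the common information shared between modalities.
Uses a two-stage DIIM + ACIIM fusion scheme to make the most of long-range dependencies and modality-specific detail.
Uses a segmented pixel loss that applies different constraints to the most salient pixels and to less salient regions, preserving both texture and brightness.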
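
To make the contrast between vanilla cross-attention and a discrepancy-oriented variant concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the learned Q/K/V projections are omitted, and the subtraction step in `discrepancy_injection` (a hypothetical name) is only one plausible reading of "modifying the vanilla cross-attention mechanism", assuming both modalities have the same number of tokens.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_attention(query_feats, context_feats):
    """Vanilla cross-attention: tokens of one modality attend to the other.

    Assumption: learned projections are left out, so the raw features act
    as queries, keys, and values.
    """
    dim = len(query_feats[0])
    scores = matmul(query_feats, transpose(context_feats))
    scaled = [[s / math.sqrt(dim) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    # The attended output emphasises what the two modalities share.
    return matmul(weights, context_feats)

def discrepancy_injection(query_feats, context_feats):
    """Hypothetical DIIM-style step (illustrative assumption only).

    Removes the attended, shared part from the context tokens and injects
    the remaining residue into the query tokens.
    """
    if len(query_feats) != len(context_feats):
        raise ValueError("this sketch assumes equal token counts")
    common = cross_attention(query_feats, context_feats)
    discrepancy = [[c - m for c, m in zip(c_row, m_row)]
                   for c_row, m_row in zip(context_feats, common)]
    return [[q + d for q, d in zip(q_row, d_row)]
            for q_row, d_row in zip(query_feats, discrepancy)]

# Toy tokens: two tokens per modality, two channels each.
infrared_tokens = [[1.0, 0.0], [0.0, 1.0]]
visible_tokens = [[0.5, 0.5], [0.5, 0.5]]
injected = discrepancy_injection(infrared_tokens, visible_tokens)
```

In this toy case every visible token is identical, so the attended output equals the visible tokens themselves, the residue is zero, and the infrared tokens pass through unchanged. A real module would add learned projections, multiple heads, and normalization.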

Experimental Results

Research Questions

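RQ1: How can cross-attention be adapted to capture the discrepancy information between infrared and visible images for fusion?
RQ2: Can an alternating information injection strategy better preserve cross-modal common information and long-range dependencies?
RQ3: Does the segmented pixel loss improve the preservation of salient detail and texture in fused IV images?
RQ4: How does ATFusion perform against state-of-the-art Transformer-based and CNN-based IV fusion methods on public datasets?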

Key Results

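ATFusion, with DIIM and ACIIM, preserves salient IR information and texture detail in the fused images better than several state-of-the-art methods.
The segmented pixel loss provides a balance between salient information preservation and texture preservation across datasets.
In the ablation study, both DIIM and ACIIM contribute to the performance gain, and the full ATFusion architecture outperforms variants with either module removed.
It shows strong quantitative performance on the RoadScene, MSRS, and TNO datasets across multiple metrics, including gradient-based and information-theoretic measures.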
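
The idea of a segmented pixel loss, with different constraints on salient and less salient pixels, can be sketched roughly as follows. The review does not give the paper's formula, so everything specific here is an assumption for illustration: the function name, the `threshold` on infrared intensity that defines "salient", the choice of targets (the brighter source for salient pixels, the visible image elsewhere), and the L1 distance.

```python
def segmented_pixel_loss(fused, infrared, visible, threshold=0.8):
    """Hypothetical segmented pixel loss over flat lists of intensities in [0, 1].

    Assumption: a pixel counts as salient when its infrared intensity
    reaches `threshold`. Salient pixels are pulled toward the brighter
    source to keep hot targets; all other pixels are pulled toward the
    visible image to keep texture.
    """
    if not fused or not (len(fused) == len(infrared) == len(visible)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for f, ir, vis in zip(fused, infrared, visible):
        if ir >= threshold:
            target = max(ir, vis)   # salient segment: keep brightness
        else:
            target = vis            # remaining segment: keep texture
        total += abs(f - target)    # L1 distance per pixel
    return total / len(fused)

# Toy example: the first pixel is a hot infrared target, the second is background.
infrared_px = [0.9, 0.1]
visible_px = [0.2, 0.6]
ideal_loss = segmented_pixel_loss([0.9, 0.6], infrared_px, visible_px)
texture_only_loss = segmented_pixel_loss([0.2, 0.6], infrared_px, visible_px)
```

Under these assumptions, a fused image that simply copies the visible image is penalized at the hot pixel, while one that keeps the infrared target and the visible background incurs no loss, which is the kind of trade-off the result above describes.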


This review was generated by AI and reviewed by a human editor.