QUICK REVIEW

[Paper Review] Dif-Fusion: Towards High Color Fidelity in Infrared and Visible Image Fusion with Diffusion Models

Jun Yue, Leyuan Fang | arXiv (Cornell University) | 2023. 01. 19.

Advanced Image Fusion Techniques | Citations: 15

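One-Line Summary

Dif-Fusion uses a diffusion model to learn the multi-channel distribution of infrared and visible images, and directly generates fused images with high color fidelity under multi-channel losses.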

ABSTRACT

Color plays an important role in human visual perception, reflecting the spectrum of objects. However, the existing infrared and visible image fusion methods rarely explore how to handle multi-spectral/channel data directly and achieve high color fidelity. This paper addresses the above issue by proposing a novel method with diffusion models, termed as Dif-Fusion, to generate the distribution of the multi-channel input data, which increases the ability of multi-source information aggregation and the fidelity of colors. In specific, instead of converting multi-channel images into single-channel data in existing fusion methods, we create the multi-channel data distribution with a denoising network in a latent space with forward and reverse diffusion process. Then, we use the denoising network to extract the multi-channel diffusion features with both visible and infrared information. Finally, we feed the multi-channel diffusion features to the multi-channel fusion module to directly generate the three-channel fused image. To retain the texture and intensity information, we propose multi-channel gradient loss and intensity loss. Along with the current evaluation metrics for measuring texture and intensity fidelity, we introduce a new evaluation metric to quantify color fidelity. Extensive experiments indicate that our method is more effective than other state-of-the-art image fusion methods, especially in color fidelity.

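Research Motivation and Goals

Improve color fidelity in infrared and visible image fusion, moving beyond single-channel post-processing.
Propose a diffusion-based framework that treats the multi-channel input as a latent distribution, so that information from the infrared and visible sources is fused more effectively.
Directly generate a three-channel fused image that preserves texture and color, without color space conversion.
Introduce a new evaluation metric to quantify the color fidelity of fusion results.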

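Proposed Method

Concatenate the infrared image (1 channel) and the visible image (3 channels) into a 4-channel input, and model their joint distribution with a diffusion process.
Use forward diffusion to gradually add Gaussian noise, and train a reverse-diffusion network to denoise and learn the multi-channel latent structure.
Extract multi-channel diffusion features from the denoising network across multiple diffusion steps to capture both infrared and visible information.
Fuse the diffusion features through a multi-channel fusion module and output a three-channel fused image.
Introduce a multi-channel gradient loss (LMCG) and a multi-channel intensity loss (LMCI) to steer the three-channel output toward preserving texture and intensity.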
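The first two steps (4-channel concatenation and forward diffusion) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the standard DDPM closed-form noising step with a linear beta schedule, and the helper names (`make_four_channel`, `linear_beta_schedule`, `forward_diffuse`), the schedule endpoints, and the step count are hypothetical choices that the review does not specify.

```python
import numpy as np

def make_four_channel(visible_rgb, infrared):
    """Stack a (H, W, 3) visible image and a (H, W) infrared image into (H, W, 4)."""
    return np.concatenate([visible_rgb, infrared[..., None]], axis=-1)

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (assumed; the review does not state the schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) using the standard DDPM closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

# Toy inputs scaled to [-1, 1]; real data would be registered image pairs.
rng = np.random.default_rng(0)
visible = rng.uniform(-1, 1, size=(8, 8, 3))
infrared = rng.uniform(-1, 1, size=(8, 8))

x0 = make_four_channel(visible, infrared)      # 4-channel joint input
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)            # cumulative signal retention
x_t, noise = forward_diffuse(x0, 500, alpha_bar, rng)
```

A denoising network would then be trained to predict `noise` from `x_t` and the step index; its intermediate activations are what the review calls multi-channel diffusion features.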

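Experimental Results

Research Questions

RQ1: How can diffusion models be used to construct the distribution of multi-channel infrared and visible data for image fusion?
RQ2: Can diffusion-based features directly generate a three-channel fused image with high color fidelity, without color space conversion?
RQ3: Which losses are effective at preserving texture, gradients, and intensity in the multi-channel fused output?
RQ4: How does the proposed approach perform against state-of-the-art methods on standard infrared-visible fusion datasets?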

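Key Results

On public datasets, the method produces fused images with better color fidelity and better texture and intensity preservation than a number of state-of-the-art methods.
The diffusion-based framework can produce three-channel fused images directly, without color space conversion.
The new multi-channel gradient loss and multi-channel intensity loss guide the fusion toward color-accurate, detail-rich outputs.
Qualitative and quantitative analyses suggest benefits in color preservation and perceptual quality.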
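To make the two losses credited above more concrete, here is a hedged NumPy sketch of how a multi-channel intensity loss and a multi-channel gradient loss could be computed. The review does not give the formulas, so the specifics here are assumptions: each fused channel is compared against the element-wise maximum of the infrared image and the matching visible channel, gradients are approximated with Sobel filters, and the distance is a mean absolute error. The function names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Correlate a 2D image with a 3x3 kernel, using edge padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def gradient_magnitude(img):
    """L1 Sobel gradient magnitude of a single-channel image."""
    return np.abs(filter3x3(img, SOBEL_X)) + np.abs(filter3x3(img, SOBEL_Y))

def multi_channel_intensity_loss(fused, visible, infrared):
    """Mean absolute error between each fused channel and
    max(infrared, visible channel). The max target is an assumption."""
    target = np.maximum(visible, infrared[..., None])
    return np.mean(np.abs(fused - target))

def multi_channel_gradient_loss(fused, visible, infrared):
    """Per-channel mean absolute error between fused gradients and the
    stronger of the infrared and visible gradients (assumed target)."""
    grad_ir = gradient_magnitude(infrared)
    total = 0.0
    for c in range(3):
        target = np.maximum(gradient_magnitude(visible[..., c]), grad_ir)
        total += np.mean(np.abs(gradient_magnitude(fused[..., c]) - target))
    return total / 3.0

# Toy example: a fused image built as the per-pixel maximum has zero
# intensity loss by construction under the assumed target.
rng = np.random.default_rng(0)
visible = rng.uniform(0, 1, size=(16, 16, 3))
infrared = rng.uniform(0, 1, size=(16, 16))
fused = np.maximum(visible, infrared[..., None])
loss_int = multi_channel_intensity_loss(fused, visible, infrared)
loss_grad = multi_channel_gradient_loss(fused, visible, infrared)
```

Because both terms operate on all three output channels rather than on a single luminance channel, they penalize color shifts as well as lost texture, which is consistent with the color-fidelity emphasis of the review.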


This review was generated by AI and reviewed by a human editor.