QUICK REVIEW

[Paper Review] Unsupervised Deep Multi-focus Image Fusion

Xiang Yan, Syed Zulqarnain Gilani | arXiv (Cornell University) | June 19, 2018

Advanced Image Fusion Techniques | References: 7 | Citations: 48

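One-Line Summary

The paper proposes MFNet, an end-to-end unsupervised CNN that uses an SSIM-based loss to fuse a pair of multi-focus images directly into a single all-in-focus image, and that is trained on real benchmark data without ground-truth fused images.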

ABSTRACT

Convolutional neural networks have recently been used for multi-focus image fusion. However, due to the lack of labeled data for supervised training of such networks, existing methods have resorted to adding Gaussian blur in focused images to simulate defocus and generate synthetic training data with ground-truth for supervised learning. Moreover, they classify pixels as focused or defocused and leverage the results to construct the fusion weight maps which then necessitates a series of post-processing steps. In this paper, we present unsupervised end-to-end learning for directly predicting the fully focused output image from multi-focus input image pairs. The proposed approach uses a novel CNN architecture trained to perform fusion without the need for ground truth fused images and exploits the image structural similarity (SSIM) to calculate the loss; a metric that is widely accepted for fused image quality evaluation. Consequently, we are able to utilize real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes during test time. Extensive evaluations on benchmark datasets show that our method outperforms existing state-of-the-art in terms of visual quality and objective evaluations.

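Motivation and Objectives

Enable multi-focus image fusion without ground-truth data.
Develop an end-to-end CNN that outputs a fully focused image from multi-focus inputs.
Integrate feature extraction, fusion, and reconstruction into a single network, eliminating post-processing.
Train on real benchmark datasets rather than synthetically blurred images.
Release a publicly available trained model to support reproducibility.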

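Proposed Method

Three feature extraction subnetworks extract nonlinear features: one for each of the two input images and, as the next item implies, one for their average image.
The fused features of the two inputs are combined with the features of the average image and fed into the reconstruction subnetwork.
The loss is based on a multi-focus SSIM metric that compares the fused output with the inputs locally.
All convolutional layers use 64 filters of size 3x3 with zero padding; the activation is Leaky ReLU, except in the last layer, which uses a sigmoid.
Training uses 50,000 patches cropped from 60 multi-focus image pairs in the benchmark datasets, with a 400-epoch schedule.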
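
To make the idea of a locally computed SSIM loss concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The review only states that the loss compares the fused output with the inputs locally, so the following are assumptions made for illustration: the function names (`fusion_ssim_loss`, `ssim_window`), the non-overlapping 4x4 windows, the standard SSIM constants, and the rule that the input window with the higher local variance is treated as the in-focus reference.

```python
# Illustrative sketch only. The window size, the reference-selection rule,
# and all function names are assumptions, not details given in the review.

C1 = 0.01 ** 2  # usual SSIM stabilizing constants for intensities in [0, 1]
C2 = 0.03 ** 2


def _mean(values):
    return sum(values) / len(values)


def variance(window):
    """Population variance of a flat list of intensities."""
    m = _mean(window)
    return _mean([(v - m) ** 2 for v in window])


def ssim_window(x, y):
    """SSIM between two equally sized flat windows of intensities in [0, 1]."""
    mx, my = _mean(x), _mean(y)
    vx, vy = variance(x), variance(y)
    cov = _mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    numerator = (2 * mx * my + C1) * (2 * cov + C2)
    denominator = (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    return numerator / denominator


def windows(img, size):
    """Yield non-overlapping size x size windows of a 2D list as flat lists."""
    height, width = len(img), len(img[0])
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            yield [img[r + i][c + j] for i in range(size) for j in range(size)]


def fusion_ssim_loss(in1, in2, fused, size=4):
    """Return 1 - mean local SSIM between the fused image and the sharper input."""
    scores = []
    for w1, w2, wf in zip(windows(in1, size), windows(in2, size), windows(fused, size)):
        # Assumption: higher local variance is used as a proxy for "in focus".
        reference = w1 if variance(w1) >= variance(w2) else w2
        scores.append(ssim_window(reference, wf))
    return 1.0 - _mean(scores)
```

Under these assumptions, a fused image that copies the textured (sharp) region from each input yields a loss near zero, while simply returning one of the inputs is penalized in the regions where that input is flat. No ground-truth fused image appears anywhere in the computation, which is the property the review highlights.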

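Experimental Results

Research Questions

RQ1: Can an end-to-end unsupervised CNN learn to output an all-in-focus image from multi-focus input pairs without ground-truth fused images?
RQ2: Does an SSIM-based loss effectively drive fusion quality in the multi-focus setting?
RQ3: How does MFNet compare with state-of-the-art fusion methods on standard benchmarks across multiple metrics?
RQ4: Can the trained model handle variable image sizes at test time?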

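Key Findings

MFNet outperforms state-of-the-art methods on multiple objective metrics across several datasets and image sets.
The method produces visually artifact-free fused images, with fewer boundary artifacts than competing methods.
MFNet runs faster than CNN-based baselines while delivering higher fusion quality.
Thanks to its fully convolutional design, the network supports variable-size inputs at test time.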
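
The variable-size property follows from simple arithmetic on convolution output sizes. The sketch below assumes stride 1 and one pixel of zero padding per side for each 3x3 layer; the review states only "3x3" and "zero padding", so both values, the function names, and the layer counts used in the example are hypothetical.

```python
def conv_output_size(size, kernel=3, padding=1, stride=1):
    """Spatial output size of one convolution along a single dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def network_output_shape(height, width, num_layers):
    """Apply the same 3x3, padding-1, stride-1 layer num_layers times.

    num_layers is a placeholder; the review does not state the network depth.
    """
    for _ in range(num_layers):
        height = conv_output_size(height)
        width = conv_output_size(width)
    return height, width


# Any input size passes through unchanged, so no fixed resolution is required.
print(network_output_shape(520, 520, num_layers=10))
print(network_output_shape(37, 123, num_layers=5))
```

With these settings each layer maps a size n to (n + 2 - 3) + 1 = n, so the fused output has the same height and width as the inputs regardless of depth. Without padding, every 3x3 layer would shrink each dimension by 2.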


This review was generated by AI and checked by a human editor.