QUICK REVIEW

[Paper Review] Unsupervised Deep Multi-focus Image Fusion

Xiang Yan, Syed Zulqarnain Gilani|arXiv (Cornell University)|Jun 19, 2018

Topic: Advanced Image Fusion Techniques | References: 7 | Cited by: 48

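One-Sentence Summary

The paper proposes MFNet, an end-to-end unsupervised convolutional neural network that fuses multi-focus image pairs directly into an all-in-focus image; it is trained with an SSIM-based loss on real benchmark data for which no ground-truth fused images exist.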

ABSTRACT

Convolutional neural networks have recently been used for multi-focus image fusion. However, due to the lack of labeled data for supervised training of such networks, existing methods have resorted to adding Gaussian blur in focused images to simulate defocus and generate synthetic training data with ground-truth for supervised learning. Moreover, they classify pixels as focused or defocused and leverage the results to construct the fusion weight maps which then necessitates a series of post-processing steps. In this paper, we present unsupervised end-to-end learning for directly predicting the fully focused output image from multi-focus input image pairs. The proposed approach uses a novel CNN architecture trained to perform fusion without the need for ground truth fused images and exploits the image structural similarity (SSIM) to calculate the loss; a metric that is widely accepted for fused image quality evaluation. Consequently, we are able to utilize real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes during test time. Extensive evaluations on benchmark datasets show that our method outperforms existing state-of-the-art in terms of visual quality and objective evaluations.

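Motivation and Objectives

Advance multi-focus image fusion in the absence of ground-truth data.
Develop an end-to-end CNN that outputs an all-in-focus image from multi-focus inputs.
Eliminate post-processing by integrating feature extraction, fusion, and reconstruction in a single network.
Train on real benchmark datasets rather than on synthetically blurred images.
Release the trained model publicly to support reproducibility.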

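Proposed Method

Three feature extraction subnetworks extract nonlinear features: one per input image and one for the average of the two inputs.
The features of the two inputs are fused, combined with the features of the average image, and passed to a reconstruction subnetwork.
The loss is built on a multi-focus SSIM metric that compares the fused output with the inputs over local windows.
All convolutional layers use 3x3 kernels, 64 filters, and zero padding; every layer uses Leaky ReLU except the last, which uses a sigmoid.
Training uses 50,000 patches cropped from 60 multi-focus image pairs drawn from benchmark datasets, on an epoch-based schedule of 400 iterations.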
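
The SSIM-based loss described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The summary only says that the loss compares the fused output with the inputs locally; the rule used here (in each window, score the fused patch against whichever input has the higher variance, taken as a proxy for sharpness) is an assumption, as are the 3x3 window, the stride of 1, and the function names. The constants are the usual SSIM stabilizers for intensities in [0, 1].

```python
from statistics import fmean

# Standard SSIM stabilizing constants for intensities scaled to [0, 1].
C1 = 0.01 ** 2
C2 = 0.03 ** 2


def variance(patch):
    """Population variance of a flat list of pixel values."""
    mu = fmean(patch)
    return fmean([(p - mu) ** 2 for p in patch])


def ssim(a, b):
    """SSIM between two flat patches of equal length."""
    mu_a, mu_b = fmean(a), fmean(b)
    var_a, var_b = variance(a), variance(b)
    cov = fmean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    num = (2 * mu_a * mu_b + C1) * (2 * cov + C2)
    den = (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2)
    return num / den


def windows(img, size=3):
    """Yield every size x size window of a 2D list, flattened, with stride 1."""
    rows, cols = len(img), len(img[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield [img[r + i][c + j] for i in range(size) for j in range(size)]


def multifocus_ssim_loss(x1, x2, fused, size=3):
    """Return 1 minus the mean local SSIM against the locally sharper input.

    ASSUMPTION: the sharper input in a window is the one with higher variance.
    """
    scores = []
    for p1, p2, pf in zip(windows(x1, size), windows(x2, size), windows(fused, size)):
        reference = p1 if variance(p1) >= variance(p2) else p2
        scores.append(ssim(reference, pf))
    return 1.0 - fmean(scores)
```

Under this assumed rule, a fused image that keeps the sharp content of each input scores a lower loss than the pixel-wise average of the two inputs, which is the behavior a training signal for fusion needs. No ground-truth fused image appears anywhere in the computation.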

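Experimental Results

Research Questions

RQ1: Can an end-to-end unsupervised CNN learn to produce an all-in-focus image from a pair of multi-focus inputs without ground-truth fused images?
RQ2: Does an SSIM-based loss effectively guide fusion quality in the multi-focus setting?
RQ3: How does MFNet compare with state-of-the-art fusion methods across multiple metrics on standard benchmarks?
RQ4: Can the trained model handle inputs of variable size at test time?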

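Key Findings

MFNet outperforms state-of-the-art methods on several objective metrics across multiple datasets and image sets.
The fused images show few visible artifacts, with fewer boundary artifacts than those of competing methods.
MFNet runs faster than the baseline CNN while delivering higher fusion quality.
Thanks to its fully convolutional design, the network accepts inputs of variable size at test time.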
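
The variable-size property in the last finding follows from using only zero-padded convolutions and element-wise activations, which can be seen in a small pure-Python sketch. This is an illustration, not MFNet itself: it uses a single channel and two layers instead of 64 filters per layer, the kernels `EDGE` and `SMOOTH` are hypothetical and untrained, and the Leaky ReLU slope of 0.2 is an assumed value that the summary does not give.

```python
import math


def conv3x3_same(img, kernel, bias=0.0):
    """3x3 cross-correlation with zero padding; output size equals input size."""
    rows, cols = len(img), len(img[0])
    out = [[bias] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    # Positions outside the image contribute zero (zero padding).
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[r][c] += kernel[i][j] * img[rr][cc]
    return out


def leaky_relu(img, slope=0.2):
    """Element-wise Leaky ReLU; the slope is an assumed value."""
    return [[v if v > 0 else slope * v for v in row] for row in img]


def sigmoid(img):
    """Element-wise sigmoid, mapping every value into (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in img]


# Hypothetical, untrained kernels chosen only for illustration.
EDGE = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
SMOOTH = [[1 / 9] * 3 for _ in range(3)]


def tiny_fcn(img):
    """Two stacked conv layers: Leaky ReLU inside, sigmoid on the output."""
    hidden = leaky_relu(conv3x3_same(img, EDGE))
    return sigmoid(conv3x3_same(hidden, SMOOTH))


def shape(img):
    """Return (rows, cols) of a 2D list."""
    return len(img), len(img[0])
```

Calling `tiny_fcn` on a 5x7 image and on an 11x3 image returns outputs of those same sizes, because no layer depends on a fixed spatial dimension. The same reasoning is what lets a network trained on fixed-size patches be applied to full images at test time.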


This review was generated by AI and reviewed by human editors.