QUICK REVIEW

[Paper Review] Amortised MAP Inference for Image Super-resolution

Casper Kaae Sønderby, J. A. Caballero|arXiv (Cornell University)|Oct 14, 2016
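Advanced Image Processing Techniques | Cited by 155
One-Sentence Summary

This paper proposes amortised MAP inference for single-image super-resolution: it enforces affine consistency with respect to the downsampling operator and explores GAN-based, denoiser-guided, and density-based methods for approximating the MAP solution.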

ABSTRACT

Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.

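Motivation and Objectives

  • Introduce MAP inference for SR to produce plausible, high-probability high-resolution images instead of the blurry results of MSE-based training.
  • Propose a neural architecture that projects its output onto the affine subspace of valid SR solutions, ensuring consistency between the LR input and the HR output.
  • Develop and compare three amortised MAP inference methods for SR (GAN-based, denoiser-guided, and density-model-based).
  • Show that the GAN-based AffGAN method produces visually sharp, plausible SR results on real images.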

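Proposed Method

  • Introduces an affine projection layer that uses the downsampling operator A and its Moore-Penrose pseudoinverse A+ to enforce consistency with the LR input.
  • Formulates amortised MAP inference as minimising the cross-entropy between the model's output distribution qθ and the high-resolution image prior pY.
  • Proposes AffGAN, a GAN whose generator includes the affine projection, trained to minimise KL[qθ∥pY].
  • Proposes AffDG, a denoiser-guided variant that backpropagates gradient estimates from a Bayes-optimal denoiser to update θ.
  • Proposes AffLL, a density-guided variant that uses a PixelCNN-style density model (MCGSM) to guide minimisation of the cross-entropy with pY.
  • Discusses instance noise as a stabilisation technique for GAN training, and connects a stochastic AffGAN variant to amortised variational inference.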
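
The affine projection in the first bullet can be sketched in matrix form. The example below is a minimal illustration under stated assumptions, not the paper's implementation: it treats a 1-D signal, assumes A is a simple average-pooling matrix, and computes A+ with `np.linalg.pinv`. The helper names `make_downsampler` and `affine_project` are hypothetical. The projection used is the standard one, y − A+(Ay − x), which is equivalent to (I − A+A)y + A+x.

```python
import numpy as np


def make_downsampler(n_hr: int, factor: int) -> np.ndarray:
    """Build a 1-D average-pooling matrix A of shape (n_hr // factor, n_hr).

    Assumption for illustration only: the real downsampling operator
    depends on the SR setup.
    """
    n_lr = n_hr // factor
    A = np.zeros((n_lr, n_hr))
    for i in range(n_lr):
        A[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return A


def affine_project(y: np.ndarray, x: np.ndarray,
                   A: np.ndarray, A_pinv: np.ndarray) -> np.ndarray:
    """Project y onto the affine subspace {y' : A y' = x}."""
    return y - A_pinv @ (A @ y - x)


rng = np.random.default_rng(0)
A = make_downsampler(n_hr=8, factor=2)
A_pinv = np.linalg.pinv(A)       # Moore-Penrose pseudoinverse A+

x = rng.normal(size=4)           # low-resolution input
y_raw = rng.normal(size=8)       # unconstrained high-resolution guess
y_proj = affine_project(y_raw, x, A, A_pinv)

# Downsampling error before and after the projection.
err_before = np.max(np.abs(A @ y_raw - x))
err_after = np.max(np.abs(A @ y_proj - x))
print(f"error before: {err_before:.3e}, after: {err_after:.3e}")
```

Because this A has full row rank, A A+ is the identity, so the projected output downsamples back to x up to floating-point error regardless of what the unconstrained guess was.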

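Experimental Results

Research Questions

  • RQ1: Can amortised MAP inference for image SR be learned effectively by constraining outputs to the affine subspace consistent with the LR input?
  • RQ2: Which of the proposed strategies (AffGAN, AffDG, AffLL) best minimises H[qθ, pY] and produces perceptually plausible SR results?
  • RQ3: Compared with conventional MSE-based training, how does enforcing affine consistency affect SR accuracy and realism?
  • RQ4: In this setting, how does GAN-based SR relate to variational inference frameworks?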
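
For reference, the quantities H[qθ, pY] and KL[qθ∥pY] used in this review are related by a standard identity, written out below. The notation (y for a high-resolution image drawn from qθ) is an assumption made here for readability and is not taken from the page.

```latex
\begin{align}
  H[q_\theta, p_Y] &= -\,\mathbb{E}_{y \sim q_\theta}\!\left[\log p_Y(y)\right], \\
  H[q_\theta]      &= -\,\mathbb{E}_{y \sim q_\theta}\!\left[\log q_\theta(y)\right], \\
  \mathrm{KL}[q_\theta \,\|\, p_Y] &= H[q_\theta, p_Y] - H[q_\theta].
\end{align}
```

Minimising the KL divergence therefore differs from minimising the cross-entropy only by the entropy term H[qθ].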

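Key Findings

  • The affine projection layer guarantees that the HR output is consistent with the LR input; in the experiments, the projection drives the downsampling error to near zero.
  • AffGAN (GAN-based) produces the sharpest and most plausible SR images on real data (such as CelebA and natural images), and outperforms the soft-constraint variant in perceptual quality.
  • AffGAN tends to produce sharp, plausible outputs with some high-frequency noise, which is characteristic of GAN-based SR, whereas MSE-trained models are blurrier.
  • AffDG and AffLL also produce plausible results on some datasets, but on natural images and faces they are generally less sharp than AffGAN.
  • On the 2D toy MAP demonstration and on real image datasets, the AffGAN and AffDG methods achieve better cross-entropy, moving closer to the MAP solution, than the MSE/MAE baselines.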
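
The contrast between MSE-trained and MAP-style outputs in the findings above can be illustrated with a small numerical sketch. This is not the paper's toy experiment: the bimodal 1-D "posterior" below, its weights, means, and standard deviation, and the helper `mixture_pdf` are all assumptions chosen for illustration. The MSE-optimal estimate is the posterior mean, which here falls between the two modes where the density is low; the MAP estimate sits on the dominant mode.

```python
import numpy as np


def mixture_pdf(y, weights, means, std):
    """Density of a 1-D Gaussian mixture with a shared standard deviation."""
    y = np.asarray(y, dtype=float)[..., None]
    comps = np.exp(-0.5 * ((y - means) / std) ** 2) / (std * np.sqrt(2 * np.pi))
    return comps @ weights


# Hypothetical posterior over a single HR value given one LR observation:
# two plausible explanations, one somewhat more likely than the other.
weights = np.array([0.6, 0.4])
means = np.array([-1.0, 1.0])
std = 0.2

# MSE-optimal estimate: the posterior mean.
mse_estimate = float(weights @ means)

# MAP estimate: the highest-density point, found by a grid search.
grid = np.linspace(-2.0, 2.0, 4001)
map_estimate = float(grid[np.argmax(mixture_pdf(grid, weights, means, std))])

density_at_mse = float(mixture_pdf(mse_estimate, weights, means, std))
density_at_map = float(mixture_pdf(map_estimate, weights, means, std))

print(f"MSE estimate: {mse_estimate:+.3f}, density {density_at_mse:.2e}")
print(f"MAP estimate: {map_estimate:+.3f}, density {density_at_map:.2e}")
```

In image terms, averaging over several plausible sharp explanations gives a value that none of them would produce, which is one way to read the blurriness attributed to MSE training.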


This review was generated by AI and reviewed by human editors.