QUICK REVIEW

[Paper Review] SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models

Haoying Li, Yifan Yang | arXiv (Cornell University) | Apr 30, 2021

Advanced Image Processing Techniques | References: 24 | Cited by: 27

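One-Sentence Summary

SRDiff is the first diffusion-based single image super-resolution model. Conditioned on a low-resolution input, it generates diverse, high-quality super-resolved outputs, has a small model footprint, and trains stably.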

ABSTRACT

Single image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from the given low-resolution (LR) ones, which is an ill-posed problem because one LR image corresponds to multiple HR images. Recently, learning-based SISR methods have greatly outperformed traditional ones, while suffering from over-smoothing, mode collapse or large model footprint issues for PSNR-oriented, GAN-driven and flow-based methods respectively. To solve these problems, we propose a novel single image super-resolution diffusion probabilistic model (SRDiff), which is the first diffusion-based model for SISR. SRDiff is optimized with a variant of the variational bound on the data likelihood and can provide diverse and realistic SR predictions by gradually transforming the Gaussian noise into a super-resolution (SR) image conditioned on an LR input through a Markov chain. In addition, we introduce residual prediction to the whole framework to speed up convergence. Our extensive experiments on facial and general benchmarks (CelebA and DIV2K datasets) show that 1) SRDiff can generate diverse SR results in rich details with state-of-the-art performance, given only one LR input; 2) SRDiff is easy to train with a small footprint; and 3) SRDiff can perform flexible image manipulation including latent space interpolation and content fusion.

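Motivation and Goals

Address the ill-posed SISR problem while avoiding over-smoothing and mode collapse.
Produce diverse, realistic SR outputs from a single LR input.
Achieve stable, lightweight training without adversarial losses or flow-based architectural constraints.
Support latent space manipulation and content fusion for flexible SR applications.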

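Proposed Method

Uses a diffusion probabilistic model that maps Gaussian noise to an SR image conditioned on the LR input.
Introduces a pretrained LR encoder to extract conditioning information from the LR image.
Applies residual prediction: the model generates the difference between the HR image and the upsampled LR image, which speeds up convergence.
Optimizes a variant of the variational lower bound (ELBO) on the data likelihood by training a noise predictor εθ.
Uses a U-Net-based conditional noise predictor together with an RRDB-based LR encoder.
At inference, denoises step by step from Gaussian noise x_T to x_0, then adds the upsampled LR image to form the SR output.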
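
The training and inference steps above can be sketched in plain Python. This is a minimal, hedged illustration that assumes a DDPM-style linear β schedule, the standard ε-prediction mean-squared-error objective, and a simple σ_t = √β_t sampling variance; SRDiff's actual schedule, loss, and networks may differ. Images are flattened to lists of floats, and `eps_model` and `cond` are hypothetical stand-ins for the U-Net noise predictor εθ and the RRDB LR-encoder features.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (a DDPM-style assumption, not necessarily SRDiff's)."""
    return [beta_start + (beta_end - beta_start) * t / max(T - 1, 1) for t in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def training_pair(hr, up_lr, t, alpha_bars, rng):
    """Build (x_t, eps) for one training step on the residual x_0 = HR - up(LR)."""
    x0 = [h - u for h, u in zip(hr, up_lr)]          # residual target
    eps = [rng.gauss(0.0, 1.0) for _ in x0]          # Gaussian noise to predict
    a = alpha_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def eps_loss(eps_pred, eps):
    """Simplified epsilon-prediction objective (mean squared error, an assumption)."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)

def sample(up_lr, cond, eps_model, betas, rng):
    """Ancestral sampling from x_T ~ N(0, I) down to x_0, then add up(LR)."""
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in up_lr]         # x_T
    for t in reversed(range(len(betas))):
        beta, a_bar = betas[t], alpha_bars[t]
        eps_hat = eps_model(x, t, cond)              # hypothetical conditional predictor
        # Posterior mean: (x_t - beta_t / sqrt(1 - a_bar_t) * eps_hat) / sqrt(alpha_t)
        mean = [(xi - beta / math.sqrt(1.0 - a_bar) * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps_hat)]
        if t > 0:
            sigma = math.sqrt(beta)                  # simple variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                                 # no noise on the final step
    return [r + u for r, u in zip(x, up_lr)]         # SR = residual + up(LR)
```

In a real model `eps_model` would be the trained conditional U-Net. The point this sketch isolates is that diffusion operates on the residual x_0 = HR − up(LR), and up(LR) is added back only after the last denoising step.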

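Experimental Results

Research Questions

RQ1: Can diffusion models be used effectively for SISR, producing diverse, high-quality SR solutions from a single LR input?
RQ2: Does introducing residual prediction into diffusion-based SISR improve training stability and convergence speed?
RQ3: Compared with PSNR-oriented, GAN-based, and flow-based SR methods, what are the trade-offs in model size, training time, and performance?
RQ4: Does SRDiff support flexible image manipulations in the SR setting, such as latent space interpolation and content fusion?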

Key Findings

CelebA (8×)
Method	PSNR↑	SSIM↑	LPIPS↓	LR-PSNR↑	σ (diversity)
Bicubic	23.38	0.65	0.484	34.66	0.00
RRDB	26.89	0.78	0.220	48.01	0.00
ESRGAN	23.24	0.66	0.115	39.91	0.00
ProgFSR	24.21	0.69	0.126	42.19	0.00
SRFlow	25.32	0.72	0.108	50.73	5.21
SRDiff	25.38	0.74	0.106	52.34	6.13
SRDiff†	25.32	0.73	0.106	51.41	6.19

DIV2K (4×)
Method	PSNR↑	SSIM↑	LPIPS↓	LR-PSNR↑	σ (diversity)
Bicubic	26.70	0.77	0.409	38.70	0.00
EDSR	28.98	0.83	0.270	54.89	0.00
RRDB	29.44	0.84	0.253	49.20	0.00
RankSRGAN	26.55	0.75	0.128	42.33	0.00
ESRGAN	26.22	0.75	0.124	39.03	0.00
SRFlow	27.09	0.76	0.120	49.96	5.14
SRDiff	27.41	0.79	0.136	55.21	6.09

† A second SRDiff row is listed for this benchmark; the configuration it reports is not stated.
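
To make the PSNR and LR-PSNR columns concrete, here is a minimal sketch of both metrics for grayscale images stored as 2-D lists with values in [0, 1]. LR-PSNR is computed as the PSNR between the LR input and the SR output downscaled back to LR size, which measures how consistent the SR result is with its input. The s×s box-average downscaling is a simplifying assumption (benchmark code commonly uses bicubic resizing), and the function names are hypothetical.

```python
import math

def psnr(a, b, max_val=1.0):
    """PSNR in dB between two equally sized images given as 2-D lists."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def avg_downsample(img, s):
    """Downscale by an integer factor s using s x s box averaging (an assumption)."""
    h, w = len(img) // s, len(img[0]) // s
    return [[sum(img[i * s + di][j * s + dj] for di in range(s) for dj in range(s)) / (s * s)
             for j in range(w)] for i in range(h)]

def lr_psnr(sr, lr, s, max_val=1.0):
    """LR consistency: PSNR between the LR input and the downscaled SR output."""
    return psnr(avg_downsample(sr, s), lr, max_val)
```

A higher LR-PSNR means that downscaling the SR output reproduces the LR input more faithfully, independently of how sharp or realistic the HR details look.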

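On CelebA (8×) and DIV2K (4×), SRDiff produces diverse, high-quality SR outputs while staying consistent with the LR input, and it compares favorably with several state-of-the-art methods (its DIV2K LPIPS of 0.136 does, however, trail SRFlow, ESRGAN, and RankSRGAN).
SRDiff has about 12M parameters and converges in roughly 30 hours on a single GPU; it is smaller and uses less GPU memory than SRFlow (about 40M parameters).
Residual prediction speeds up convergence and improves SR quality; ablations across different numbers of diffusion steps and model widths consistently show its benefit.
SRDiff supports latent space interpolation and content fusion, enabling flexible manipulation of SR images.
Compared with GAN-based methods, SRDiff needs no discriminator training and avoids GAN artifacts; compared with flow-based methods, it imposes fewer architectural constraints (flow models require invertible networks) and stays lightweight.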
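
Latent space interpolation can be illustrated by blending two starting noise latents x_T and running the sampler from each blend. The sketch below is an assumption rather than SRDiff's documented procedure: it offers plain linear interpolation and spherical interpolation (slerp), which is often preferred for Gaussian latents because a linear blend shrinks the vector norm. The function names are hypothetical, and for the resulting SR images to vary smoothly the per-step sampling noise would also need to be held fixed.

```python
import math

def lerp(z1, z2, alpha):
    """Linear interpolation between two latent vectors."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z1, z2)]

def slerp(z1, z2, alpha, eps=1e-8):
    """Spherical interpolation along the arc between z1 and z2."""
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    cos = sum(a * b for a, b in zip(z1, z2)) / max(n1 * n2, eps)
    omega = math.acos(max(-1.0, min(1.0, cos)))   # angle between the latents
    s = math.sin(omega)
    if s < eps:                                   # (anti-)parallel: slerp is ill-defined
        return lerp(z1, z2, alpha)
    w1 = math.sin((1.0 - alpha) * omega) / s
    w2 = math.sin(alpha * omega) / s
    return [w1 * a + w2 * b for a, b in zip(z1, z2)]

def interpolation_path(z1, z2, steps, interp=slerp):
    """Latents for steps + 1 evenly spaced alphas from 0 to 1 inclusive."""
    return [interp(z1, z2, k / steps) for k in range(steps + 1)]
```

Each latent in the path would then serve as x_T for one sampling run, giving a sequence of SR images that morphs from one sample to the other.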

This summary was generated by AI and reviewed by human editors.