QUICK REVIEW

[Paper Review] SwinIR: Image Restoration Using Swin Transformer

Jingyun Liang, Jiezhang Cao|arXiv (Cornell University)|Aug 23, 2021
Advanced Image Processing Techniques | References: 90 | Cited by: 80
One-Sentence Summary

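SwinIR uses a Swin Transformer-based architecture that combines shallow and deep feature extraction with residual Swin Transformer blocks. It achieves competitive performance on image restoration tasks such as super-resolution, denoising, and JPEG compression artifact reduction while using fewer parameters.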

ABSTRACT

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45 dB, while the total number of parameters can be reduced by up to 67%.
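The three-part pipeline described in the abstract can be sketched as a toy NumPy forward pass. This is an illustrative sketch, not the authors' implementation. The Swin Transformer layer is replaced by a hypothetical simplified stand-in, `window_attention_layer`: single-head self-attention inside non-overlapping windows plus a residual connection, with no LayerNorm, MLP, relative position bias, or window shifting. The weights are random, and the upsampling step used for super-resolution is omitted, so input and output have the same size, as in denoising. All function and parameter names (`conv3x3`, `rstb`, `swinir_forward`, `make_params`) are made up for illustration.

```python
import numpy as np


def conv3x3(x, w, b):
    """3x3 convolution, stride 1, zero padding 1.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Add the contribution of kernel tap (i, j) for every output channel.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def window_attention_layer(x, wq, wk, wv, win=4):
    """Hypothetical stand-in for a Swin Transformer layer: single-head
    self-attention within non-overlapping win x win windows, plus a residual.
    Omits LayerNorm, MLP, relative position bias and shifted windows."""
    c, h, w = x.shape
    out = x.copy()
    for y0 in range(0, h, win):
        for x0 in range(0, w, win):
            patch = x[:, y0:y0 + win, x0:x0 + win]
            ph, pw = patch.shape[1:]
            tokens = patch.reshape(c, -1).T               # (N, C): one token per pixel
            q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
            scores = q @ k.T / np.sqrt(c)
            scores -= scores.max(axis=1, keepdims=True)   # numerical stability
            attn = np.exp(scores)
            attn /= attn.sum(axis=1, keepdims=True)       # softmax over keys
            out[:, y0:y0 + win, x0:x0 + win] += (attn @ v).T.reshape(c, ph, pw)
    return out


def rstb(x, block):
    """Residual Swin Transformer block: stacked layers, a 3x3 conv, and a residual."""
    y = x
    for wq, wk, wv in block["layers"]:
        y = window_attention_layer(y, wq, wk, wv)
    return conv3x3(y, *block["conv"]) + x


def swinir_forward(img, p):
    """Same-size restoration pass (the SR upsampling step is omitted)."""
    f0 = conv3x3(img, *p["shallow"])            # shallow feature extraction
    f = f0
    for block in p["rstbs"]:                    # deep feature extraction
        f = rstb(f, block)
    f_deep = conv3x3(f, *p["deep_conv"])        # 3x3 conv after the RSTB stack
    return conv3x3(f0 + f_deep, *p["recon"])    # long skip, then reconstruction


def make_params(c_img=3, c=8, n_rstb=2, n_layers=2, seed=0):
    """Random (untrained) weights for the toy model."""
    rng = np.random.default_rng(seed)

    def conv(ci, co):
        return rng.normal(0.0, 0.05, (co, ci, 3, 3)), np.zeros(co)

    def proj():
        return rng.normal(0.0, 0.1, (c, c))

    return {
        "shallow": conv(c_img, c),
        "rstbs": [{"layers": [(proj(), proj(), proj()) for _ in range(n_layers)],
                   "conv": conv(c, c)} for _ in range(n_rstb)],
        "deep_conv": conv(c, c),
        "recon": conv(c, c_img),
    }


lq = np.random.default_rng(1).random((3, 16, 16))   # toy low-quality input
restored = swinir_forward(lq, make_params())
print(restored.shape)  # (3, 16, 16)
```

The long skip `f0 + f_deep` hands the shallow (largely low-frequency) features straight to reconstruction, so the RSTB stack only has to model what the shallow features miss.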

Research Motivation and Goals

  • Motivate and demonstrate the effectiveness of Transformer-based models for image restoration.
  • Propose SwinIR, a Swin Transformer–based architecture for high-quality image restoration tasks.
  • Show that SwinIR can outperform state-of-the-art CNN-based methods with fewer parameters.

Proposed Method

  • SwinIR consists of three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction.
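  • Deep features are extracted by a stack of residual Swin Transformer blocks (RSTB). Each RSTB applies several Swin Transformer layers followed by a convolution layer, wrapped in a residual connection.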
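  • The reconstruction module fuses shallow and deep features to produce the high-quality image. A long skip connection carries low-frequency information directly to reconstruction.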
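  • A 3x3 convolution at the end of deep feature extraction brings the inductive bias of convolution into the network before shallow and deep features are fused.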
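  • Classical and lightweight SR are optimized with an L1 pixel loss. Real-world SR combines pixel, GAN, and perceptual losses. Denoising and JPEG artifact reduction use the Charbonnier loss.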
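The L1 and Charbonnier pixel losses named above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' training code. The function names are hypothetical, and `eps=1e-3` is an assumed default that should be checked against the paper. The Charbonnier loss is written in the common per-pixel averaged form $\frac{1}{N}\sum_i \sqrt{(\hat{y}_i - y_i)^2 + \varepsilon^2}$, whereas the paper states it on the norm of the whole residual image. The GAN and perceptual terms used for real-world SR are not shown.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error (L1 pixel loss)."""
    return float(np.mean(np.abs(pred - target)))


def charbonnier_loss(pred, target, eps=1e-3):
    """Per-pixel Charbonnier loss: mean of sqrt(diff^2 + eps^2).
    Behaves like L1 for large errors but stays smooth (differentiable) at zero."""
    diff = pred - target
    return float(np.mean(np.sqrt(diff * diff + eps * eps)))


target = np.full((3, 4, 4), 0.5)
print(charbonnier_loss(target, target))          # floor value eps when the error is zero
print(l1_loss(np.zeros_like(target), target),
      charbonnier_loss(np.zeros_like(target), target))  # both close to 0.5
```

Near zero error the Charbonnier gradient `diff / sqrt(diff**2 + eps**2)` goes smoothly to 0, unlike the sign-valued L1 gradient, which is the usual reason for choosing it.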

Experimental Results

Research Questions

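  • RQ1: Does a Swin Transformer-based architecture outperform CNN-based methods on classical, real-world, and lightweight image SR?
  • RQ2: Do residual Swin Transformer blocks recover high-frequency detail effectively while keeping the model efficient?
  • RQ3: How does SwinIR perform on denoising and JPEG compression artifact reduction relative to state-of-the-art methods?
  • RQ4: How do architectural choices, such as the residual connections and the convolution at the end of each block, affect restoration performance?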

Key Findings

  • SwinIR achieves state-of-the-art or competitive PSNR/SSIM on multiple SR datasets while using fewer parameters than many CNN-based methods.
  • When trained with a larger-scale degradation model and dataset, SwinIR performs strongly on real-world SR, and under some conditions even outperforms IPT.
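  • In ablation studies, the residual connection in each RSTB significantly improves PSNR, and a 3x3 convolution enhances features better than a 1x1 convolution or several smaller convolutions.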
  • The Swin Transformer-based model shows good convergence and data efficiency, performing well when trained on DIV2K alone and on DIV2K+Flickr2K.
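  • On denoising, SwinIR outperforms traditional methods and several CNN-based methods across multiple datasets and noise levels, with fewer parameters than some baselines.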
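The PSNR comparisons in these findings, and the abstract's gains of up to 0.14∼0.45 dB, are measured in decibels. The following is a minimal sketch, assuming images are scaled to [0, 1]. The function names are hypothetical, and the sketch skips the Y-channel conversion and border cropping that SR benchmarks commonly apply before measuring PSNR. Because $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, a gain of $\Delta$ dB multiplies the MSE by $10^{-\Delta/10}$.

```python
import math
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(db_gain):
    """Fraction of the original MSE left after a PSNR gain of db_gain dB."""
    return 10.0 ** (-db_gain / 10.0)


target = np.full((8, 8), 0.6)
print(psnr(np.full((8, 8), 0.5), target))   # about 20 dB (MSE = 0.01)
for gain in (0.14, 0.45):
    print(gain, round(mse_ratio_for_gain(gain), 3))
```

By this arithmetic, a 0.45 dB gain corresponds to roughly a 10% reduction in MSE, and a 0.14 dB gain to roughly 3%.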

This review was generated by AI and checked by human editors.