QUICK REVIEW

[Paper Review] SwinIR: Image Restoration Using Swin Transformer

Jingyun Liang, Jiezhang Cao|arXiv (Cornell University)|Aug 23, 2021

Advanced Image Processing Techniques | References: 90 | Cited by: 80

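ONE-SENTENCE SUMMARY

SwinIR uses a Swin Transformer-based architecture that combines shallow and deep feature extraction with residual Swin Transformer blocks, achieving competitive performance with fewer parameters on image restoration tasks such as super-resolution, denoising, and JPEG compression artifact reduction.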

ABSTRACT

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45 dB, while the total number of parameters can be reduced by up to 67%.
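
The three-part data flow described in the abstract can be sketched structurally as follows. This is a minimal illustration, not the authors' implementation: features are plain lists of floats, and `shallow_extract`, the Swin layers, `block_conv`, `final_conv`, and `reconstruct` are hypothetical stand-ins for learned layers. Only the residual wiring is meant to mirror the description; the long skip that adds the shallow feature back before reconstruction follows the method summary in this review rather than the abstract.

```python
from typing import Callable, List

Feature = List[float]
Layer = Callable[[Feature], Feature]


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum, used for every residual or skip connection."""
    return [x + y for x, y in zip(a, b)]


def rstb(x: Feature, layers: List[Layer], block_conv: Layer) -> Feature:
    """One residual Swin Transformer block (RSTB), structurally.

    Runs several layers in sequence, applies a convolution stand-in,
    then adds the block input back (the residual connection).
    """
    out = x
    for layer in layers:
        out = layer(out)
    return add(block_conv(out), x)


def swinir_forward(
    low_quality: Feature,
    shallow_extract: Layer,
    blocks: List[List[Layer]],
    block_conv: Layer,
    final_conv: Layer,
    reconstruct: Layer,
) -> Feature:
    """Shallow extraction -> stacked RSTBs -> conv -> fuse -> reconstruct."""
    shallow = shallow_extract(low_quality)
    deep = shallow
    for layers in blocks:
        # A real model has separate weights per block; one stand-in is reused here.
        deep = rstb(deep, layers, block_conv)
    deep = final_conv(deep)
    # Fuse shallow and deep features before reconstruction.
    return reconstruct(add(shallow, deep))
```

With layers that output zeros, `rstb` returns its input unchanged, which shows the identity path that the residual connection provides.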

RESEARCH MOTIVATION AND OBJECTIVES

Motivate and demonstrate the effectiveness of Transformer-based models for image restoration.
Propose SwinIR, a Swin Transformer–based architecture for high-quality image restoration tasks.
Show that SwinIR can outperform state-of-the-art CNN-based methods with fewer parameters.

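PROPOSED METHOD

Propose SwinIR as three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction.
Extract deep features with a stack of residual Swin Transformer blocks (RSTB), each made up of several Swin Transformer layers together with a residual connection.
Fuse shallow and deep features in the reconstruction module to generate the high-quality image, and preserve low-frequency information through a skip connection.
Apply a 3x3 convolution after deep feature extraction to introduce the inductive bias of convolution before feature fusion.
Optimize classical SR and real-world SR with an L1 loss, use the Charbonnier loss for denoising and JPEG artifact reduction, and optionally add GAN and perceptual losses for real-world SR.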
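
The two pixel losses named above can be sketched as follows. This is a hedged, dependency-free illustration on flat lists of pixel values, not the authors' training code. The per-pixel mean reduction and the default `eps=1e-3` are assumptions made for this sketch, and `l1_loss` and `charbonnier_loss` are illustrative names.

```python
import math
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between restored and ground-truth pixels."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def charbonnier_loss(
    pred: Sequence[float], target: Sequence[float], eps: float = 1e-3
) -> float:
    """Smooth variant of L1: mean of sqrt(diff^2 + eps^2) per pixel.

    eps (assumed value) keeps the gradient finite where diff is zero.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)
```

For large errors the two losses nearly coincide, since sqrt(d^2 + eps^2) lies between |d| and |d| + eps. They differ mainly near zero error, where the Charbonnier loss stays smooth.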

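EXPERIMENTAL RESULTS

Research Questions

RQ1: Do Swin Transformer-based architectures outperform CNN-based methods on classical, real-world, and lightweight image SR?
RQ2: Do residual Swin Transformer blocks restore high-frequency details effectively while keeping the model efficient?
RQ3: How does SwinIR perform against state-of-the-art methods on denoising and JPEG artifact reduction?
RQ4: How do structural choices, such as the residual connection and the convolution at the end of each block, affect restoration performance?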

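Key Findings

SwinIR achieves state-of-the-art or competitive PSNR/SSIM on multiple SR datasets, with fewer parameters than many CNN-based methods.
When trained with a larger-scale degradation model and dataset, SwinIR performs strongly on real-world SR, and under some conditions even outperforms IPT.
In the ablation study, the residual connection in the RSTB noticeably improves PSNR, and a 3x3 convolution enhances features better than a 1x1 convolution or a stack of several small convolutions.
The Swin Transformer-based method shows good convergence and data efficiency, performing well when trained on DIV2K and on DIV2K+Flickr2K.
For denoising, SwinIR outperforms traditional methods and several CNN-based methods across multiple datasets and noise levels, with fewer parameters than some baselines.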
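
To make the PSNR figures concrete, the metric can be computed as in the sketch below. It uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE), on flat lists of pixel values. The `max_value` default and the helper `mse_ratio_for_gain` are assumptions introduced for illustration; the paper's evaluation protocol (color channels, border cropping) is not reproduced here.

```python
import math
from typing import Sequence


def psnr(
    reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0
) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, the 0.45 dB upper end of the gains quoted in the abstract corresponds to roughly 10% lower mean squared error, and 0.14 dB to roughly 3% lower.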


This review was generated by AI and reviewed by a human editor.