[Paper Explained] SwinIR: Image Restoration Using Swin Transformer
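SwinIR uses a Swin Transformer-based architecture that combines shallow and deep feature extraction with residual Swin Transformer blocks. It achieves competitive performance with fewer parameters on image restoration tasks such as super-resolution, denoising, and JPEG compression artifact reduction.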
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45 dB, while the total number of parameters can be reduced by up to 67%.
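The three-part pipeline in the abstract can be sketched as plain data flow. This is a minimal sketch under stated assumptions. Feature maps are modeled as flat lists of floats rather than C x H x W tensors. The names `swinir_forward`, `rstb`, `shallow_conv`, `deep_conv` and `reconstruct` are hypothetical placeholders. The real convolution and Swin Transformer layers are passed in as callables rather than implemented.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: a "feature map" is a flat list of floats here,
# whereas the real model operates on C x H x W tensors.
Feature = List[float]
Layer = Callable[[Feature], Feature]


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum, standing in for a tensor skip/residual connection."""
    return [x + y for x, y in zip(a, b)]


def rstb(x: Feature, stls: Sequence[Layer], conv: Layer) -> Feature:
    """Residual Swin Transformer block: STLs in sequence, a conv, then a residual add."""
    h = x
    for stl in stls:
        h = stl(h)
    return add(conv(h), x)


def swinir_forward(
    lq: Feature,
    shallow_conv: Layer,
    blocks: Sequence[Tuple[Sequence[Layer], Layer]],
    deep_conv: Layer,
    reconstruct: Layer,
) -> Feature:
    """Wiring of the three SwinIR parts; every sub-module is a placeholder callable."""
    f0 = shallow_conv(lq)              # 1) shallow feature extraction
    h = f0
    for stls, conv in blocks:          # 2) deep feature extraction: stacked RSTBs
        h = rstb(h, stls, conv)
    f_df = deep_conv(h)                # conv closing the deep feature extractor
    return reconstruct(add(f0, f_df))  # 3) long skip fuses shallow + deep, then reconstruct


if __name__ == "__main__":
    def identity(v: Feature) -> Feature:
        return list(v)

    def double(v: Feature) -> Feature:  # toy stand-in for a Swin Transformer layer
        return [2.0 * x for x in v]

    print(swinir_forward([1.0, 2.0], identity, [([double], identity)], identity, identity))
    # prints [4.0, 8.0]
```

Keeping every sub-module as a callable isolates the wiring. Swapping in real tensor modules would leave the two residual paths unchanged: the add inside each RSTB and the long skip from `f0` to reconstruction.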
Motivation and Goals
- Motivate and demonstrate the effectiveness of Transformer-based models for image restoration.
- Propose SwinIR, a Swin Transformer–based architecture for high-quality image restoration tasks.
- Show that SwinIR can outperform state-of-the-art CNN-based methods with fewer parameters.
Proposed Method
- Propose SwinIR with three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction.
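- Extract deep features with a stack of residual Swin Transformer blocks (RSTBs). Each RSTB chains several Swin Transformer layers, ends with a convolution, and adds a residual connection around the whole block.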
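- Fuse shallow and deep features in the reconstruction module to produce the high-quality image. A long skip connection carries low-frequency information directly to reconstruction, so the deep module can focus on recovering high-frequency details.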
- Apply a 3x3 convolution after deep feature extraction to introduce the inductive bias of convolution before feature fusion.
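- Train classical and lightweight SR with an L1 pixel loss, and denoising and JPEG artifact reduction with the Charbonnier loss. Real-world SR combines the pixel loss with GAN and perceptual losses.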
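The two pixel losses named above differ only in how they treat small errors. This is a minimal sketch that assumes the per-pixel averaged form common in practice; the paper's exact normalization may differ. The default `eps = 1e-3` is an assumption, and images are flattened to lists of floats for illustration.

```python
import math
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between a restored image and its ground truth."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def charbonnier_loss(
    pred: Sequence[float], target: Sequence[float], eps: float = 1e-3
) -> float:
    """Mean of sqrt(diff^2 + eps^2): a smooth, differentiable variant of L1.

    eps (assumed 1e-3 here) keeps the gradient finite when diff is 0;
    for |diff| >> eps each per-pixel term behaves like |diff|.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)
```

For example, `l1_loss([1.0, 2.0], [1.0, 4.0])` returns 1.0, and `charbonnier_loss([0.0], [3.0], eps=4.0)` returns 5.0. On identical inputs, the Charbonnier loss bottoms out at `eps` instead of 0.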
Experimental Results
Research Questions
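- RQ1: Does the Swin Transformer-based architecture outperform CNN-based methods on classical, real-world, and lightweight image SR?
- RQ2: Do residual Swin Transformer blocks effectively recover high-frequency details while keeping the model efficient?
- RQ3: How does SwinIR perform on denoising and JPEG artifact reduction compared with state-of-the-art methods?
- RQ4: How do architectural choices, such as residual connections and the convolution at the end of each block, affect restoration performance?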
Key Findings
- SwinIR achieves state-of-the-art or competitive PSNR/SSIM on multiple SR datasets while using fewer parameters than many CNN-based methods.
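- Trained with a large-scale degradation model, SwinIR shows strong real-world SR performance. On classical SR, the version trained on DIV2K+Flickr2K outperforms IPT in some settings.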
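- In the ablation studies, the residual connection in each RSTB noticeably improves PSNR. For feature enhancement, a 3x3 convolution works better than a 1x1 convolution or a stack of several smaller convolutions.
- The Swin Transformer-based model converges well and is data-efficient, performing well when trained on either DIV2K or DIV2K+Flickr2K.
- On denoising, SwinIR outperforms traditional methods and several CNN-based methods across multiple datasets and noise levels, while using fewer parameters than some baselines.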
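The PSNR figures reported above follow the standard definition PSNR = 10·log10(MAX²/MSE). This is a minimal sketch on flattened pixel lists, assuming 8-bit images (MAX = 255). The helpers `psnr` and `mse_ratio_for_gain` are hypothetical names. SR benchmarks often compute PSNR on the Y channel with border pixels cropped, which this sketch omits.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the ground truth."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain at a fixed max_val."""
    return 10.0 ** (-gain_db / 10.0)
```

For example, `psnr([0.0], [1.0])` equals 20·log10(255), about 48.13 dB. `mse_ratio_for_gain(0.45)` is about 0.90, so a 0.45 dB gain corresponds to roughly 10% lower MSE.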
This summary was generated by AI and reviewed by human editors.