QUICK REVIEW

[Paper Digest] Restormer: Efficient Transformer for High-Resolution Image Restoration

Syed Waqas Zamir, Aditya Arora|arXiv (Cornell University)|Nov 18, 2021

Advanced Image Processing Techniques|References: 99|Cited by: 191

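One-Sentence Summary

Restormer introduces an efficient Transformer built on multi-Dconv head transposed attention and a gated-Dconv feed-forward network. It restores high-resolution images with complexity that is linear in spatial resolution and achieves state-of-the-art results on multiple tasks.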

ABSTRACT

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising). The source code and pre-trained models are available at https://github.com/swz30/Restormer.

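Motivation and Objectives

Frame image restoration as an ill-posed problem that requires strong priors and long-range dependencies.
Overcome the quadratic complexity of standard self-attention so that high-resolution restoration becomes feasible.
Propose Restormer, which combines two new modules (MDTA and GDFN) with a progressive training strategy for learning multi-scale context.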
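
To make the quadratic-complexity objection concrete, the sketch below counts attention-map entries for standard spatial self-attention and for attention computed across channels. The function name `attention_map_entries` and the 48-channel width are illustrative assumptions, not values taken from this digest.

```python
def attention_map_entries(height, width, channels):
    """Count entries in one attention map for two attention layouts.

    Spatial self-attention compares every pixel with every other pixel,
    so its map has (H*W)^2 entries. Channel-wise (transposed) attention
    compares channels, so its map has C^2 entries regardless of H and W.
    """
    pixels = height * width
    return {"spatial": pixels * pixels, "channel": channels * channels}


# Hypothetical feature-map sizes; 48 channels is an assumed width.
for side in (128, 256, 512):
    counts = attention_map_entries(side, side, channels=48)
    print(side, counts["spatial"], counts["channel"])
```

Doubling the side length multiplies the spatial map by 16, while the channel map stays the same size. The remaining work in channel attention (forming the map and applying it to the values) grows linearly with the number of pixels.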

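Proposed Method

Propose an encoder-decoder architecture that does not split high-resolution images into local windows.
Replace vanilla multi-head self-attention with multi-Dconv head transposed attention (MDTA), which computes cross-channel covariance with linear complexity and brings in local context through 1×1 and depth-wise convolutions.
Propose the gated-Dconv feed-forward network (GDFN), which uses a gating mechanism and depth-wise convolutions to control and enrich the feature transformation.
Use progressive learning: start training with small patches and large batches, then gradually move to larger patches and smaller batches to capture global image statistics.
Train task-specific Restormer models for deraining, motion deblurring, defocus deblurring (single-image and dual-pixel), and denoising, keeping parameter counts and FLOPs as small as possible.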
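
The core idea behind MDTA, attention across channels rather than across pixels, can be sketched in plain Python. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: it omits the 1×1 and depth-wise convolutions that produce the queries, keys, and values, it uses one head, and the `temperature` argument stands in for whatever scaling the real model applies. The function name `transposed_attention` is hypothetical.

```python
import math


def transposed_attention(q, k, v, temperature=1.0):
    """Channel-wise ("transposed") attention on C x N feature matrices.

    q, k, v: lists of C rows, each row holding N = H*W pixel values.
    Returns (attn, out), where attn is the C x C attention map and
    out is the C x N result. Convolutional projections are omitted.
    """
    def l2_normalize(rows):
        normalized = []
        for row in rows:
            norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard zero rows
            normalized.append([x / norm for x in row])
        return normalized

    q, k = l2_normalize(q), l2_normalize(k)
    c = len(q)

    # C x C channel similarity: entry (i, j) is <q_i, k_j> over all pixels.
    scores = [[temperature * sum(a * b for a, b in zip(q[i], k[j]))
               for j in range(c)] for i in range(c)]

    # Row-wise softmax over channels (shifted by the row max for stability).
    attn = []
    for row in scores:
        row_max = max(row)
        exps = [math.exp(s - row_max) for s in row]
        total = sum(exps)
        attn.append([e / total for e in exps])

    # Mix value channels: out[i] = sum_j attn[i][j] * v[j].
    n = len(v[0])
    out = [[sum(attn[i][j] * v[j][p] for j in range(c)) for p in range(n)]
           for i in range(c)]
    return attn, out


# Toy input: C = 2 channels, N = 4 pixels.
features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
attn, out = transposed_attention(features, features, features)
print(len(attn), len(attn[0]), len(out), len(out[0]))
```

The attention map here is C × C whatever the number of pixels, so the score and mixing steps each cost on the order of C² · N operations, which is linear in N.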

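Experimental Results

Research Questions

RQ1: Can Restormer model global pixel interactions with linear complexity, making it suitable for high-resolution image restoration?
RQ2: How do the proposed MDTA and GDFN components compare with conventional attention and feed-forward networks on restoration tasks?
RQ3: Does progressive learning improve performance on full-resolution images across restoration tasks?
RQ4: How does Restormer compare with the state of the art on deraining, motion deblurring, defocus deblurring, and denoising datasets?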

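Key Findings

Restormer achieves state-of-the-art results across multiple datasets for image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel), and image denoising.
Averaged over five deraining datasets, Restormer outperforms the previous best deraining method by 1.05 dB.
For motion deblurring, Restormer improves average PSNR by 0.47 dB over MIMO-UNet+ and by 0.26 dB over MPRNet, with 81% fewer FLOPs than MPRNet. Relative to IPT, it has 4.4× fewer parameters and runs 29× faster.
For Gaussian grayscale/color denoising and real image denoising, Restormer matches or outperforms leading CNN and Transformer methods, and it achieves higher PSNR on the SIDD and DND real-image denoising benchmarks.
Restormer generalizes well: trained for deblurring only on GoPro, it also achieves state-of-the-art performance on other datasets.
Ablation studies show that combining MDTA with GDFN yields the best PSNR on a high-resolution urban image dataset, supporting the design choices.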
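
To give the dB figures above some intuition, the sketch below uses the standard PSNR definition, 10 · log10(MAX² / MSE), to convert a PSNR gain into the mean-squared-error ratio it implies for a single image. The helper names are hypothetical. Benchmark PSNR is usually averaged per image, so this is a rough guide rather than a statement about dataset-level MSE.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE ratio (new / old) implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


# PSNR gains quoted in the findings above.
for gain in (0.26, 0.47, 1.05):
    print(f"{gain:.2f} dB -> MSE multiplied by {mse_ratio_for_gain(gain):.3f}")
```

Under this reading, a 1.05 dB gain corresponds to roughly 21% lower per-image MSE.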

This digest was generated by AI and reviewed by human editors.