QUICK REVIEW

[Paper Digest] Pre-Trained Image Processing Transformer

Hanting Chen, Yunhe Wang|arXiv (Cornell University)|Dec 1, 2020

Advanced Image Processing Techniques|References: 89|Cited by: 120

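One-Sentence Summary

IPT is a Transformer-based model, pre-trained on a large synthetic image-processing corpus derived from ImageNet, that handles multiple low-level vision tasks (super-resolution, denoising, and deraining) and achieves strong performance after fine-tuning.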

ABSTRACT

As the computing power of modern hardware is increasing strongly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. The big progress is mainly contributed to the representation ability of transformer and its variant architectures. In this paper, we study the low-level computer vision task (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, image processing transformer (IPT). To maximally excavate the capability of transformer, we present to utilize the well-known ImageNet benchmark for generating a large amount of corrupted image pairs. The IPT model is trained on these images with multi-heads and multi-tails. In addition, the contrastive learning is introduced for well adapting to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at https://github.com/huawei-noah/Pretrained-IPT and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/IPT

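Motivation and Goals

Motivation: highlight the need for cross-task pre-training in low-level vision, where data are scarce and task types vary.
Propose a general pre-trained Transformer (IPT) with task-specific heads and tails and a shared body for image processing tasks.
Generate diverse training data by applying large-scale synthetic degradations to ImageNet images.
Incorporate contrastive learning to strengthen patch-level representations and improve generalization to unseen tasks.
Show that a single pre-trained IPT, after fine-tuning on super-resolution (SR), denoising, and deraining, can outperform task-specific models.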
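
The synthetic-degradation idea can be illustrated with a minimal sketch. It is not the authors' data pipeline: the function names and task keys are hypothetical, a simple box filter stands in for bicubic downsampling, rain synthesis is omitted, and images are small grayscale lists so that only the Python standard library is needed. The scales and noise levels mirror the tasks reported in this digest.

```python
import random

def downsample_box(img, scale):
    """Average each scale x scale block of a 2-D grayscale image (list of rows).
    Assumption: a box filter stands in for the bicubic kernel."""
    h = len(img) - len(img) % scale        # crop so the size divides evenly
    w = len(img[0]) - len(img[0]) % scale
    out = []
    for i in range(0, h, scale):
        row = []
        for j in range(0, w, scale):
            block = [img[i + di][j + dj]
                     for di in range(scale) for dj in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise (std sigma, 0-255 scale), clipped to [0, 255]."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def make_training_pairs(clean, rng):
    """Return {task_name: (corrupted, clean)} for one clean image.
    Task names are hypothetical labels, not identifiers from the IPT code."""
    pairs = {}
    for scale in (2, 3, 4):
        pairs[f"sr_x{scale}"] = (downsample_box(clean, scale), clean)
    for sigma in (30, 50):
        pairs[f"denoise_s{sigma}"] = (add_gaussian_noise(clean, sigma, rng), clean)
    return pairs

rng = random.Random(0)
# A random 48 x 48 image stands in for an ImageNet crop.
clean = [[float(rng.randrange(256)) for _ in range(48)] for _ in range(48)]
pairs = make_training_pairs(clean, rng)
for task, (corrupted, target) in pairs.items():
    print(task, len(corrupted), "x", len(corrupted[0]))
```

Each clean image yields one (corrupted, clean) pair per task, which is what lets a single unlabeled image collection supervise several restoration tasks at once.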

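Proposed Method

IPT has four components: task-specific heads (one per task), a shared Transformer encoder and decoder that together form the body, and task-specific tails for reconstruction.
Split the input features into patches ("visual words"), add positional encodings, process them with the Transformer encoder, and decode them with a task-aware decoder that uses task embeddings.
Pre-train IPT on a large-scale synthetic dataset derived from ImageNet that covers several degradation models (bicubic downsampling for SR, Gaussian noise, rain, etc.).
Learn general-purpose features with a supervised reconstruction loss plus a contrastive loss between patches of the same image: L_IPT = λ L_contrastive + L_supervised.
Adapt the pre-trained IPT to a specific task (e.g., ×2/×3/×4 SR, denoising, deraining) by fine-tuning, freezing unused heads and tails where needed.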
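
The combined objective L_IPT = λ L_contrastive + L_supervised can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact formulation: the contrastive term is a simplified InfoNCE-style loss with cosine similarity, the supervised term is assumed to be an L1 reconstruction loss, lam=0.1 is only an illustrative default, and all function names are hypothetical. Feature vectors are assumed to be non-zero.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(features):
    """features[j][i] is the feature vector of patch i from image j.
    For each anchor patch, the other patches of the same image are positives
    and all patches of other images are negatives (simplified InfoNCE)."""
    losses = []
    for j, patches in enumerate(features):
        negatives = [p for k, other in enumerate(features) if k != j for p in other]
        for a, anchor in enumerate(patches):
            for b, positive in enumerate(patches):
                if a == b:
                    continue
                pos = math.exp(cosine(anchor, positive))
                neg = sum(math.exp(cosine(anchor, n)) for n in negatives)
                losses.append(-math.log(pos / (pos + neg)))
    return sum(losses) / len(losses)

def l1_loss(pred, target):
    """Mean absolute error between two flat pixel sequences (assumed supervised term)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def ipt_loss(features, pred, target, lam=0.1):
    """Total loss: lam * contrastive + supervised."""
    return lam * contrastive_loss(features) + l1_loss(pred, target)

# Two images, two patches each. In `clustered`, patches of the same image
# have similar features; in `mixed`, they do not.
clustered = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]
print(contrastive_loss(clustered) < contrastive_loss(mixed))
print(ipt_loss(clustered, pred=[1.0, 2.0], target=[1.0, 4.0]))
```

The contrastive term is lower when patches from the same image have similar features, so minimizing it alongside the reconstruction loss pushes the shared body toward image-consistent patch representations.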

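Experimental Results

Research Questions

RQ1: Can a single pre-trained Transformer generalize across multiple low-level image processing tasks after fine-tuning?
RQ2: Does large-scale pre-training on degraded ImageNet data outperform task-specific models on SR, denoising, and deraining?
RQ3: How does contrastive learning affect IPT's quality and cross-task generalization?
RQ4: After ImageNet pre-training and fine-tuning, how does IPT compare with state-of-the-art CNN-based methods?
RQ5: How do multi-task and single-task pre-training differ in how well they transfer to new tasks?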

Key Findings

SR results, PSNR (dB):
Method	Scale	Set5	Set14	B100	Urban100
IPT (ours)	×2	38.37	34.43	32.48	33.76
IPT (ours)	×3	34.81	30.85	29.38	29.49
IPT (ours)	×4	32.64	29.01	27.82	27.26

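After fine-tuning, IPT outperforms many task-specific methods on several low-level benchmarks.
For SR, IPT reaches the following PSNR values (dB): at ×2, 38.37 (Set5), 34.43 (Set14), 32.48 (B100), and 33.76 (Urban100); at ×3, 34.81, 30.85, 29.38, and 29.49; at ×4, 32.64, 29.01, 27.82, and 27.26.
For color image denoising (Gaussian noise), IPT obtains 30.75 (BSD68, σ=30) and 28.39 (Urban100, σ=50); both are among the best publicly reported values.
For deraining, IPT reaches 41.62 dB PSNR on Rain100L, surpassing earlier methods.
Contrastive learning (λ > 0), combined with the supervised loss, improves SR PSNR by up to about 0.1 dB.
Compared with single-task pre-training, multi-task pre-training improves generalization to unseen tasks.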
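
Since every result above is reported as PSNR, a minimal sketch of the metric may help in reading the numbers. It assumes 8-bit pixel values and flat pixel sequences. Benchmark protocols often add details such as evaluating only the luminance channel or cropping borders, which this sketch ignores. The function name and sample values are illustrative.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given as flat sequences of pixel values."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [52.0, 55.0, 61.0, 59.0]
restored = [54.0, 55.0, 60.0, 58.0]
print(round(psnr(reference, restored), 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.1 dB corresponds to roughly a 2.3% reduction in MSE (a factor of 10^(-0.01) ≈ 0.977).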

This digest was generated by AI and reviewed by human editors.