QUICK REVIEW

[Paper Review] SwinIR: Image Restoration Using Swin Transformer

Jingyun Liang, Jiezhang Cao | arXiv (Cornell University) | 2021-08-23

Advanced Image Processing Techniques | References: 90 | Citations: 80

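One-Line Summary

SwinIR is a Swin Transformer-based image restoration model, built from shallow feature extraction, deep feature extraction with residual Swin Transformer blocks, and image reconstruction, that achieves competitive results on super-resolution (SR), denoising, and JPEG compression artifact reduction with fewer parameters.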

ABSTRACT

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45 dB, while the total number of parameters can be reduced by up to 67%.
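The three-part data flow in the abstract (shallow features, a stack of RSTBs, reconstruction) can be sketched in NumPy. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation: the helper names `conv3x3`, `window_attention`, `rstb`, and `swinir_forward` are hypothetical; each Swin Transformer layer is replaced by parameter-free self-attention inside m x m windows (no LayerNorm, learned Q/K/V projections, MLP, relative position bias, or shifted-window attention mask); all weights are random; and reconstruction is a single 3x3 convolution at the input resolution, roughly the same-size setting of denoising, whereas SR would add an upsampling layer.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv3x3(x, w):
    """Zero-padded 3x3 convolution. x: (H, W, Cin), w: (3, 3, Cin, Cout)."""
    h, wid, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wid, w.shape[3]))
    for dy in range(3):
        for dx in range(3):
            # (H, W, Cin) @ (Cin, Cout) -> (H, W, Cout) for this kernel tap
            out += xp[dy:dy + h, dx:dx + wid, :] @ w[dy, dx]
    return out


def window_attention(x, m, shift):
    """Hypothetical stand-in for a Swin Transformer layer: parameter-free
    self-attention inside non-overlapping m x m windows, plus a residual add.
    Omits LayerNorm, learned Q/K/V projections, the MLP, and the attention
    mask that real Swin applies after the cyclic shift."""
    h, wid, c = x.shape
    xs = np.roll(x, (-(m // 2), -(m // 2)), axis=(0, 1)) if shift else x
    # (H, W, C) -> (num_windows, m*m, C)
    win = xs.reshape(h // m, m, wid // m, m, c).transpose(0, 2, 1, 3, 4).reshape(-1, m * m, c)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ win
    # (num_windows, m*m, C) -> (H, W, C), then undo the cyclic shift
    out = out.reshape(h // m, wid // m, m, m, c).transpose(0, 2, 1, 3, 4).reshape(h, wid, c)
    if shift:
        out = np.roll(out, (m // 2, m // 2), axis=(0, 1))
    return x + out


def rstb(f_in, w_conv, m, num_layers):
    """Residual Swin Transformer block: L layers, a 3x3 conv, then a residual add."""
    f = f_in
    for j in range(num_layers):
        f = window_attention(f, m, shift=(j % 2 == 1))  # alternate regular/shifted windows
    return conv3x3(f, w_conv) + f_in


def swinir_forward(lq, params, m=4, num_layers=2):
    f0 = conv3x3(lq, params["shallow"])          # shallow feature extraction
    f = f0
    for w in params["rstb"]:                      # K residual Swin Transformer blocks
        f = rstb(f, w, m, num_layers)
    f_df = conv3x3(f, params["after_deep"])      # 3x3 conv ending deep feature extraction
    return conv3x3(f0 + f_df, params["recon"])    # long skip, then reconstruction


C_IN, C, K = 3, 8, 2  # illustrative sizes with random weights, not the paper's configuration
params = {
    "shallow": rng.normal(0, 0.1, (3, 3, C_IN, C)),
    "rstb": [rng.normal(0, 0.1, (3, 3, C, C)) for _ in range(K)],
    "after_deep": rng.normal(0, 0.1, (3, 3, C, C)),
    "recon": rng.normal(0, 0.1, (3, 3, C, C_IN)),
}
lq = rng.random((16, 16, C_IN))
out = swinir_forward(lq, params)
print(out.shape)  # (16, 16, 3)
```

Alternating regular and shifted windows (`shift=(j % 2 == 1)`) is what lets information cross window borders in Swin, while the residual add inside `rstb` and the long skip `f0 + f_df` mirror the two levels of residual connection the paper describes.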

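Motivation and Goals

Demonstrate that Transformer-based models are effective for image restoration, a field still dominated by CNN-based methods.
Propose SwinIR, a Swin Transformer-based architecture for high-quality image restoration.
Show that SwinIR can outperform state-of-the-art CNN-based methods while using fewer parameters.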

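Proposed Method

SwinIR consists of three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction.
Deep features are extracted by K residual Swin Transformer blocks (RSTB), each containing several Swin Transformer layers followed by a convolutional layer and a residual connection.
The reconstruction module fuses the shallow (low-level) and deep features to produce the high-quality image, and a long skip connection carries low-frequency information directly to the reconstruction module.
A 3x3 convolution at the end of deep feature extraction introduces the inductive bias of convolution before the features are aggregated.
Classical and lightweight SR are trained with the L1 pixel loss, denoising and JPEG artifact reduction with the Charbonnier loss, and real-world SR with a combination of pixel, GAN, and perceptual losses.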
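The two pixel losses named above can be written in a few lines. This is a sketch on flattened pixel lists with hypothetical function names; `eps = 1e-3` is an assumed value, the Charbonnier term follows the whole-image form sqrt(||pred - target||^2 + eps^2), and many implementations instead average sqrt((p - t)^2 + eps^2) per pixel. The GAN and perceptual terms used for real-world SR are not sketched.

```python
import math


def l1_loss(pred, target):
    """Mean absolute pixel error (L1), as used for classical/lightweight SR."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def charbonnier_loss(pred, target, eps=1e-3):
    """sqrt(||pred - target||^2 + eps^2) over the whole flattened image.
    eps keeps the loss differentiable when pred == target."""
    sq_norm = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(sq_norm + eps ** 2)


pred = [0.50, 0.20, 0.90]
target = [0.40, 0.20, 1.00]
print(l1_loss(pred, target))
print(charbonnier_loss(pred, target))
```

With a perfect prediction the Charbonnier loss bottoms out at `eps` rather than zero, which is what keeps its gradient defined at zero error.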

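Experimental Results

Research Questions

RQ1: Can a Swin Transformer-based architecture outperform CNN-based methods on classical, real-world, and lightweight image SR?
RQ2: Are residual Swin Transformer blocks effective at restoring high-frequency details while keeping the model efficient?
RQ3: How does SwinIR compare with state-of-the-art methods on image denoising and JPEG artifact reduction?
RQ4: How do architectural choices such as residual connections and the convolutional layer at the end of each block affect restoration performance?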

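Key Findings

SwinIR achieves state-of-the-art or competitive PSNR/SSIM on multiple SR datasets while using fewer parameters than parameter-heavy CNN-based methods.
When trained with a larger degradation model and dataset, SwinIR shows strong real-world SR performance and outperforms IPT under certain conditions.
In the ablation, the residual connection in each RSTB significantly improves PSNR, and a 3x3 convolution enhances features better than a 1x1 convolution or a stack of several smaller convolutions.
The Swin Transformer-based approach converges well and is data-efficient, performing well when trained on either DIV2K or DIV2K+Flickr2K.
For denoising, SwinIR outperforms traditional methods and several CNN-based methods across multiple datasets and noise levels, while using fewer parameters than some baselines.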
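The PSNR figures above, and the 0.14~0.45 dB gains quoted in the abstract, are easier to read as error reductions. PSNR = 10 log10(MAX^2 / MSE), so at a fixed MAX a gain of Δ dB means MSE(new) / MSE(old) = 10^(-Δ/10). The sketch below assumes 8-bit images (`max_val=255`) and uses hypothetical helper names; it does not model benchmark protocol details such as evaluating on the Y channel or cropping borders, and SSIM is not sketched.

```python
import math


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(delta_db):
    """MSE(new) / MSE(old) implied by a PSNR gain of delta_db at the same max_val."""
    return 10 ** (-delta_db / 10)


for gain in (0.14, 0.45):
    reduction = 100 * (1 - mse_ratio_for_gain(gain))
    print(f"+{gain} dB -> MSE reduced by {reduction:.1f}%")
# +0.14 dB -> MSE reduced by 3.2%
# +0.45 dB -> MSE reduced by 9.8%
```

Because PSNR is logarithmic, a few tenths of a dB already correspond to a several-percent drop in mean squared error.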

This review was generated by AI and reviewed by a human editor.