QUICK REVIEW

[Paper Review] Uformer: A General U-Shaped Transformer for Image Restoration

Zhendong Wang, Xiaodong Cun | arXiv (Cornell University) | June 6, 2021

Advanced Image Processing Techniques | References: 76 | Citations: 109

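One-Line Summary

Uformer introduces a U-shaped Transformer with Locally-Enhanced Window (LeWin) blocks and a lightweight multi-scale restoration modulator, achieving state-of-the-art results on denoising, motion deblurring, defocus deblurring, and deraining at an efficient computational cost.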

ABSTRACT

In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. In Uformer, there are two core designs. First, we introduce a novel locally-enhanced window (LeWin) Transformer block, which performs nonoverlapping window-based self-attention instead of global self-attention. It significantly reduces the computational complexity on high resolution feature map while capturing local context. Second, we propose a learnable multi-scale restoration modulator in the form of a multi-scale spatial bias to adjust features in multiple layers of the Uformer decoder. Our modulator demonstrates superior capability for restoring details for various image restoration tasks while introducing marginal extra parameters and computational cost. Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration. To evaluate our approach, extensive experiments are conducted on several image restoration tasks, including image denoising, motion deblurring, defocus deblurring and deraining. Without bells and whistles, our Uformer achieves superior or comparable performance compared with the state-of-the-art algorithms. The code and models are available at https://github.com/ZhendongWang6/Uformer.

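Research Motivation and Goals

Motivates the need for effective long-range dependency modeling in image restoration, beyond what conventional ConvNets provide.
Proposes a general U-shaped Transformer architecture suited to multi-scale image restoration.
Develops an efficient LeWin Transformer block to balance local detail and global context.
Introduces a lightweight multi-scale restoration modulator to improve detail recovery across scales.
Demonstrates state-of-the-art or competitive performance on denoising, motion deblurring, defocus deblurring, and deraining datasets.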

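Proposed Method

Proposes a hierarchical UNet-like encoder-decoder with skip connections in which convolutional layers are replaced by LeWin Transformer blocks.
The Locally-Enhanced Window (LeWin) Transformer block combines non-overlapping window-based multi-head self-attention (W-MSA) with a Locally-enhanced Feed-Forward Network (LeFF) that contains a depth-wise convolution.
Computing self-attention within non-overlapping M×M windows reduces the complexity from O(H^2 W^2 C) to O(M^2 HWC), where H×W is the feature-map resolution and C is the number of channels.
For multi-scale restoration, a modulator in the form of a learnable window-based bias is added to the decoder features, adapting the representation for restoration at each scale.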
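
As a rough illustration of the windowing idea and the complexity terms listed above, the following pure-Python sketch counts the leading-order cost of each attention variant, splits a small grid into non-overlapping windows, and adds one shared bias to every window. All function names are hypothetical, the grid is a toy single-channel list of rows, and treating the modulator as a window-shaped bias shared by all windows is an assumption made for illustration. It is not the authors' implementation.

```python
# Illustrative sketch only: names, shapes, and the single-channel grid are
# assumptions for demonstration, not code from the Uformer repository.

def global_attention_cost(h, w, c):
    """Leading-order cost term of global self-attention: (HW)^2 * C."""
    return (h * w) ** 2 * c


def window_attention_cost(h, w, c, m):
    """Leading-order cost term with non-overlapping M x M windows: M^2 * HW * C."""
    return (m * m) * (h * w) * c


def window_partition(feature, m):
    """Split an H x W grid (a list of rows) into non-overlapping M x M windows.

    Assumes H and W are divisible by M. Windows come back in row-major order.
    """
    h, w = len(feature), len(feature[0])
    if h % m or w % m:
        raise ValueError("H and W must be divisible by the window size M")
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([row[left:left + m] for row in feature[top:top + m]])
    return windows


def add_modulator(windows, modulator):
    """Add the same M x M bias to every window (assumed shared across windows)."""
    return [
        [[x + b for x, b in zip(win_row, mod_row)]
         for win_row, mod_row in zip(win, modulator)]
        for win in windows
    ]


if __name__ == "__main__":
    h, w, c, m = 128, 128, 32, 8
    ratio = global_attention_cost(h, w, c) / window_attention_cost(h, w, c, m)
    print(f"global / windowed cost ratio: {ratio:.0f}")  # equals HW / M^2

    grid = [[r * 4 + col for col in range(4)] for r in range(4)]
    wins = window_partition(grid, 2)
    bias = [[0.5, 0.5], [0.5, 0.5]]
    print(add_modulator(wins, bias)[0])
```

The ratio of the two cost terms is HW / M^2, so with H = W = 128 and M = 8 the windowed form is 256 times cheaper in this leading-order count, and the gap widens as resolution grows.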

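Experimental Results

Research Questions

RQ1: Can a Transformer-based U-shaped architecture effectively capture both local detail and long-range dependencies in image restoration through local window-based self-attention and a locally-enhanced FFN?
RQ2: Does a lightweight multi-scale restoration modulator improve restoration quality across different degradation types with only marginal additional computational cost?
RQ3: What trade-offs in performance and efficiency do LeWin blocks exhibit on denoising, deblurring, and deraining compared with conventional CNNs or global-attention Transformers?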

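Key Results

Uformer-B achieves 39.89 dB PSNR on SIDD and 39.98 dB PSNR on DND, surpassing the previous state of the art on both real-noise datasets.
On motion deblurring, it also achieves state-of-the-art or competitive results on the GoPro, RealBlur-R/J, and HIDE datasets.
On defocus deblurring, it improves PSNR by up to 1.87 dB over previous methods on DPD, along with SSIM gains.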
On real rain removal (SPAD), Uformer-B achieves 47.84 dB PSNR and 0.9925 SSIM, improving PSNR by 3.74 dB over the previous best method.
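Ablation studies show that LeWin blocks outperform plain UNet variants, that the locally-enhanced FFN contributes to the performance gain, and that the modulator provides additional gains (most notably on SPAD).
The proposed modulator yields a notable 0.46 dB improvement on deblurring, as well as gains on denoising and deraining.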
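
To put the dB gains listed above in perspective, the sketch below uses the standard PSNR definition, PSNR = 10 log10(MAX^2 / MSE), to convert a PSNR gain into the factor by which mean squared error shrinks. The helper names and the default peak value of 1.0 are assumptions for illustration, not something taken from the paper or its code.

```python
import math


def psnr(mse, max_value=1.0):
    """PSNR in dB for a given mean squared error and peak signal value."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of gain_db decibels."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # Gains quoted in the results list above.
    for gain in (0.46, 1.87, 3.74):
        factor = mse_ratio_for_gain(gain)
        print(f"+{gain} dB PSNR -> MSE reduced by about {factor:.2f}x")
```

Because PSNR is logarithmic, the conversion depends only on the size of the gain and not on the baseline PSNR.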


This review was generated by AI and reviewed by a human editor.