QUICK REVIEW

[Paper Review] CascadedGaze: Efficiency in Global Context Extraction for Image Restoration

Amirhosein Ghasemabadi, Muhammad Kamran Janjua | arXiv (Cornell University) | 2024-01-26

Image Retrieval and Classification Techniques | Citations: 9

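One-Line Summary

CGNet introduces a fully convolutional encoder-decoder with a Global Context Extractor (GCE) that captures global context without self-attention, reaching state-of-the-art or competitive performance on denoising and deblurring at a lower computational cost.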

ABSTRACT

Image restoration tasks traditionally rely on convolutional neural networks. However, given the local nature of the convolutional operator, they struggle to capture global information. The promise of attention mechanisms in Transformers is to circumvent this problem, but it comes at the cost of intensive computational overhead. Many recent studies in image restoration have focused on solving the challenge of balancing performance and computational cost via Transformer variants. In this paper, we present CascadedGaze Network (CGNet), an encoder-decoder architecture that employs Global Context Extractor (GCE), a novel and efficient way to capture global information for image restoration. The GCE module leverages small kernels across convolutional layers to learn global dependencies, without requiring self-attention. Extensive experimental results show that our computationally efficient approach performs competitively to a range of state-of-the-art methods on synthetic image denoising and single image deblurring tasks, and pushes the performance boundary further on the real image denoising task.

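Motivation and Objectives

Enable efficient restoration by capturing global context without the heavy overhead of self-attention.
Propose a cascaded, fully convolutional module (GCE) that learns both local and global dependencies.
Integrate a Range Fuser that merges local and global features for restoration tasks.
Demonstrate state-of-the-art performance on denoising and deblurring benchmarks with lower MACs and faster inference.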

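Proposed Method

Introduces the CascadedGaze Network (CGNet), built on an encoder-decoder U-Net backbone.
Develops the Global Context Extractor (GCE), which uses up to three small-kernel convolutions to learn local, intermediate, and global context without self-attention.
Uses a Range Fuser to upsample and concatenate the local and global features, then re-weights them with Simple Channel Attention (SCA) and a pointwise convolution.
Applies channel merging before the GCE (StaticMerge preferred) to reduce the computational load.
Trains end to end with a standard PSNR loss and an SGD optimizer, over multiple patch sizes, on real and synthetic denoising datasets (SIDD, BSD68, Urban100, Kodak24, McMaster) and on GoPro deblurring.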
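
The idea that a cascade of small kernels can reach a near-global view can be checked with standard receptive-field arithmetic. The sketch below is a minimal illustration, not CGNet's implementation: the `ConvLayer` type, the function names, the kernel and stride values, and the halving of channels used to mimic channel merging are all assumptions chosen for the example, since this review does not state the GCE hyperparameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical description of one square-kernel convolution."""
    kernel: int
    stride: int


def receptive_field(layers):
    """Return the receptive field (in input pixels) after each layer.

    Standard recurrence: rf += (kernel - 1) * jump; jump *= stride,
    where jump is the input-pixel distance between adjacent outputs.
    """
    rf, jump = 1, 1
    sizes = []
    for layer in layers:
        rf += (layer.kernel - 1) * jump
        jump *= layer.stride
        sizes.append(rf)
    return sizes


def depthwise_macs(channels, height, width, layers):
    """Count multiply-accumulates for a cascade of unpadded depthwise convs."""
    total = 0
    h, w = height, width
    for layer in layers:
        h = (h - layer.kernel) // layer.stride + 1
        w = (w - layer.kernel) // layer.stride + 1
        # Depthwise: each output value costs kernel*kernel MACs per channel.
        total += channels * h * w * layer.kernel ** 2
    return total


# Assumed settings, for illustration only.
small = [ConvLayer(kernel=3, stride=2)] * 3
patchy = [ConvLayer(kernel=4, stride=4)] * 3

print(receptive_field(small))   # [3, 7, 15]
print(receptive_field(patchy))  # [4, 16, 64]

# Assumed effect of channel merging: halving channels before the cascade.
full = depthwise_macs(64, 64, 64, small)
merged = depthwise_macs(32, 64, 64, small)
print(full, merged)  # 711360 355680
```

With the hypothetical 4x4, stride-4 setting, three layers already cover a 64x64 region. The depthwise MAC count scales linearly with the channel count, which is consistent with reducing channels before the GCE to lower cost.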

Figure 1: Computational Efficiency vs Performance. Left: PSNR vs. MACs (G) comparison on SIDD real image denoising. Right: PSNR vs. MACs (G) comparison on Gaussian image denoising tested on Kodak24 dataset with noise level $\sigma=50$. Our model achieves state-of-the-art results and is computation

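Experimental Results

Research Questions

RQ1: Can CGNet outperform state-of-the-art restoration methods while reducing MACs and inference time?
RQ2: How does the GCE module compare with self-attention at capturing global context?
RQ3: Where in the network should the GCE be placed for the best performance-efficiency trade-off?
RQ4: Does channel merging before the GCE preserve performance at lower computation?
RQ5: Is CGNet robust across real-noise denoising, Gaussian denoising, and motion deblurring?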

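Key Results

On real image denoising (SIDD), CGNet outperforms prior methods, improving PSNR by 0.09 dB over NAFNet.
On Gaussian denoising, CGNet delivers state-of-the-art or comparable performance while running faster and with lower MACs than many baselines; it outperforms Restormer on every dataset except McMaster.
On single-image motion deblurring (GoPro), CGNet achieves up to 0.06 dB higher PSNR than the NAFNet variants.
Across multiple datasets, CGNet's PSNR/SSIM is competitive or better despite substantially lower MACs and inference time.
Visualizations show that the GCE's local context captures foreground boundaries while its global context models broader image structure, suggesting complementary roles.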
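
The PSNR margins above are easier to interpret alongside the definition of the metric. The following is a minimal sketch that assumes images flattened to lists of floats in [0, 1]; the function names and sample values are hypothetical, and none of the numbers are taken from the paper. A PSNR loss is commonly implemented as the negative of this quantity, which is an assumption about the training setup rather than something this review specifies.

```python
import math


def mse(restored, target):
    """Mean squared error between two equally sized, non-empty sequences."""
    if len(restored) != len(target) or not restored:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - t) ** 2 for r, t in zip(restored, target)) / len(restored)


def psnr(restored, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(restored, target)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(delta_db):
    """Factor by which MSE changes for a PSNR gain of delta_db decibels."""
    return 10 ** (-delta_db / 10)


# Toy pixel values, for illustration only.
target = [0.0, 0.5, 1.0]
restored = [0.1, 0.5, 0.9]
print(round(psnr(restored, target), 2))      # 21.76

# What a 0.09 dB PSNR gain means in terms of MSE.
print(round(mse_ratio_for_gain(0.09), 4))    # 0.9795
```

Under this definition, a 0.09 dB gain corresponds to roughly 2% lower mean squared error, which gives a sense of scale for the margins reported on SIDD and GoPro.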

Figure 2: (a) Illustration of the overall architecture of CascadedGaze network (CGNet). Each encoder layer comprises $N_{g}\times$ CascadedGaze blocks. (b) The CascadedGaze blocks are composed of (c) GCE module and (d) Range Fuser. GCE Module has three depthwise convolutions, followed by pointwi

This review was generated by AI and reviewed by a human editor.