QUICK REVIEW

[Paper Review] VmambaIR: Visual State Space Model for Image Restoration

Yuan Shi, Bin Xia | arXiv (Cornell University) | 2024-03-18

Image and Signal Denoising Methods | Citations: 8

One-Line Summary

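VmambaIR models image restoration with state space models, using a novel Omni Selective Scan within a Unet framework, and delivers state-of-the-art (SOTA) performance on deraining, single image super-resolution (SISR), and real-world super-resolution while reducing both computation and parameter count.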

ABSTRACT

Image restoration is a critical task in low-level computer vision, aiming to restore high-quality images from degraded inputs. Various models, such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, and diffusion models (DMs), have been employed to address this problem with significant impact. However, CNNs have limitations in capturing long-range dependencies. DMs require large prior models and computationally intensive denoising steps. Transformers have powerful modeling capabilities but face challenges due to quadratic complexity with input image size. To address these challenges, we propose VmambaIR, which introduces State Space Models (SSMs) with linear complexity into comprehensive image restoration tasks. We utilize a Unet architecture to stack our proposed Omni Selective Scan (OSS) blocks, consisting of an OSS module and an Efficient Feed-Forward Network (EFFN). Our proposed omni selective scan mechanism overcomes the unidirectional modeling limitation of SSMs by efficiently modeling image information flows in all six directions. Furthermore, we conducted a comprehensive evaluation of our VmambaIR across multiple image restoration tasks, including image deraining, single image super-resolution, and real-world image super-resolution. Extensive experimental results demonstrate that our proposed VmambaIR achieves state-of-the-art (SOTA) performance with much fewer computational resources and parameters. Our research highlights the potential of state space models as promising alternatives to the transformer and CNN architectures in serving as foundational frameworks for next-generation low-level visual tasks.

Motivation and Objectives

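Improve image restoration by addressing the limitations of CNNs, transformers, and diffusion models in long-range dependency modeling and efficiency.
Develop a state space model-based architecture with linear complexity for 2D image data.
Design a multi-scale UNet equipped with Omni Selective Scan (OSS) blocks to capture information flow in six directions.
Demonstrate the effectiveness of VmambaIR on deraining, SR, and real-world SR tasks with lower resource usage.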

Proposed Method

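Adopts a UNet-like architecture built by stacking the proposed OSS blocks.
Introduces an OSS module that processes the input in two streams and uses CNNs for feature-dimension mapping.
Introduces an Efficient Feed-Forward Network (EFFN) to regulate hierarchical information flow.
Implements Omni Selective Scan (OSS), which uses Mamba blocks for high-frequency modeling, to model information flow in six directions through bidirectional scans along three dimensions.
Uses a discretized state space model (SSM) with zero-order hold (ZOH) discretization for efficient sequence modeling of image features.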
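The ZOH discretization and the bidirectional, three-axis scanning listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation. All function names (`zoh_discretize`, `ssm_scan`, `bidirectional_scan`, `omni_scan`) are hypothetical, the state matrix is assumed diagonal, and the SSM parameters are fixed, whereas Mamba's selective scan makes the step size and the input/output projections input-dependent. Scanning each line of the feature map independently along every axis is also a simplifying assumption; the review does not say how VmambaIR orders pixels for each scan.

```python
import numpy as np

def zoh_discretize(a, b, delta):
    """ZOH discretization of a diagonal continuous SSM h'(t) = a*h(t) + b*x(t).

    a, b: arrays of shape (N,), the diagonal state matrix and the input vector.
    delta: positive step size. Returns (a_bar, b_bar), each of shape (N,).
    """
    a_bar = np.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b  # exact ZOH term, valid for a != 0
    return a_bar, b_bar

def ssm_scan(x, a_bar, b_bar, c):
    """Linear-time recurrence h_t = a_bar*h_{t-1} + b_bar*x_t, y_t = c . h_t."""
    h = np.zeros_like(a_bar)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a_bar * h + b_bar * x_t
        y[t] = c @ h
    return y

def bidirectional_scan(x, a_bar, b_bar, c):
    """Sum a forward scan and a backward scan (flipped back to align positions)."""
    fwd = ssm_scan(x, a_bar, b_bar, c)
    bwd = ssm_scan(x[::-1], a_bar, b_bar, c)[::-1]
    return fwd + bwd

def omni_scan(feat, a_bar, b_bar, c):
    """Scan a (C, H, W) array in both directions along each axis: six scans in total.

    Assumption: every 1-D line along an axis is scanned independently and the
    three per-axis results are summed.
    """
    return sum(
        np.apply_along_axis(bidirectional_scan, axis, feat, a_bar, b_bar, c)
        for axis in range(feat.ndim)
    )

# Usage: a 1-state SSM applied to a small random feature map.
a_bar, b_bar = zoh_discretize(np.array([-1.0]), np.array([1.0]), delta=0.1)
features = np.random.default_rng(0).normal(size=(2, 4, 4))
restored = omni_scan(features, a_bar, b_bar, c=np.array([1.0]))
```

Each scan visits every element once, so the cost grows linearly with the number of pixels, which is the property the review contrasts with the quadratic cost of transformer attention.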

Experimental Results

Research Questions

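RQ1: Can a linear-complexity state space modeling approach match or exceed the image restoration performance of Transformer/CNN-based methods?
RQ2: Does Omni Selective Scan go beyond unidirectional Mamba blocks and enable comprehensive, multi-directional modeling of information flow in images?
RQ3: How do OSS, bidirectional channel scanning, and EFFN each contribute to restoration accuracy and efficiency on SR, real-world SR, and deraining tasks?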

Key Results

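VmambaIR achieves state-of-the-art performance on image restoration tasks including image deraining, single image super-resolution, and real-world image super-resolution.
On real-world 4× super-resolution, VmambaIR delivers higher reconstruction accuracy at about 26% of the computational cost of the baseline method.
Ablation studies show that OSS substantially improves performance over unidirectional scanning, that bidirectional channel scanning raises accuracy, and that EFFN improves information flow and efficiency.
On several benchmarks, VmambaIR recovers better high-frequency detail than prior SOTA methods while using fewer parameters and FLOPs.
Qualitative results show finer detail preservation (e.g., the eyes and nose in faces, water surfaces) with fewer artifacts.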


This review was generated by AI and reviewed by a human editor.