QUICK REVIEW

[Paper Review] Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Yi Xiao, Qiangqiang Yuan|arXiv (Cornell University)|2024. 05. 08.

Advanced Image Fusion Techniques | Citations: 7

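One-Line Summary

This paper proposes FMSR, a frequency-assisted Mamba framework. It combines Vision State Space Modeling (Mamba) with frequency-aware modules to perform global-local, dual-domain modeling with linear complexity and improved PSNR.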

ABSTRACT

Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-scale RSI. To alleviate these issues, we develop the first attempt to integrate the Vision State Space Model (Mamba) for RSI-SR, which specializes in processing large-scale RSI by capturing long-range dependency with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore the spatial and frequent correlations. In particular, our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to grasp their merits for effective spatial-frequency fusion. Considering that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on AID, DOTA, and DIOR benchmarks demonstrate that our FMSR outperforms state-of-the-art Transformer-based methods HAT-L in terms of PSNR by 0.11 dB on average, while consuming only 28.05% and 19.08% of its memory consumption and complexity, respectively. Code will be available at https://github.com/XY-boy/FreMamba

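Research Motivation and Goals

Model long-range dependencies efficiently in super-resolution (SR) of large-scale remote sensing images.
Use Mamba (Vision State Space Model) for global modeling with linear complexity.
Introduce frequency-aware components that capture high-frequency signals for better reconstruction.
Design a multi-level fusion of global and local representations through learnable adapters.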
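To make the linear-complexity goal concrete, here is a minimal sketch of a state-space recurrence over a 1-D token sequence. It is an illustration only, not the paper's VSSM: the function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are assumptions made here, whereas Mamba makes its parameters input-dependent and scans image features along several directions.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

    h[t] = a * h[t-1] + b * x[t]
    y[t] = dot(c, h[t])

    x: (L,) input sequence; a, b, c: (N,) per-state parameters.
    The parameters are fixed here (an assumption for illustration).
    """
    h = np.zeros_like(a, dtype=float)   # hidden state, one value per state dim
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):         # a single pass: cost grows linearly with L
        h = a * h + b * x_t             # every earlier token reaches t through h
        y[t] = np.dot(c, h)
    return y

# Impulse response of a one-state system with decay 0.5
print(ssm_scan(np.array([1.0, 0.0, 0.0]),
               np.array([0.5]), np.array([1.0]), np.array([1.0])))
```

Each output depends on all earlier inputs through the hidden state, yet the loop visits each token once. For a sequence of 4,096 tokens that is 4,096 update steps, whereas full pairwise self-attention would compare 4,096 × 4,096 = 16,777,216 token pairs.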

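Proposed Method

The backbone is built from Frequency-assisted Mamba Groups (FMG), each consisting of sequentially arranged Frequency-assisted Mamba Blocks (FMB).
Each FMB uses three parallel branches: a Vision State Space Module (VSSM) for global spatial modeling, a Frequency Selection Module (FSM) for frequency-domain signals, and learnable scaling factors for adaptive fusion.
It includes a Hybrid Gate Module (HGM) that models local inductive bias and high-frequency information, together with the Frequency Selection Module, which modulates high-frequency information.
Learnable adapters rescale features across levels to improve multi-level feature fusion.
The model is optimized with L1 loss on the AID dataset, trained on image patches, and evaluated with PSNR/SSIM/LPIPS on the AID, DOTA, and DIOR benchmarks.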
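The idea of isolating high-frequency content and fusing it with another branch through scaling factors can be sketched as follows. This is not the authors' FSM: the hard radial mask, the `cutoff` value, and the names `high_frequency_part` and `fuse` are assumptions for illustration, and the scalars `alpha` and `beta` only stand in for the learnable scaling factors, which would be trained parameters in the real model.

```python
import numpy as np

def high_frequency_part(feat, cutoff=0.25):
    """Return the part of a 2-D feature map above a spatial-frequency cutoff.

    cutoff is in cycles per pixel (0.5 is the per-axis Nyquist limit).
    A fixed hard mask is an assumption; a learned module would select
    frequencies adaptively.
    """
    h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]        # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]        # horizontal frequencies
    mask = np.sqrt(fy ** 2 + fx ** 2) >= cutoff
    spectrum = np.fft.fft2(feat)
    return np.fft.ifft2(spectrum * mask).real

def fuse(global_feat, freq_feat, alpha=1.0, beta=1.0):
    """Weighted sum of two branches; alpha and beta stand in for learnable scales."""
    return alpha * global_feat + beta * freq_feat

# A smooth ramp plus a fine checkerboard: the mask keeps mostly the checkerboard.
i, j = np.indices((8, 8))
feat = 0.1 * i + (-1.0) ** (i + j)
fused = fuse(feat, high_frequency_part(feat), alpha=1.0, beta=0.5)
print(fused.shape)
```

Because the mask depends only on the frequency radius, it is symmetric under frequency negation, so the inverse transform of a real input stays real up to rounding error.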

Figure 1: The Effective Receptive Field (ERF) [17] comparison for (a) CNN-based method NLSN [18], (b) Transformer-based model RGT [19], and the proposed Mamba-based network FMSR. A wider distribution of dark areas demonstrates larger ERF. Our FMSR effectively obtains the largest ERF, indicat

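Experimental Results

Research Questions

RQ1: Can a Mamba-based framework with frequency-aware components effectively model long-range dependencies in large-scale RSI SR?
RQ2: Do frequency-domain signals and local-bias modules improve SR reconstruction over purely global or purely local models?
RQ3: What are the memory and computation trade-offs of applying a linear-complexity SSM-based approach to high-resolution RSI SR?
RQ4: How does FMSR compare with state-of-the-art Transformer-based SR methods on standard RSI benchmarks?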

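Key Results

FMSR outperforms the state-of-the-art Transformer-based method HAT-L by 0.11 dB in average PSNR across the evaluated RSI benchmarks.
FMSR uses only 28.05% of HAT-L's memory and 19.08% of its computational complexity, a substantial efficiency gain.
Ablation studies show that integrating VSSM (global modeling), HGM (local bias), and FSM (frequency selection) improves performance, with FSM and HGM contributing notable gains.
On the AID, DOTA, and DIOR datasets, FMSR achieves competitive PSNR/SSIM/LPIPS results, and the FMSR++ self-embedding variant yields additional gains.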
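To put the 0.11 dB figure in perspective, the sketch below computes PSNR and converts a PSNR gain into the corresponding mean-squared-error ratio. It assumes 8-bit images with a peak value of 255; the review does not say which channel or value range the paper evaluates on, and the helper name `mse_ratio_for_gain` is introduced here for illustration.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """MSE of the better result divided by the baseline MSE for a PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)

reference = np.zeros((4, 4))
estimate = np.full((4, 4), 25.5)          # uniform error of one tenth of the peak
print(psnr(reference, estimate))          # about 20 dB
print(mse_ratio_for_gain(0.11))           # about 0.975
```

For a single image, a 0.11 dB gain corresponds to roughly 2.5% lower mean squared error. The reported number is an average over benchmarks, so this conversion is only an approximate guide to its size.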

Figure 2: Overview of the proposed FMSR. The Frequency-assisted Mamba Blocks (FMB) are arranged sequentially in Frequency-assisted Mamba Groups (FMG). In FMB, a Frequency Selection Module (FSM) is adopted to assist the learning process of the Vision State Space Module (VSSM) and Hybrid Gate Module (

This review was generated by AI and checked by a human editor.