QUICK REVIEW

[Paper Review] Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Yi Xiao, Qiangqiang Yuan|arXiv (Cornell University)|May 8, 2024
Advanced Image Fusion Techniques | Cited by 7
One-Sentence Summary

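This paper proposes FMSR, a frequency-assisted Mamba framework for remote sensing image super-resolution. It combines the Vision State Space Model (Mamba) with frequency-aware modules to model global and local dependencies across the spatial and frequency domains, achieving linear complexity and higher PSNR.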

ABSTRACT

Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-scale RSI. To alleviate these issues, we make the first attempt to integrate the Vision State Space Model (Mamba) for RSI-SR, which specializes in processing large-scale RSI by capturing long-range dependency with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore spatial and frequency correlations. In particular, our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to grasp their merits for effective spatial-frequency fusion. Considering that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on AID, DOTA, and DIOR benchmarks demonstrate that our FMSR outperforms the state-of-the-art Transformer-based method HAT-L in terms of PSNR by 0.11 dB on average, while consuming only 28.05% of its memory and 19.08% of its complexity. Code will be available at https://github.com/XY-boy/FreMamba
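
The abstract's linear-complexity claim rests on the state space recurrence behind Mamba-style models. As a minimal sketch under simplifying assumptions, the function below runs a non-selective scan with a diagonal state matrix over a 1D sequence of scalar tokens. The name `ssm_scan` and its parameters are hypothetical, and the sketch omits the input-dependent (selective) parameters, discretization, and 2D scanning that a real VSSM uses. It only shows why cost grows linearly: each token updates a fixed-size hidden state exactly once.

```python
def ssm_scan(x, a, b, c):
    """Simplified linear-time state space scan (illustrative only, not FMSR's VSSM).

    x: list of scalar inputs, one per token
    a, b, c: lists of length N holding a diagonal state matrix A,
             an input map B, and an output map C (hypothetical parameters)
    Recurrence: h_t = A * h_{t-1} + B * x_t,  y_t = C . h_t
    """
    n = len(a)
    h = [0.0] * n  # fixed-size hidden state, independent of sequence length
    ys = []
    for x_t in x:  # a single pass over the sequence: O(L * N) work
        h = [a[i] * h[i] + b[i] * x_t for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys


print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # prints [1.0, 0.5, 0.25]
```

An impulse at the first token still influences later outputs through the carried state, which is how a scan propagates long-range context without comparing every pair of tokens.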

Motivation and Objectives

  • Enable efficient long-range dependency modeling for large-scale remote sensing images in SR tasks.
  • Leverage Mamba (Vision State Space Model) for global modeling with linear complexity.
  • Introduce frequency-aware components to capture high-frequency cues for better reconstruction.
  • Design a multi-level fusion architecture to integrate global and local representations via learnable adaptors.
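
To make the complexity argument concrete, the sketch below counts interactions for global self-attention versus a linear scan on an H×W feature map. It is a hypothetical cost model, not a FLOP count: it ignores channel width and constant factors. The helper names are illustrative, and `num_directions=4` is an assumption borrowed from four-direction 2D scanning in VMamba-style models rather than a detail confirmed for FMSR.

```python
def attention_pairs(h: int, w: int) -> int:
    """Token-pair interactions for global self-attention over an h x w map: O(L^2)."""
    tokens = h * w
    return tokens * tokens


def scan_steps(h: int, w: int, num_directions: int = 1) -> int:
    """State updates for linear scans over the same map: O(L) per scan direction."""
    return h * w * num_directions


for side in (64, 128, 256):
    pairs = attention_pairs(side, side)
    steps = scan_steps(side, side, num_directions=4)
    print(f"{side}x{side}: attention pairs={pairs:,}, scan steps={steps:,}, ratio={pairs // steps:,}")
```

The ratio grows with the number of tokens, which is why quadratic attention becomes prohibitive for large RSIs. Windowed attention avoids this blow-up but restricts each token to its local window, which is the receptive-field limitation the paper attributes to existing SR methods.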

Proposed Method

  • Adopts a Frequency-assisted Mamba backbone consisting of Frequency-assisted Mamba Groups (FMGs).
  • Builds each Frequency-assisted Mamba Block (FMB) around a Vision State Space Module (VSSM) for global spatial modeling and a Hybrid Gate Module (HGM) that injects local inductive bias, with a Frequency Selection Module (FSM) that modulates high-frequency cues from the frequency domain to assist both.
  • Fuses the branch outputs adaptively with learnable scaling factors.
  • Implements a learnable adaptor to rescale cross-level features and improve multi-level feature fusion.
  • Trains with an L1 loss on patches from the AID dataset and evaluates PSNR, SSIM, and LPIPS on the AID, DOTA, and DIOR benchmarks.
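
This summary does not specify how the FSM selects frequencies, so the following is only an illustration of the general idea of frequency-domain selection: transform a signal with a discrete Fourier transform, reweight each frequency bin, and transform back. The helpers `dft`, `idft`, and `frequency_select` are hypothetical, operate on a 1D signal rather than a 2D feature map, and use fixed weights where a network would learn them.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a toy example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def frequency_select(x, weights):
    """Hypothetical frequency selection: scale each frequency bin, then return to the signal domain.

    Taking .real only discards rounding noise when weights are conjugate-symmetric.
    """
    spectrum = dft(x)
    selected = [w * s for w, s in zip(weights, spectrum)]
    return [v.real for v in idft(selected)]


# Zeroing bin 0 (the DC / mean component) acts as a simple high-pass filter.
print(frequency_select([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 1.0]))
```

With `weights=[0, 1, 1, 1]`, the mean is removed and `[1, 2, 3, 4]` becomes approximately `[-1.5, -0.5, 0.5, 1.5]`, keeping only the fluctuations around the mean; in images, higher-frequency bins correspond to edges and fine texture. Real-valued weights should satisfy `weights[k] == weights[n - k]` for k ≥ 1 so the output stays real.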
Figure 1: The Effective Receptive Field (ERF) [17] comparison for (a) CNN-based method NLSN [18], (b) Transformer-based model RGT [19], and the proposed Mamba-based network FMSR. A wider distribution of dark areas indicates a larger ERF. Our FMSR effectively obtains the largest ERF, indicat

Experimental Results

Research Questions

  • RQ1: Can a Mamba-based framework with frequency-aware components effectively model long-range dependencies in large-scale RSI SR tasks?
  • RQ2: Do frequency-domain cues and local bias modules improve SR reconstruction compared to purely global or local models?
  • RQ3: What are the trade-offs in memory and computation when applying a linear-complexity SSM-based approach to high-resolution RSI SR?
  • RQ4: How does FMSR perform against state-of-the-art Transformer-based SR methods on standard RSI benchmarks?

Key Findings

  • FMSR outperforms the state-of-the-art Transformer-based method HAT-L by 0.11 dB PSNR on average across the evaluated RSI benchmarks.
  • FMSR consumes 28.05% of HAT-L's memory and 19.08% of its computational complexity, indicating substantial efficiency gains.
  • Ablation studies show that integrating VSSM (global modeling), HGM (local bias), and FSM (frequency selection) contributes to performance gains, with FSM and HGM delivering notable improvements.
  • On the AID, DOTA, and DIOR datasets, FMSR achieves competitive PSNR/SSIM/LPIPS results, and the FMSR++ self-ensemble variant achieves further gains.
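
The reported efficiency figures can be restated as reduction factors. The short calculation below uses only the two percentages given above (28.05% of HAT-L's memory and 19.08% of its computational complexity); `reduction` is a hypothetical helper name.

```python
def reduction(fraction_of_baseline: float) -> tuple:
    """Convert 'uses X of the baseline' (as a fraction) into a reduction factor and percent saved."""
    factor = 1.0 / fraction_of_baseline
    saved_percent = (1.0 - fraction_of_baseline) * 100.0
    return factor, saved_percent


memory_factor, memory_saved = reduction(0.2805)    # memory relative to HAT-L
compute_factor, compute_saved = reduction(0.1908)  # complexity relative to HAT-L
print(f"memory:  {memory_factor:.2f}x lower, {memory_saved:.2f}% saved")
print(f"compute: {compute_factor:.2f}x lower, {compute_saved:.2f}% saved")
```

This works out to roughly 3.57× less memory (71.95% saved) and 5.24× less computation (80.92% saved) relative to HAT-L.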
Figure 2: Overview of the proposed FMSR. The Frequency-assisted Mamba Blocks (FMB) are arranged sequentially in Frequency-assisted Mamba Groups (FMG). In FMB, a Frequency Selection Module (FSM) is adopted to assist the learning process of the Vision State Space Module (VSSM) and Hybrid Gate Module (
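
The block and group layout in Figure 2 can be sketched as plain function composition. In the sketch below, `fmb` and `fmg` are hypothetical names, the three branch functions are stand-ins for VSSM, HGM, and FSM, and the fusion rule (a residual plus a sum of branch outputs weighted by learnable scalars) is an assumption; the paper's exact wiring, in which FSM assists VSSM and HGM, may differ.

```python
from functools import partial
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Branch = Callable[[Vector], Vector]


def fmb(x: Vector, global_branch: Branch, local_branch: Branch,
        freq_branch: Branch, scales: Tuple[float, float, float]) -> Vector:
    """Hypothetical FMB: residual input plus branch outputs weighted by learnable scales."""
    g = global_branch(x)  # stand-in for VSSM (global modeling)
    l = local_branch(x)   # stand-in for HGM (local inductive bias)
    f = freq_branch(x)    # stand-in for FSM (frequency cues)
    s_g, s_l, s_f = scales  # in a trained model these scalars would be learned
    return [xi + s_g * gi + s_l * li + s_f * fi for xi, gi, li, fi in zip(x, g, l, f)]


def fmg(x: Vector, blocks: Sequence[Callable[[Vector], Vector]]) -> Vector:
    """Hypothetical FMG: apply blocks sequentially, then add a group-level residual."""
    y = x
    for block in blocks:
        y = block(y)
    return [xi + yi for xi, yi in zip(x, y)]


# Toy usage with stand-in branches (not real VSSM/HGM/FSM layers).
block = partial(
    fmb,
    global_branch=lambda v: [2.0 * a for a in v],
    local_branch=lambda v: [1.0 for _ in v],
    freq_branch=lambda v: [0.0 for _ in v],
    scales=(0.5, 1.0, 1.0),
)
print(fmg([1.0, 2.0], [block]))  # prints [4.0, 7.0]
```

Keeping the scales as separate scalars makes it easy to see how a trained model could down-weight one branch (for example, the frequency cues) without changing the block's structure.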

This review was generated by AI and reviewed by human editors.