QUICK REVIEW

[Paper Review] Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Yi Xiao, Qiangqiang Yuan|arXiv (Cornell University)|May 8, 2024

Advanced Image Fusion Techniques | Cited by 7

One-Sentence Summary

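This paper proposes FMSR, a frequency-assisted Mamba framework for remote sensing image super-resolution that combines Vision State Space Modeling (Mamba) with frequency-aware modules to model global and local dependencies across the spatial and frequency domains with linear complexity, improving PSNR.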

ABSTRACT

Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-scale RSI. To alleviate these issues, we make the first attempt to integrate the Vision State Space Model (Mamba) into RSI-SR; Mamba specializes in processing large-scale RSI by capturing long-range dependencies with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore spatial and frequency correlations. In particular, our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to exploit their respective merits for effective spatial-frequency fusion. Considering that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on AID, DOTA, and DIOR benchmarks demonstrate that our FMSR outperforms the state-of-the-art Transformer-based method HAT-L in terms of PSNR by 0.11 dB on average, while requiring only 28.05% of its memory and 19.08% of its computational complexity. Code will be available at https://github.com/XY-boy/FreMamba
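The abstract names a Frequency Selection Module (FSM) without detailing its internals. As a minimal sketch of frequency-domain feature selection in general, assuming a module that rescales each channel's 2-D Fourier spectrum with per-frequency gains, the NumPy code below uses a fixed high-pass boost in place of learned weights. The functions `frequency_select` and `high_pass_weights`, and the `radius` and `boost` values, are hypothetical and not taken from FMSR.

```python
import numpy as np


def high_pass_weights(channels, height, width, radius=0.25, boost=1.5):
    """Per-frequency gains that amplify components farther than `radius`
    (in cycles per pixel) from the DC term and leave the rest unchanged.
    Hypothetical fixed mask; a trained module would learn its gains."""
    fy = np.fft.fftfreq(height)[:, None]        # vertical frequencies, shape (H, 1)
    fx = np.fft.rfftfreq(width)[None, :]        # non-negative horizontal frequencies, (1, W//2+1)
    dist = np.sqrt(fy ** 2 + fx ** 2)           # radial distance from DC
    gains = np.where(dist > radius, boost, 1.0)
    return np.broadcast_to(gains, (channels, height, width // 2 + 1))


def frequency_select(feat, weights):
    """Reweight each channel's 2-D spectrum and return to the spatial domain.

    feat:    (C, H, W) real-valued feature map
    weights: (C, H, W//2 + 1) real gains, one per rfft2 coefficient
    """
    spec = np.fft.rfft2(feat, axes=(-2, -1))                       # forward real FFT per channel
    spec = spec * weights                                           # frequency-wise gating
    return np.fft.irfft2(spec, s=feat.shape[-2:], axes=(-2, -1))   # back to spatial domain


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
y = frequency_select(x, high_pass_weights(4, 16, 16))
```

Because the DC gain stays at 1, each channel's mean is preserved while higher frequencies are amplified, which is one simple way to emphasize the high-frequency detail that SR reconstruction depends on.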

Research Motivation and Objectives

Enable efficient long-range dependency modeling for large-scale remote sensing images in SR tasks.
Leverage Mamba (Vision State Space Model) for global modeling with linear complexity.
Introduce frequency-aware components to capture high-frequency cues for better reconstruction.
Design a multi-level fusion architecture to integrate global and local representations via learnable adaptors.
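The linear-complexity claim rests on the recurrent form of state space models. A minimal single-channel sketch, assuming a diagonal state matrix and the zero-order-hold discretization described for S4/Mamba-style models, is shown below. The function `ssm_scan`, its shapes, and the fixed step sizes are illustrative assumptions, not FMSR's actual VSSM.

```python
import numpy as np


def ssm_scan(x, A, B, C, delta):
    """Sequential scan of a diagonal, discretized linear state space model.

    x:     (L,) input sequence for a single channel
    A:     (N,) diagonal of the continuous-time state matrix (negative for stability)
    B, C:  (N,) input and output projections
    delta: (L,) positive step sizes; Mamba-style selective SSMs make these
           (and B, C) depend on the input, here they are simply given
    Each token costs O(N) work, so the whole scan is O(L * N): linear in L.
    """
    x = np.asarray(x, dtype=float)
    A = np.asarray(A, dtype=float)
    h = np.zeros_like(A)                      # hidden state, shape (N,)
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        a_bar = np.exp(delta[t] * A)          # zero-order-hold discretization of A
        b_bar = (a_bar - 1.0) / A * B         # ZOH discretization of B (valid for diagonal A != 0)
        h = a_bar * h + b_bar * x[t]          # recurrent state update
        y[t] = C @ h                          # linear readout
    return y


# A 2-D feature map is flattened into a token sequence before scanning.
feature_map = np.arange(12, dtype=float).reshape(3, 4)
tokens = feature_map.reshape(-1)              # raster-scan order, L = 12
out = ssm_scan(tokens, A=-np.arange(1.0, 5.0), B=np.ones(4), C=np.ones(4),
               delta=np.full(tokens.shape, 0.1))
```

Vision variants such as VMamba flatten the image along several scan orders and merge the results; this sketch uses a single raster order, and a practical implementation replaces the Python loop with a parallel scan.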

Proposed Method

Adopts a Frequency-assisted Mamba backbone consisting of Frequency-assisted Mamba Groups (FMGs).
Builds each Frequency-assisted Mamba Block (FMB) from a Vision State Space Module (VSSM) for global spatial modeling and a Hybrid Gate Module (HGM) that injects local inductive bias.
Adds a Frequency Selection Module (FSM) that extracts frequency-domain cues and modulates high-frequency information to assist the VSSM and HGM, with a learnable scaling factor for adaptive fusion.
Implements a learnable adaptor to rescale cross-level features and improve multi-level feature fusion.
Trains with an L1 loss on patches from the AID dataset and evaluates with PSNR/SSIM/LPIPS on the AID, DOTA, and DIOR benchmarks.
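The block-level fusion described in this list can be sketched in NumPy under several assumptions: a sigmoid-gated depthwise 3x3 branch stands in for the HGM's local inductive bias, a random array stands in for the VSSM output, and per-channel scale vectors play the role of the learnable scaling factors. All function names and the gating form are hypothetical; the review does not specify FMSR's exact operators.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def depthwise_conv3x3(x, kernels):
    """Zero-padded depthwise 3x3 cross-correlation.
    x: (C, H, W); kernels: (C, 3, 3), one kernel per channel."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out


def gated_local_branch(x, kernels, gate_kernels):
    """Hypothetical stand-in for a local-bias gate: a depthwise 3x3 branch
    modulated elementwise by a sigmoid gate computed from another 3x3 branch."""
    return depthwise_conv3x3(x, kernels) * sigmoid(depthwise_conv3x3(x, gate_kernels))


def scaled_fusion(x, global_feat, local_feat, s_global, s_local):
    """Residual fusion with per-channel scales of shape (C,), which would be
    learnable parameters in a trained network."""
    return x + s_global[:, None, None] * global_feat + s_local[:, None, None] * local_feat


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8))
global_feat = rng.standard_normal((2, 8, 8))   # placeholder for a VSSM output
local_feat = gated_local_branch(x, rng.standard_normal((2, 3, 3)), rng.standard_normal((2, 3, 3)))
fused = scaled_fusion(x, global_feat, local_feat, np.array([0.5, 1.0]), np.array([0.1, 0.2]))
```

Keeping the scales as separate per-channel vectors lets training decide, channel by channel, how much global versus local evidence to add to the residual path.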

Figure 1: The Effective Receptive Field (ERF) [17] comparison for (a) the CNN-based method NLSN [18], (b) the Transformer-based model RGT [19], and (c) the proposed Mamba-based network FMSR. A wider distribution of dark areas indicates a larger ERF. Our FMSR effectively obtains the largest ERF, indicat…

Experimental Results

Research Questions

RQ1: Can a Mamba-based framework with frequency-aware components effectively model long-range dependencies in large-scale RSI SR tasks?
RQ2: Do frequency-domain cues and local bias modules improve SR reconstruction compared to purely global or local models?
RQ3: What are the trade-offs in memory and computation when applying a linear-complexity SSM-based approach to high-resolution RSI SR?
RQ4: How does FMSR perform against state-of-the-art Transformer-based SR methods on standard RSI benchmarks?
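RQ3 turns on how cost grows with image size. A back-of-the-envelope count, assuming full global self-attention over all tokens versus one state update per token with a hypothetical state size of 16, shows why the gap widens for large RSI tiles. Window-based Transformers reduce the quadratic term by limiting attention to local windows, so these counts illustrate asymptotic scaling only; they are not the measured cost of HAT-L or FMSR.

```python
def full_attention_entries(height, width):
    """Entries in one full N x N self-attention map over N = height * width tokens."""
    n = height * width
    return n * n


def scan_state_updates(height, width, state_dim=16):
    """Per-token state updates of a linear scan with a hypothetical state size."""
    return height * width * state_dim


for side in (64, 128, 256):
    print(side, full_attention_entries(side, side), scan_state_updates(side, side))
```

Doubling the side length multiplies the attention count by 16 but the scan count only by 4.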

Key Findings

FMSR outperforms the state-of-the-art Transformer-based method HAT-L in PSNR by 0.11 dB on average across the evaluated RSI benchmarks.
FMSR consumes 28.05% of HAT-L's memory and 19.08% of its computational complexity, indicating substantial efficiency gains.
Ablation studies show that integrating VSSM (global modeling), HGM (local bias), and FSM (frequency selection) contributes to performance gains, with FSM and HGM delivering notable improvements.
On the AID, DOTA, and DIOR datasets, FMSR achieves competitive PSNR/SSIM/LPIPS results, with the self-ensemble variant FMSR++ achieving further gains.
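To put the reported numbers in perspective, the sketch below computes PSNR for images scaled to [0, 1] and converts the 0.11 dB gain and the 28.05% / 19.08% figures into more intuitive quantities. The helper names are hypothetical, and the MSE reading holds for a single image pair: PSNR averaged over a benchmark is not the same as the PSNR of an averaged MSE. The review does not state the paper's exact evaluation protocol (color space, border cropping).

```python
import math

import numpy as np


def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = float(np.mean((np.asarray(sr, dtype=float) - np.asarray(hr, dtype=float)) ** 2))
    return math.inf if mse == 0.0 else 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE(new) / MSE(old) implied by a PSNR gain of `gain_db` on one image pair."""
    return 10.0 ** (-gain_db / 10.0)


def reduction_factor(fraction):
    """How many times smaller a cost is when it equals `fraction` of a baseline."""
    return 1.0 / fraction


print(round(mse_ratio_for_gain(0.11), 4))   # 0.975
print(round(reduction_factor(0.2805), 2), round(reduction_factor(0.1908), 2))
```

A 0.11 dB gain corresponds to roughly 2.5% lower MSE on one image, and the efficiency figures amount to about 3.6x less memory and 5.2x lower complexity than HAT-L.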

Figure 2: Overview of the proposed FMSR. The Frequency-assisted Mamba Blocks (FMB) are arranged sequentially in Frequency-assisted Mamba Groups (FMG). In FMB, a Frequency Selection Module (FSM) is adopted to assist the learning process of the Vision State Space Module (VSSM) and Hybrid Gate Module (

This review was generated by AI and reviewed by human editors.