QUICK REVIEW

[Paper Review] RS-Mamba for Large Remote Sensing Image Dense Prediction

Sijie Zhao, Hao Chen|arXiv (Cornell University)|Apr 3, 2024

Remote Sensing and Land Use | Cited by 5

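One-Sentence Summary

RS-Mamba introduces an omnidirectional selective scan module (OSSM) that models the global context of large very-high-resolution (VHR) remote sensing images with linear complexity, achieving state-of-the-art results in semantic segmentation and change detection without patch-based cropping.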

ABSTRACT

Context modeling is critical for remote sensing image dense prediction tasks. Nowadays, the growing size of very-high-resolution (VHR) remote sensing images poses challenges in effectively modeling context. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional practice of cropping large images into smaller patches results in a notable loss of contextual information. To address these issues, we propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images. RSM is specifically designed to capture the global context of remote sensing images with linear complexity, facilitating the effective processing of large VHR images. Considering that the land covers in remote sensing images are distributed in arbitrary spatial directions due to characteristics of remote sensing over-head imaging, the RSM incorporates an omnidirectional selective scan module to globally model the context of images in multiple directions, capturing large spatial features from various directions. Extensive experiments on semantic segmentation and change detection tasks across various land covers demonstrate the effectiveness of the proposed RSM. We designed simple yet effective models based on RSM, achieving state-of-the-art performance on dense prediction tasks in VHR remote sensing images without fancy training strategies. Leveraging the linear complexity and global modeling capabilities, RSM achieves better efficiency and accuracy than transformer-based models on large remote sensing images. Interestingly, we also demonstrated that our model generally performs better with a larger image size on dense prediction tasks. Our code is available at https://github.com/walking-shadow/Official_Remote_Sensing_Mamba.
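To make the quadratic-versus-linear argument concrete, the sketch below counts how the work grows with image size. It assumes non-overlapping 4×4 patch tokens (a hypothetical choice, not taken from the paper) and counts only token-pair interactions for full self-attention versus recurrent steps for an eight-direction scan. Hidden width, state size, and constant factors are ignored, so the numbers describe scaling, not actual FLOPs.

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Tokens produced by splitting an H x W image into patch x patch tiles (assumed patching)."""
    return (height // patch) * (width // patch)


def attention_pairs(n: int) -> int:
    """Pairwise interactions in full self-attention: grows as n^2."""
    return n * n


def scan_steps(n: int, directions: int = 8) -> int:
    """Recurrent steps for a selective scan over `directions` orderings: grows as n."""
    return directions * n


for side in (256, 512, 1024, 2048):
    n = token_count(side, side, patch=4)  # patch=4 is a hypothetical setting
    ratio = attention_pairs(n) / scan_steps(n)
    print(f"{side}x{side}: tokens={n}, attention pairs={attention_pairs(n):.3e}, "
          f"8-direction scan steps={scan_steps(n):.3e}, ratio={ratio:.0f}")
```

Doubling the image side quadruples the token count n, so the attention-to-scan ratio n/8 also quadruples; this widening gap is what motivates processing whole images instead of cropping them.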

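Motivation and Objectives

Address the challenge of modeling global context in very-high-resolution (VHR) remote sensing images without patch-based cropping.
Introduce Remote Sensing Mamba (RSM), a state-space-model-based architecture with linear complexity.
Propose an omnidirectional selective scan module (OSSM) to capture large-scale features from multiple directions.
Demonstrate state-of-the-art performance on semantic segmentation and change detection datasets using simple training strategies.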
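The linear complexity comes from processing the token sequence with a recurrence instead of pairwise attention. The following is a minimal single-channel NumPy sketch of a Mamba-style selective scan, assuming a diagonal state matrix `A`, zero-order-hold discretization of `A`, the simplified Euler step for `B`, and an input-dependent step size, `B`, and `C` produced by hypothetical weights `W_delta`, `W_B`, and `W_C`. It illustrates the mechanism only and is not the RSM code; practical implementations such as the original Mamba use learned multi-channel projections and a fused hardware-aware scan kernel.

```python
import numpy as np


def selective_scan(x: np.ndarray, A: np.ndarray, W_delta: float,
                   W_B: np.ndarray, W_C: np.ndarray, D: float = 0.0) -> np.ndarray:
    """Sequential selective scan over a 1-D sequence x of length L with state size N.

    x: (L,) input; A: (N,) diagonal state matrix (negative entries for stability);
    W_delta, W_B, W_C: hypothetical weights making delta, B, and C depend on x[t].
    """
    h = np.zeros_like(A, dtype=float)              # hidden state, shape (N,)
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):                    # single pass: O(L) time, O(N) memory
        delta = np.logaddexp(0.0, W_delta * x[t])  # softplus keeps the step size positive
        B_t = W_B * x[t]                           # input-dependent input projection
        C_t = W_C * x[t]                           # input-dependent output projection
        A_bar = np.exp(delta * A)                  # zero-order-hold discretization of A
        B_bar = delta * B_t                        # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * x[t]               # state update
        y[t] = C_t @ h + D * x[t]                  # readout with optional skip term
    return y


# Usage: a random sequence of 16 tokens and a 4-dimensional state.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
A = -np.arange(1, 5, dtype=float)
W_B, W_C = rng.standard_normal(4), rng.standard_normal(4)
y = selective_scan(x, A, 1.0, W_B, W_C)
print(y.shape)  # (16,)
```

Each output `y[t]` depends only on `x[0..t]`, and the cost grows linearly with sequence length. A scan in one direction therefore sees only the tokens before it, which is why scanning a 2-D image in several directions matters.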

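Proposed Method

Adopt state space models (SSMs) with a selective scan mechanism to model long-range dependencies with linear complexity.
Design RSM-SS for semantic segmentation, using a U-Net-like encoder-decoder structure built from OSS blocks.
Design RSM-CD for change detection, using a weight-sharing Siamese backbone in the FC-Siam-Conc style with OSS blocks.
Propose the omnidirectional selective scan module (OSSM), which scans in eight directions (horizontal, vertical, diagonal, and anti-diagonal, each in forward and reverse order) for global context modeling.
Embed image patches as token sequences, extract features with OSSM-based blocks, and fuse them through skip connections and convolutions to produce dense predictions.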
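A single 1-D scan only lets each token see what precedes it in one flattening order. The sketch below builds eight flattening orders of an `h × w` token grid (row-wise, column-wise, along diagonals, along anti-diagonals, and the reverse of each), runs a 1-D sequence model along every order, and scatters the outputs back onto the grid. The exact diagonal traversal and the merge-by-summation are assumptions for illustration, and `np.cumsum` stands in for the selective SSM as a hypothetical causal sequence model.

```python
import numpy as np


def scan_orders(h: int, w: int) -> dict:
    """Eight flattening orders of an h x w grid; the diagonal layout is an assumption."""
    idx = np.arange(h * w)
    i, j = np.divmod(idx, w)                           # row and column of each flat index
    base = {
        "horizontal": idx,                             # row by row, left to right
        "vertical": idx.reshape(h, w).T.ravel(),       # column by column, top to bottom
        "diagonal": idx[np.lexsort((i, i - j))],       # lines of constant i - j, then by row
        "anti_diagonal": idx[np.lexsort((i, i + j))],  # lines of constant i + j, then by row
    }
    orders = {}
    for name, order in base.items():
        orders[name] = order
        orders[name + "_reversed"] = order[::-1]       # same path traversed backwards
    return orders


def omnidirectional_mix(feat: np.ndarray, seq_model) -> np.ndarray:
    """Run seq_model along all eight orders of an (h, w) map and sum the results on the grid."""
    h, w = feat.shape
    flat = feat.ravel().astype(float)
    out = np.zeros(h * w)
    for order in scan_orders(h, w).values():
        out[order] += seq_model(flat[order])          # scatter the 1-D output back in place
    return out.reshape(h, w)


feat = np.arange(12, dtype=float).reshape(3, 4)
mixed = omnidirectional_mix(feat, np.cumsum)
print(np.allclose(mixed, 4 * (feat.sum() + feat)))  # True
```

With `np.cumsum` as the stand-in, each forward/reverse pair gives every position the sum of the whole map plus itself, so the merged output equals `4 * (total + x)`: combining opposite directions already yields a global receptive field. The four path types change the order in which context is accumulated, which matters once the sequence model weights nearby tokens more than distant ones, as a decaying SSM state does.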

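Experimental Results

Research Questions

RQ1: Can an SSM-based architecture effectively model the global context of large VHR remote sensing images with linear complexity, without patch cropping?
RQ2: Does the omnidirectional selective scan module capture multi-directional, large-scale features in VHR images better than unidirectional or bidirectional scanning?
RQ3: Can simple RSM-based models outperform state-of-the-art methods on semantic segmentation and change detection across remote sensing datasets?
RQ4: How does RSM, processing whole images without patching, compare with patch-based transformers and recent CNN-transformer hybrid methods?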

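Key Findings

RSM-SS achieves state-of-the-art IoU and F1 on Massachusetts Road semantic segmentation (IoU 67.35%, F1 80.49%).
Ablations show that OSSM, with its eight-direction selective scan, outperforms SS1D and SS2D on both semantic segmentation (Massachusetts Road) and change detection (WHU-CD).
For change detection on WHU-CD, OSSM achieves IoU 84.96%, F1 91.87%, Precision 93.37%, and Recall 90.42%.
RSM-SS and RSM-CD deliver strong performance with simple architectures and no elaborate training tricks.
The omnidirectional SSM-based design makes it possible to process large VHR images directly, avoiding the context loss caused by patch-based cropping.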
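The reported numbers can be cross-checked with standard binary-segmentation identities. For a single foreground class computed from the same true-positive, false-positive, and false-negative counts, F1 equals 2·IoU/(1+IoU) and is also the harmonic mean of precision and recall. The sketch below assumes the reported IoU and F1 follow these definitions (the summary does not state the averaging scheme); all values are entered as fractions, e.g. 84.96% as 0.8496.

```python
def metrics_from_counts(tp: int, fp: int, fn: int) -> dict:
    """Foreground-class metrics from pixel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"precision": precision, "recall": recall, "iou": iou, "f1": f1}


def f1_from_iou(iou: float) -> float:
    """F1 (Dice) = 2 * IoU / (1 + IoU) when both come from the same counts."""
    return 2 * iou / (1 + iou)


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Massachusetts Road (RSM-SS): IoU -> F1
print(round(f1_from_iou(0.6735), 4))          # 0.8049
# WHU-CD (OSSM): IoU, then precision/recall -> F1
print(round(f1_from_iou(0.8496), 4))          # 0.9187
print(round(f1_from_pr(0.9337, 0.9042), 4))   # 0.9187
```

All three checks match the reported F1 values to four decimals, which is consistent with IoU and F1 being computed on the foreground class from the same counts.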

This review was generated by AI and reviewed by human editors.