QUICK REVIEW

[Paper Review] RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation

Xianping Ma, Xiaokang Zhang|arXiv (Cornell University)|Apr 3, 2024

Advanced Image and Video Retrieval Techniques|Cited by 5

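One-Sentence Summary

RS3Mamba introduces a dual-branch architecture that fuses a visual state space (VSS) auxiliary encoder with a ResNet main encoder through a collaborative completion module, improving remote sensing semantic segmentation while modeling long-range context at linear complexity.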

ABSTRACT

Semantic segmentation of remote sensing images is a fundamental task in geoscience research. However, there are some significant shortcomings for the widely used convolutional neural networks (CNNs) and Transformers. The former is limited by its insufficient long-range modeling capabilities, while the latter is hampered by its computational complexity. Recently, a novel visual state space (VSS) model represented by Mamba has emerged, capable of modeling long-range relationships with linear computability. In this work, we propose a novel dual-branch network named remote sensing images semantic segmentation Mamba (RS3Mamba) to incorporate this innovative technology into remote sensing tasks. Specifically, RS3Mamba utilizes VSS blocks to construct an auxiliary branch, providing additional global information to convolution-based main branch. Moreover, considering the distinct characteristics of the two branches, we introduce a collaborative completion module (CCM) to enhance and fuse features from the dual-encoder. Experimental results on two widely used datasets, ISPRS Vaihingen and LoveDA Urban, demonstrate the effectiveness and potential of the proposed RS3Mamba. To the best of our knowledge, this is the first vision Mamba specifically designed for remote sensing images semantic segmentation. The source code will be made available at https://github.com/sstary/SSRS.

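Motivation and Objectives

Improve semantic segmentation of remote sensing images by addressing the limited receptive field of CNNs and the high computational cost of Transformers.
Introduce a VSS-based auxiliary encoder to supply global context.
Develop a collaborative completion module (CCM) that effectively fuses features across the two branches.
Demonstrate effectiveness through comparative analysis on the ISPRS Vaihingen and LoveDA Urban datasets.
Offer insight into the practicality and complexity of applying Mamba-based components to remote sensing tasks.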

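Proposed Method

An auxiliary VSS-based encoder, built on SS2D and S6, captures long-range dependencies at linear complexity.
The main encoder uses ResNet18 for strong local feature extraction.
The collaborative completion module (CCM) fuses cross-branch features through a global branch (self-attention) and a local branch (convolution).
The decoder follows UNetformer-style skip connections to recover pixel-level predictions.
The training objective is a cross-entropy loss over the semantic classes.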
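To make the "linear complexity" claim for the auxiliary encoder concrete, here is a minimal sketch of a four-direction scan in the spirit of SS2D. It is not the authors' implementation. It assumes a fixed scalar recurrence (the hypothetical parameters `a`, `b`, `c`), whereas S6 uses learned, input-dependent parameters, and it merges the four routes by simple summation. The function names `scan_1d` and `four_direction_scan` are illustrative only.

```python
def scan_1d(xs, a=0.5, b=1.0, c=1.0):
    """Run h_t = a*h_{t-1} + b*x_t, y_t = c*h_t in a single pass.

    One update per element, so the cost grows linearly with len(xs).
    a, b, c are fixed scalars here (an assumption); S6 makes them input-dependent.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def four_direction_scan(grid, a=0.5):
    """Scan a 2D grid along four routes and sum the per-pixel outputs."""
    height, width = len(grid), len(grid[0])
    row_major = [(i, j) for i in range(height) for j in range(width)]
    col_major = [(i, j) for j in range(width) for i in range(height)]
    # Forward and reversed traversals of both orderings give four routes.
    routes = [row_major, col_major, row_major[::-1], col_major[::-1]]

    out = [[0.0] * width for _ in range(height)]
    for route in routes:
        ys = scan_1d([grid[i][j] for i, j in route], a=a)
        for (i, j), y in zip(route, ys):
            out[i][j] += y  # merge the routes by summation (simplification)
    return out


if __name__ == "__main__":
    # A single impulse in the top-left corner influences every pixel.
    impulse = [[1.0, 0.0], [0.0, 0.0]]
    print(four_direction_scan(impulse))
```

Each route touches every pixel once, so the total work is four passes over H×W elements, while the response to the corner impulse still reaches the opposite corner. That is the sense in which a scan provides global context without the quadratic cost of full self-attention.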

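Experimental Results

Research Questions

RQ1: Can a dual-branch architecture use visual state space (VSS) blocks to improve semantic segmentation of remote sensing images beyond CNN-only or Transformer-only models?
RQ2: Can the collaborative completion module effectively fuse global VSS-derived features with local CNN features to improve segmentation quality?
RQ3: How large is the performance gain over state-of-the-art methods on standard remote sensing datasets (ISPRS Vaihingen and LoveDA Urban)?
RQ4: What are the computational trade-offs (FLOPs, parameters, memory) of RS3Mamba relative to Transformer and CNN counterparts?
RQ5: Is this the first vision Mamba model tailored to remote sensing semantic segmentation, and is its source code publicly available?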

Key Findings

Method	Backbone	Impervious surface (F1/IoU)	Building (F1/IoU)	Low vegetation (F1/IoU)	Tree (F1/IoU)	Car (F1/IoU)	mF1	mIoU
ABCNet	ResNet-18	89.78/81.45	94.30/89.21	78.49/64.59	90.08/81.95	74.05/58.80	85.34	75.20
TransUNet	R50-ViT-B	90.77/83.10	94.32/89.25	79.02/65.32	90.53/82.70	82.66/70.45	87.46	78.16
UNetformer	ResNet-18	92.33/85.76	96.25/92.78	80.47/67.33	90.85/83.22	89.35/80.75	89.85	81.97
CMTFNet	ResNet-50	92.53/86.09	96.95/94.09	79.98/66.64	90.22/82.19	89.87/81.60	89.91	82.12
RS3Mamba	R18-Mamba-T	92.83/86.62	96.82/93.83	80.84/67.84	91.10/83.66	90.09/81.97	90.34	82.78

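RS3Mamba reaches mF1 90.34 and mIoU 82.78 on ISPRS Vaihingen, outperforming the UNetformer baseline (mF1 89.85, mIoU 81.97).
On Vaihingen, RS3Mamba improves impervious surface IoU by 0.53 percentage points over CMTFNet (86.62 vs. 86.09) and low vegetation IoU by 0.51 percentage points over UNetformer (67.84 vs. 67.33), the next-best method in each class.
On the LoveDA Urban dataset, RS3Mamba obtains mF1 66.86 and mIoU 50.93, with notable gains over the baseline in classes such as agriculture (IoU up 8.33%).
Ablation studies show that the dual-branch design combined with CCM performs best among the single-branch and plain-fusion variants (mF1 90.34, mIoU 82.78).
Compared with the Transformer-based TransUNet, RS3Mamba delivers competitive performance with fewer FLOPs and parameters (31.65G vs. 64.55G FLOPs for TransUNet).
The ablation confirms that CCM is effective for cross-branch fusion, outperforming simple addition and the variant without CCM.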
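The summary figures can be checked directly against the Vaihingen table. The sketch below recomputes mF1 and mIoU as plain averages over the five classes and recomputes the per-class IoU margins quoted above. It also uses the identity F1 = 2·IoU / (1 + IoU), which holds when both metrics come from the same per-class confusion counts; that the paper computes them this way is an assumption, although the table values agree with it to rounding. The names `RESULTS`, `mean_scores`, and `f1_from_iou` are illustrative.

```python
# Per-class (F1, IoU) pairs copied from the Vaihingen table, in the order:
# impervious surface, building, low vegetation, tree, car.
RESULTS = {
    "UNetformer": [(92.33, 85.76), (96.25, 92.78), (80.47, 67.33),
                   (90.85, 83.22), (89.35, 80.75)],
    "CMTFNet": [(92.53, 86.09), (96.95, 94.09), (79.98, 66.64),
                (90.22, 82.19), (89.87, 81.60)],
    "RS3Mamba": [(92.83, 86.62), (96.82, 93.83), (80.84, 67.84),
                 (91.10, 83.66), (90.09, 81.97)],
}


def mean_scores(rows):
    """Return (mF1, mIoU) as unweighted class averages, rounded to 2 decimals."""
    n = len(rows)
    mf1 = sum(f1 for f1, _ in rows) / n
    miou = sum(iou for _, iou in rows) / n
    return round(mf1, 2), round(miou, 2)


def f1_from_iou(iou_percent):
    """Convert a per-class IoU (in percent) to F1 (in percent)."""
    j = iou_percent / 100.0
    return 100.0 * 2.0 * j / (1.0 + j)


if __name__ == "__main__":
    for name, rows in RESULTS.items():
        print(name, mean_scores(rows))

    # IoU margins over the next-best method in each class.
    ours = RESULTS["RS3Mamba"]
    print(round(ours[0][1] - RESULTS["CMTFNet"][0][1], 2))     # impervious surface
    print(round(ours[2][1] - RESULTS["UNetformer"][2][1], 2))  # low vegetation
```

Running it reproduces the reported means for all three methods, and the margins come out as 0.53 and 0.51 percentage points.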

This review was generated by AI and checked by human editors.