QUICK REVIEW

[Paper Review] RS-Mamba for Large Remote Sensing Image Dense Prediction

Sijie Zhao, Hao Chen|arXiv (Cornell University)|Apr 3, 2024

Remote Sensing and Land Use | Citations: 5

One-Sentence Summary

RS-Mamba introduces an Omnidirectional Selective Scan Module (OSSM) that models global context with linear complexity, achieving state-of-the-art results in semantic segmentation and change detection without patch-based cropping.

ABSTRACT

Context modeling is critical for remote sensing image dense prediction tasks. Nowadays, the growing size of very-high-resolution (VHR) remote sensing images poses challenges in effectively modeling context. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional practice of cropping large images into smaller patches results in a notable loss of contextual information. To address these issues, we propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images. RSM is specifically designed to capture the global context of remote sensing images with linear complexity, facilitating the effective processing of large VHR images. Considering that the land covers in remote sensing images are distributed in arbitrary spatial directions due to characteristics of remote sensing over-head imaging, the RSM incorporates an omnidirectional selective scan module to globally model the context of images in multiple directions, capturing large spatial features from various directions. Extensive experiments on semantic segmentation and change detection tasks across various land covers demonstrate the effectiveness of the proposed RSM. We designed simple yet effective models based on RSM, achieving state-of-the-art performance on dense prediction tasks in VHR remote sensing images without fancy training strategies. Leveraging the linear complexity and global modeling capabilities, RSM achieves better efficiency and accuracy than transformer-based models on large remote sensing images. Interestingly, we also demonstrated that our model generally performs better with a larger image size on dense prediction tasks. Our code is available at https://github.com/walking-shadow/Official_Remote_Sensing_Mamba.

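Research Motivation and Objectives

Address the challenge of modeling global context in very-high-resolution (VHR) remote sensing images without patch-based cropping.
Introduce Remote Sensing Mamba (RSM), a State Space Model based architecture with linear complexity.
Propose the Omnidirectional Selective Scan Module (OSSM), which captures large spatial features in multiple directions.
Demonstrate state-of-the-art performance on semantic segmentation and change detection datasets with simple training strategies.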
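
To make the linear-versus-quadratic argument concrete, the following back-of-the-envelope sketch counts tokens for square images and compares how pairwise attention interactions and sequential scan steps grow with image size. The 4-pixel token size, the function names, and the use of raw step counts as a cost proxy are assumptions for illustration, not figures from the paper.

```python
def token_count(side: int, patch: int = 4) -> int:
    """Tokens produced when a side x side image is embedded in patch x patch tokens."""
    per_side = side // patch
    return per_side * per_side


def attention_pairs(n_tokens: int) -> int:
    """Token-to-token interactions in full self-attention (grows as n^2)."""
    return n_tokens * n_tokens


def scan_steps(n_tokens: int, n_directions: int = 8) -> int:
    """Recurrent steps for a sequential scan repeated per direction (grows as n)."""
    return n_tokens * n_directions


if __name__ == "__main__":
    for side in (512, 1024, 2048):
        n = token_count(side)
        print(f"side={side:5d} tokens={n:7d} "
              f"attention_pairs={attention_pairs(n):14d} scan_steps={scan_steps(n):9d}")
```

Doubling the image side quadruples the token count, so the scan cost grows by 4x while the attention cost grows by 16x. That gap is why transformer pipelines usually fall back to cropping.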

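Proposed Method

Adopt State Space Models (SSM) with a selective scan mechanism to model long-range dependencies with linear complexity.
Design Remote Sensing Mamba for semantic segmentation (RSM-SS) as a U-Net-style encoder-decoder built from OSS blocks.
Design RSM-CD for change detection on a Siamese FC-Siam-Conc backbone with shared weights and OSS blocks.
Introduce the Omnidirectional Selective Scan Module (OSSM), which scans in eight directions (horizontal, vertical, diagonal, anti-diagonal, and the reverse of each) for global context modeling.
Embed image patches into sequences, apply OSSM-based feature extraction, and generate dense predictions through skip connections and convolutions.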
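
The eight-direction scan described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. The diagonal traversal order is one plausible choice. The recurrence uses a fixed decay rather than Mamba's input-dependent (selective) parameters. Averaging the eight outputs is an assumed merge rule. All function names are hypothetical.

```python
import numpy as np


def scan_orders(h: int, w: int) -> dict[str, np.ndarray]:
    """Return eight permutations of the flat indices 0..h*w-1, one per scan direction."""
    idx = np.arange(h * w).reshape(h, w)
    base = {
        "horizontal": idx.reshape(-1),    # row by row
        "vertical": idx.T.reshape(-1),    # column by column
        # Walk every top-left-to-bottom-right diagonal, starting at the bottom-left corner.
        "diagonal": np.concatenate(
            [np.diagonal(idx, offset=k) for k in range(-(h - 1), w)]
        ),
        # Walk every top-right-to-bottom-left diagonal, starting at the top-left corner.
        "anti_diagonal": np.concatenate(
            [np.diagonal(np.fliplr(idx), offset=k) for k in range(w - 1, -h, -1)]
        ),
    }
    orders = dict(base)
    for name, order in base.items():
        orders[name + "_reverse"] = order[::-1]
    return orders


def linear_scan(x: np.ndarray, decay: float) -> np.ndarray:
    """Toy recurrence h_t = decay * h_{t-1} + x_t over a (L, C) sequence (not selective)."""
    out = np.empty_like(x)
    state = np.zeros(x.shape[1], dtype=x.dtype)
    for t in range(x.shape[0]):
        state = decay * state + x[t]
        out[t] = state
    return out


def omnidirectional_mix(feat: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Scan an (H, W, C) feature map in eight orders, restore the layout, and average."""
    h, w, c = feat.shape
    flat = feat.astype(np.float64).reshape(h * w, c)
    merged = np.zeros_like(flat)
    orders = scan_orders(h, w)
    for order in orders.values():
        # Indexing with `order` permutes the sequence; assigning back through it undoes that.
        merged[order] += linear_scan(flat[order], decay)
    return (merged / len(orders)).reshape(h, w, c)


if __name__ == "__main__":
    impulse = np.zeros((4, 5, 1))
    impulse[2, 3, 0] = 1.0
    print(np.round(omnidirectional_mix(impulse)[..., 0], 3))
```

Each pass costs time linear in the number of tokens, so eight passes stay linear overall. The impulse example shows that a single pixel influences every other location after merging, which is the global-context property the module is after.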

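Experimental Results

Research Questions

RQ1: Can an SSM-based architecture effectively model the global context of large VHR remote sensing images without patch processing?
RQ2: Does the Omnidirectional Selective Scan Module capture large multi-directional features in VHR images better than one-directional or two-directional scanning?
RQ3: Can simple RSM-based models surpass state-of-the-art methods for semantic segmentation and change detection on remote sensing datasets?
RQ4: How does patch-free processing compare with patch-based transformers and recent CNN-transformer hybrids?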

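Key Findings

RSM-SS achieves state-of-the-art IoU and F1 on the Massachusetts Road semantic segmentation task (IoU 0.6735; F1 0.8049).
Ablations show that OSSM with eight-direction selective scanning outperforms SS1D and SS2D on both semantic segmentation (Massachusetts Road) and change detection (WHU-CD).
On WHU-CD change detection, OSSM achieves IoU 84.96%, F1 91.87%, Precision 93.37%, and Recall 90.42%.
RSM-SS and RSM-CD achieve high performance with simple architectures and without special training strategies.
The omnidirectional SSM-based approach makes it possible to process large VHR images directly without patch processing, avoiding the context loss caused by patch-based cropping.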
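
For readers who want to see how the reported metrics relate, here is a minimal sketch of the standard binary IoU, F1, precision, and recall definitions, computed from pixel masks with NumPy. The function names are hypothetical and this is not the paper's evaluation script. For a single binary class, F1 and IoU are tied by F1 = 2·IoU / (1 + IoU), and both reported pairs (0.6735 with 0.8049, and 84.96 with 91.87) are consistent with that identity.

```python
import numpy as np


def binary_metrics(pred, target) -> dict[str, float]:
    """Precision, recall, F1 and IoU for the positive class of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.logical_and(pred, target).sum())    # predicted 1, truth 1
    fp = int(np.logical_and(pred, ~target).sum())   # predicted 1, truth 0
    fn = int(np.logical_and(~pred, target).sum())   # predicted 0, truth 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def f1_from_iou(iou: float) -> float:
    """Convert binary IoU to F1 via F1 = 2 * IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


if __name__ == "__main__":
    m = binary_metrics([1, 1, 0, 0], [1, 0, 1, 0])
    print(m)
    print(round(f1_from_iou(0.6735), 4), round(f1_from_iou(0.8496), 4))
```

Because of this identity, ranking models by IoU or by F1 gives the same order on a single binary class. Precision and recall are reported separately because they show whether errors come from false alarms or from misses.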


This review was generated by AI and checked by a human editor.