QUICK REVIEW

[Paper Review] SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

Di Wang, Jing Zhang|arXiv (Cornell University)|May 3, 2023

Remote-Sensing Image Classification | Citations: 48

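ONE-LINE SUMMARY

This paper uses the Segment Anything Model together with existing RS object detection datasets to build a large-scale remote sensing segmentation dataset (SAMRS), and shows the benefits of segmentation pre-training for RS tasks.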

ABSTRACT

The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, due to the difficulties and high costs associated with annotating Remote Sensing (RS) images, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS totally possesses 105,090 images and 1,668,241 instances, surpassing existing high-resolution RS segmentation datasets in size by several orders of magnitude. It provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. Moreover, preliminary experiments highlight the importance of conducting segmentation pre-training with SAMRS to address task discrepancies and alleviate the limitations posed by limited training data during fine-tuning. The code and dataset will be available at https://github.com/ViTAE-Transformer/SAMRS.

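MOTIVATION AND OBJECTIVES

Use SAM and existing detection datasets to make pixel-level annotation of remote sensing images more efficient.
Create a large-scale, diverse RS segmentation dataset (SAMRS) that includes semantic, instance, and bounding-box information.
Analyze how SAM-based prompts and segmentation pre-training (SEP) affect RS segmentation performance.
Evaluate SEP across multiple RS segmentation architectures and backbone types.
Demonstrate the practical benefits of segmentation pre-training when training data is scarce.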

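PROPOSED METHOD

Combine the Segment Anything Model (SAM) with existing RS object detection datasets to generate pixel-level segmentation masks.
Convert detection annotations into six prompt variants (CP, H-Box, RH-Box, and their mask counterparts) that are fed to SAM as prompts.
Crop or resize the source datasets (DOTA-V2.0, FAIR1M-2.0, DIOR) to sizes suitable for SAM-based annotation.
Adopt a multi-head pre-training approach to handle multiple datasets with different numbers of categories.
Perform segmentation pre-training (SEP) on the SAM-generated masks with various backbone models and training regimes.
Run ablations comparing prompt types, and evaluate the effect of SEP across architectures.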
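
The conversion from detection annotations to prompts can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that CP denotes the center point of a box and that RH-Box denotes the horizontal rectangle circumscribing a rotated box, neither of which this review spells out. The function names and the four-corner input format are hypothetical, chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def rotated_to_horizontal_box(corners: Sequence[Point]) -> Box:
    """Smallest axis-aligned box enclosing a rotated box given by its corners.

    Assumption: this corresponds to what the review calls an RH-Box prompt.
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def box_center(box: Box) -> Point:
    """Center of an axis-aligned box (assumed to be the CP prompt)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def detection_to_prompts(corners: Sequence[Point]) -> Dict[str, object]:
    """Build a point prompt and a box prompt from one rotated annotation."""
    box = rotated_to_horizontal_box(corners)
    return {"point": box_center(box), "box": box}


# Example: a square rotated by 45 degrees around (10, 10).
example_corners = [(10, 0), (20, 10), (10, 20), (0, 10)]
prompts = detection_to_prompts(example_corners)
print(prompts)
```

A promptable segmenter such as SAM could then take the resulting box or point as input. That step is omitted here so the sketch stays free of external dependencies.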

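EXPERIMENTAL RESULTS

Research Questions

RQ1: Can SAM prompts derived from existing detection annotations produce high-quality pixel-level RS segmentation?
RQ2: How do different prompt types affect the accuracy of SAM-based RS segmentation?
RQ3: Does segmentation pre-training on SAMRS improve downstream RS segmentation performance, particularly when labeled data is limited?
RQ4: How does SEP interact with different backbones and segmentation architectures in RS tasks?
RQ5: Does SAMRS offer advantages in scale and diversity over existing RS segmentation datasets?

Key Findings

SAMRS contains 105,090 images and 1,668,241 instances, exceeding earlier RS segmentation datasets in size by orders of magnitude.
Among the prompt types, the horizontal box prompt (H-Box) generally gives the best segmentation performance; RH-Box is also effective when only rotated box annotations are available.
Segmentation pre-training (SEP) on SAMRS improves downstream RS segmentation performance across multiple backbones and architectures, most notably when training data is scarce.
Combining SEP with SAMRS can outperform conventional pre-training strategies such as ImageNet and MAE in some settings.
End-to-end models such as Mask2Former can show mixed gains from SEP, suggesting that model-specific optimization for RS data is needed.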

This review was generated by AI and checked by a human editor.