QUICK REVIEW

[Paper Review] ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation

Xinyang Pu, Hecheng Jia|arXiv (Cornell University)|Jan 4, 2024

Advanced Neural Network Applications | Citations: 7

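One-line summary

CWSAM adapts SAM to SAR land-cover segmentation by keeping SAM frozen and adding adapters, together with a class-wise mask decoder and a low-frequency SAR input module, and achieves state-of-the-art results with few trainable parameters.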

ABSTRACT

In the realm of artificial intelligence, the emergence of foundation models, backed by high computing capabilities and extensive data, has been revolutionary. Segment Anything Model (SAM), built on the Vision Transformer (ViT) model with millions of parameters and vast training dataset SA-1B, excels in various segmentation scenarios relying on its significance of semantic information and generalization ability. Such achievement of visual foundation model stimulates continuous researches on specific downstream tasks in computer vision. The ClassWise-SAM-Adapter (CWSAM) is designed to adapt the high-performing SAM for landcover classification on space-borne Synthetic Aperture Radar (SAR) images. The proposed CWSAM freezes most of SAM's parameters and incorporates lightweight adapters for parameter efficient fine-tuning, and a classwise mask decoder is designed to achieve semantic segmentation task. This adapt-tuning method allows for efficient landcover classification of SAR images, balancing the accuracy with computational demand. In addition, the task specific input module injects low frequency information of SAR images by MLP-based layers to improve the model performance. Compared to conventional state-of-the-art semantic segmentation algorithms by extensive experiments, CWSAM showcases enhanced performance with fewer computing resources, highlighting the potential of leveraging foundational models like SAM for specific downstream tasks in the SAR domain. The source code is available at: https://github.com/xypu98/CWSAM.

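Motivation and objectives

Bridge SAM to the SAR domain for land-cover segmentation of SAR images.
Achieve semantic segmentation with a parameter-efficient fine-tuning framework.
Design a class-wise mask decoder that enables multi-class pixel labeling.
Incorporate a task-specific input module that injects low-frequency SAR information.
Demonstrate the efficiency and accuracy benefits on the FUSAR-Map1.0 and FUSAR-Map2.0 datasets.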

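Proposed method

Freeze SAM's Vision Transformer encoder and insert a lightweight adapter into each transformer block to enable parameter-efficient fine-tuning.
Introduce a class-wise mask decoder that uses a dedicated per-class prediction path to turn SAM's normally binary mask output into multi-class masks.
Attach a task-specific input module that injects low-frequency SAR information by fusing 2D-FFT-derived features with the SAM embeddings through MLP-based layers.
Train with a weighted cross-entropy loss to handle imbalanced land-cover categories.
Keep the SAM architecture for prompt encoding and mask decoding while training only a small number of additional parameters.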
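
The low-frequency input described in the list above can be pictured with a small NumPy sketch. It is an illustration only, not the authors' module: it assumes a centered circular low-pass mask in the 2D FFT spectrum, and the function name `low_frequency_component`, the `radius` cutoff, and the random test patch are all hypothetical. The MLP-based fusion with the SAM embeddings is not shown.

```python
import numpy as np

def low_frequency_component(image, radius):
    """Keep only spatial frequencies within `radius` of the spectrum centre.

    Assumption: a hard circular low-pass mask; the paper's actual
    filter and cutoff are not given in this review.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) term to index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist_sq = (yy - h // 2) ** 2 + (xx - w // 2) ** 2
    mask = dist_sq <= radius ** 2
    # Zero out high frequencies, undo the shift, and return to image space.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Hypothetical stand-in for a single-channel SAR patch.
rng = np.random.default_rng(0)
sar_patch = rng.random((64, 64))
low = low_frequency_component(sar_patch, radius=8)
print(low.shape, float(low.std()), float(sar_patch.std()))
```

Because the DC term is always kept, the filtered patch has the same mean as the input, while a smaller `radius` removes more fine detail (and, for SAR, more of the high-frequency speckle).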

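Experimental results

Research questions

RQ1: Can ClassWise-SAM-Adapter achieve competitive SAR land-cover segmentation with far fewer trainable parameters than full fine-tuning of SAM?
RQ2: Does the class-wise mask decoder provide meaningful multi-class segmentation on SAR images compared with SAM's original mask output?
RQ3: How does incorporating low-frequency SAR information affect segmentation performance?
RQ4: How does CWSAM compare, in mIoU and other metrics, with state-of-the-art semantic segmentation methods on FUSAR-Map1.0 and FUSAR-Map2.0?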
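
RQ1 turns on how few parameters need to be trained. As a back-of-the-envelope sketch (not figures from the paper), the snippet below counts the weights of one standard ViT block and of one bottleneck adapter. The bottleneck form (down-projection, nonlinearity, up-projection), the embedding width 768, the MLP ratio 4, and the bottleneck size 64 are all illustrative assumptions, and SAM's own blocks may carry extra terms that are not counted here.

```python
def vit_block_params(dim, mlp_ratio=4):
    """Weights and biases in one standard ViT block (assumed layout)."""
    # Fused qkv projection plus the attention output projection.
    attention = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
    hidden = mlp_ratio * dim
    # Two-layer MLP: dim -> hidden -> dim.
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * dim)
    return attention + mlp + layer_norms

def bottleneck_adapter_params(dim, bottleneck):
    """Down- and up-projection with biases (hypothetical adapter shape)."""
    return (dim * bottleneck + bottleneck) + (bottleneck * dim + dim)

def trainable_fraction(dim, bottleneck, mlp_ratio=4):
    """Share of parameters that are trained when the block itself is frozen."""
    frozen = vit_block_params(dim, mlp_ratio)
    trainable = bottleneck_adapter_params(dim, bottleneck)
    return trainable / (frozen + trainable)

print(vit_block_params(768))
print(bottleneck_adapter_params(768, 64))
print(f"{trainable_fraction(768, 64):.2%}")
```

Under these assumed sizes the adapter accounts for a little over 1% of the parameters in a block, which is the kind of ratio that makes freezing the backbone attractive.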

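Key findings

On FUSAR-Map1.0, CWSAM achieves an mIoU of 61.48, outperforming several state-of-the-art methods (e.g., the SegFormer family) on multiple metrics.
On FUSAR-Map1.0, CWSAM also reaches an OA of 82.14 and an Accuracy of 73.45, indicating robust overall performance.
On FUSAR-Map2.0, CWSAM achieves an mIoU of 36.03 and an OA of 67.67, the best overall performance among the compared methods.
The approach shows strong per-category performance and edge discrimination, with notable improvements on difficult classes such as roads and buildings.
CWSAM uses a lightweight adapter configuration on top of frozen SAM parameters, which reduces the number of trainable parameters and the memory used during training while keeping training efficient.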
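
For readers less familiar with the metrics quoted above, the following sketch shows how mIoU, OA (overall accuracy), and mean per-class accuracy are commonly derived from a confusion matrix. It assumes the usual textbook definitions and that the review's "Accuracy" means mean per-class accuracy; the paper's exact conventions are not stated here, and the 2x2 matrix is made-up toy data, not a result from the paper.

```python
def segmentation_metrics(confusion):
    """confusion[i][j] = pixels whose true class is i and predicted class is j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    ious, accs = [], []
    for i in range(n):
        tp = confusion[i][i]
        gt = sum(confusion[i])                           # pixels truly in class i
        pred = sum(confusion[r][i] for r in range(n))    # pixels predicted as class i
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)
    return {
        "OA": correct / total,
        "mIoU": sum(ious) / len(ious),
        "mAcc": sum(accs) / len(accs),
    }

# Toy two-class example (hypothetical counts).
toy = [[8, 2],
       [1, 9]]
metrics = segmentation_metrics(toy)
print(metrics)
```

OA is dominated by frequent classes, whereas mIoU and mean accuracy weight every class equally, which is why reviews of imbalanced land-cover data usually report them side by side.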

This review was generated by AI and checked by a human editor.