QUICK REVIEW

[Paper Review] Vision Foundation Models in Remote Sensing: A Survey

Siqi Lu, Junlin Guo|arXiv (Cornell University)|Aug 6, 2024

Advanced Computational Techniques and Applications | Cited by 7

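One-Sentence Summary

A comprehensive survey of vision foundation models in remote sensing (2021–2024), detailing architectures, pre-training data and methods, datasets, and performance trends across computer vision and domain-specific tasks.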

ABSTRACT

Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing research has been significantly enhanced by the advent of foundation models: large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by those foundation models. Additionally, we discuss technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. Our research also finds that pre-training methods, particularly self-supervised learning techniques like contrastive learning and masked autoencoders, remarkably enhance the performance and robustness of foundation models. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.

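Motivation and Objectives

Provide a structured overview of foundation models in remote sensing released between June 2021 and June 2024.
Classify the models by computer vision task and by domain-specific task, and summarize their architectures, datasets, and pre-training methods.
Present the performance trends, challenges, and future directions of foundation models in remote sensing.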

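Proposed Method

Review and categorize foundation models by downstream task (scene classification, semantic segmentation, object detection, change detection).
Summarize architectures (ResNet, ViT, Swin, etc.), pre-training datasets, and self-supervised learning methods (contrastive learning, MAE, DINO).
Compare pre-training strategies and their effect on robustness and generalization in remote sensing (RS) tasks.
Discuss commonly used RS datasets and data modalities (multispectral, SAR, hyperspectral, time-series data).
Identify gaps, challenges, and future directions for large-scale pre-trained RS models.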
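
To make the contrastive pre-training named above more concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is an illustration only, not code from the survey or from any model it covers. The function names (`l2_normalize`, `info_nce_loss`), the temperature of 0.1, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] are embeddings of two augmented views of the
    same image; every other positives[j] in the batch acts as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of this anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching view (index i) as the correct class.
        total += log_denominator - logits[i]
    return total / len(a)


# Hypothetical toy embeddings: two images, two views each.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce_loss(views_a, views_b)         # views paired correctly
shuffled_loss = info_nce_loss(views_a, views_b[::-1])  # views paired wrongly
```

The loss is small when each view is closest to its own counterpart and large when the pairs are mismatched. That gap is the signal a contrastive objective uses to shape the embedding space without labels.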

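Experimental Results

Research Questions

RQ1: What are the current state and trends of vision foundation models in remote sensing (2021–2024)?
RQ2: How do pre-training methods and backbones affect performance on RS tasks such as scene classification, semantic segmentation, object detection, and change detection?
RQ3: Which datasets and data modalities underpin these foundation models, and what challenges remain for generalization and deployment?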

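Key Findings

Foundation models pre-trained with self-supervised learning (e.g., contrastive learning, MAE, DINO) achieve higher performance and robustness on RS tasks.
Transformer- and ViT-based backbones have become prominent in RS foundation models alongside CNNs (ResNet).
Many RS datasets (e.g., BigEarthNet, SEN12MS, fMoW, MillionAID) support pre-training and evaluation, enabling broad geographic and modality coverage.
Models are increasingly evaluated on multiple downstream tasks (scene classification, semantic segmentation, object detection, change detection), and many studies report state-of-the-art results on specific tasks.
Challenges include the need for high-quality, diverse data, substantial computational resources, and RS-specific domain adaptation.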
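
The masked autoencoder (MAE) pre-training mentioned in the findings hides most patches of an image and trains the model to reconstruct them. As a rough sketch of the masking step only (no encoder or decoder), the following plain-Python example splits patch indices into visible and masked sets. The function name `random_patch_mask`, the 196-patch grid (a 224×224 image cut into 16×16 patches), and the 0.75 mask ratio are assumptions for illustration. The ratio is the commonly cited default from the original MAE work, not a value taken from this survey.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Randomly split patch indices into (visible, masked) lists.

    Only the visible patches would be fed to the encoder; the masked ones
    are the reconstruction targets.
    """
    rng = random.Random(seed)
    num_visible = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # a random permutation decides which patches stay visible
    visible = sorted(order[:num_visible])
    masked = sorted(order[num_visible:])
    return visible, masked


# Hypothetical example: 14 x 14 = 196 patches per image.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
```

With these assumed settings the encoder would see only a quarter of the patches, which is one reason this style of pre-training is comparatively cheap for each image.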

This review was generated by AI and checked by a human editor.