QUICK REVIEW

[Paper Review] Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications

Daniela Szwarcman, Sujit Roy|arXiv (Cornell University)|Dec 3, 2024

Advanced Computational Techniques and Applications | Cited by 16

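One-line summary

Prithvi-EO-2.0 is a geospatial foundation model (300M/600M parameters) that spans four timestamps, uses temporal and location embeddings, was trained on 4.2M samples from the Harmonized Landsat and Sentinel-2 (HLS) archive, and performs strongly on GEO-Bench and on downstream tasks.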

ABSTRACT

This paper presents Prithvi-EO-2.0, a new geospatial foundation model that offers significant improvements over its predecessor, Prithvi-EO-1.0. Trained on 4.2 million global time series samples from NASA's Harmonized Landsat and Sentinel-2 data archive at 30-m resolution, the new model incorporates temporal and location embeddings for enhanced performance across various geospatial tasks. Through extensive benchmarking with GEO-Bench, the model outperforms the previous Prithvi-EO model by 8% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks from different domains and resolutions (i.e. from 0.1 m to 15 m). The results demonstrate the versatility of the model in both classical Earth observation and high-resolution applications. Early involvement of end-users and subject matter experts (SMEs) allowed constant feedback on model and dataset design, enabling customization across diverse SME-led applications in disaster response, land cover and crop mapping, and ecosystem dynamics monitoring. Prithvi-EO-2.0 is available as an open-source model on Hugging Face and IBM TerraTorch, with additional resources on GitHub. The project exemplifies the Trusted Open Science approach embraced by all involved organizations.

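Motivation and objectives

Introduce Prithvi-EO-2.0 as a multi-temporal geospatial foundation model that extends its predecessor, Prithvi-EO-1.0.
Leverage a large, diverse pretraining dataset of HLS data at 30 m resolution to capture seasonality and long-term trends.
Evaluate the model on the GEO-Bench benchmark and on SME-led downstream applications spanning disaster response, land use, and ecosystem monitoring.
Provide tools for easy fine-tuning and open-science collaboration through Hugging Face, TerraTorch, and GitHub.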

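Proposed method

Process spatiotemporal inputs in a masked autoencoder (MAE) framework using 3D patch embeddings and 3D positional encodings.
Incorporate geolocation and date metadata as learnable bias terms added to the token embeddings, with a drop mechanism for cases where the metadata is missing.
Pretrain two model sizes, 300M and 600M parameters, on ViT backbones using 4.2M training samples and 46k validation samples from the HLS dataset.
Train for 400 epochs with distributed data parallelism on JUWELS Booster hardware; tune hyperparameters and evaluate at a fixed 224x224 input size.
Release the models through Hugging Face and IBM TerraTorch to facilitate downstream fine-tuning for classification, segmentation, and regression tasks.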
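The first two method points can be made concrete with a small sketch. It counts the tokens that 3D patching produces for four 224x224 timestamps, picks which tokens an MAE would hide, and adds a metadata bias vector that is skipped when the metadata is missing or randomly dropped. The patch size (1x16x16), the 75% mask ratio, the drop probability, and all function names are illustrative assumptions that do not come from this review, and the bias values are fixed numbers standing in for learned parameters. It is not the authors' implementation.

```python
import random


def num_tokens(t, h, w, pt, ph, pw):
    """Count the 3D patches (tokens) produced from a T x H x W input."""
    if t % pt or h % ph or w % pw:
        raise ValueError("input dimensions must be divisible by the patch size")
    return (t // pt) * (h // ph) * (w // pw)


def random_mask(n_tokens, mask_ratio, seed=0):
    """Split token indices into (kept, masked) by uniform random sampling."""
    rng = random.Random(seed)
    order = list(range(n_tokens))
    rng.shuffle(order)
    n_keep = int(n_tokens * (1 - mask_ratio))
    return sorted(order[:n_keep]), sorted(order[n_keep:])


def add_metadata_bias(tokens, bias, drop_prob=0.0, rng=None):
    """Add one bias vector to every token embedding.

    The tokens are returned unchanged when the metadata is missing
    (bias is None) or when the bias is randomly dropped.
    """
    if bias is None:
        return [list(tok) for tok in tokens]
    if rng is not None and rng.random() < drop_prob:
        return [list(tok) for tok in tokens]
    return [[x + b for x, b in zip(tok, bias)] for tok in tokens]


# Four timestamps at 224x224 come from the review; the patch size is assumed.
N_TOKENS = num_tokens(t=4, h=224, w=224, pt=1, ph=16, pw=16)

# Assumed mask ratio of 0.75: the encoder would see only the kept tokens.
KEPT, MASKED = random_mask(N_TOKENS, mask_ratio=0.75)

# Toy 2-dimensional embeddings; the bias stands in for a learned
# temporal or location embedding.
TOKENS = [[1.0, 2.0], [3.0, 4.0]]
WITH_BIAS = add_metadata_bias(TOKENS, bias=[0.5, -0.5])
WITHOUT_METADATA = add_metadata_bias(TOKENS, bias=None)

print(N_TOKENS, len(KEPT), len(MASKED))
print(WITH_BIAS, WITHOUT_METADATA)
```

Under these assumptions the input yields 4 x 14 x 14 = 784 tokens, of which 196 stay visible. Dropping the bias during training is one plausible way to keep a model usable when date or location metadata is unavailable at inference time.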

Figure 1 : LULC distribution of the training samples in comparison to all land tiles.

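Experimental results

Research questions

RQ1: How does a multi-temporal, 30 m resolution geospatial foundation model (GFM) perform across diverse Earth observation (EO) tasks compared with the previous Prithvi model and other GFMs?
RQ2: How much do explicit temporal and location embeddings affect classification and segmentation benchmarks?
RQ3: Can SME-led evaluation and end-user tooling improve the adoption and real-world applicability of geospatial foundation models?
RQ4: What degree of data efficiency and generalization is achievable on downstream EO tasks with Prithvi-EO-2.0?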

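Key findings

The 600M variant with temporal and location embeddings achieves the best performance across the GEO-Bench datasets and outperforms six other geospatial foundation models.
Prithvi-EO-2.0-600M-TL and -600M achieve the best overall performance on 12 GEO-Bench datasets (classification and segmentation); overall results improve on Prithvi-EO-1.0 and the 100M variant.
Compared with Prithvi-EO-1.0-100M, pretraining the same architecture on a larger global dataset yields roughly a 3% improvement on GEO-Bench, with larger gains on specific tasks.
The 4.2M training samples and 46k validation samples support robust pretraining at 30 m resolution and enable generalization to higher-resolution tasks (0.1 m to 15 m) and to non-Sentinel-2 domains.
SME-led downstream tasks demonstrate practical applicability to disaster response, land use, crop mapping, and ecosystem dynamics, with an on-ramp to fine-tuning via TerraTorch.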

Figure 2 : Global HLS dataset distribution visualized on a tile-level. The number of training samples are color-coded in orange to green while validation tiles are visualized in blue.

This review was generated by AI and checked by a human editor.