QUICK REVIEW

[Paper Review] TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

Zhengpeng Feng, Clement Atzberger | arXiv (Cornell University) | Jun 25, 2025

Topic: Image Retrieval and Classification Techniques | Cited by 30

ONE-SENTENCE SUMMARY

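TESSERA uses self-supervised learning on global 10 m Sentinel-1 and Sentinel-2 time series to learn 128-dimensional per-pixel representations that perform well across diverse downstream remote-sensing tasks, and it ships as globally precomputed embedding maps.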

ABSTRACT

Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve invariance under extreme sparsity. We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for large-scale retrieval and inference at planetary scale. The model training/inference code, downstream task code, and pre-generated embeddings can be accessed at https://github.com/ucam-eo
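The training recipe in the abstract (Barlow Twins plus sparse random temporal sampling) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `sparse_temporal_sample` and `barlow_twins_loss` are hypothetical, the off-diagonal weight `lam=5e-3` is a value commonly used with Barlow Twins rather than one reported for TESSERA, and the mix-based regularizer (L_MIX) and global shuffling are omitted because their exact form is not described here.

```python
import numpy as np


def sparse_temporal_sample(obs, valid, n_samples, rng):
    """Draw a sparse, time-ordered subset of valid observations for one pixel.

    obs:   (T, C) array of time steps x channels for a single pixel
    valid: (T,) boolean mask of usable (e.g. cloud-free) observations
    Returns the sampled rows and their time indices (sorted ascending).
    """
    idx = np.flatnonzero(valid)
    k = min(n_samples, idx.size)  # cannot sample more than the valid count
    chosen = np.sort(rng.choice(idx, size=k, replace=False))
    return obs[chosen], chosen


def barlow_twins_loss(z1, z2, lam=5e-3, eps=1e-8):
    """Standard Barlow Twins loss on two (N, D) batches of projected features.

    Each feature is standardized over the batch, then the (D, D)
    cross-correlation is pushed toward the identity matrix.
    lam is an assumed off-diagonal weight, not a TESSERA-specific value.
    """
    n = z1.shape[0]
    z1n = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2n = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    c = z1n.T @ z2n / n  # cross-correlation matrix
    diag = np.diag(c)
    on_diag = np.sum((1.0 - diag) ** 2)  # invariance term
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # redundancy-reduction term
    return on_diag + lam * off_diag, c
```

Calling `sparse_temporal_sample` twice on the same pixel yields the two views; after encoding and projection, their features would be compared with `barlow_twins_loss`. Driving the diagonal of the cross-correlation toward 1 makes the embedding insensitive to which observations happened to be sampled, while driving the off-diagonal toward 0 decorrelates the feature dimensions.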

RESEARCH MOTIVATION AND GOALS

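Argue that Earth observation needs high-resolution, temporally rich representations when observations have gaps and labels are scarce.
Propose a self-supervised, dual-encoder foundation model that fuses optical and SAR time series.
Generate global 10 m annual representations (2017–2024) and support downstream tasks through fixed (frozen) embeddings.
Demonstrate state-of-the-art performance on crop-type classification, canopy height estimation, burned-area detection, biomass estimation, and carbon-market indicators.
Provide open access and a "model-as-data" approach to lower the barrier to entry for practitioners.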

PROPOSED METHOD

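Convert the unlabeled Sentinel-1 SAR and Sentinel-2 MSI time series of each 10 m pixel into modality-specific d-pixels (time steps × channels).
Encode each modality with its own Transformer encoder (one for SAR VV/VH backscatter, one for MSI spectral bands); each encoder uses a day-of-year (DOY) temporal encoding and an attention-pooling layer to produce a 128-dimensional representation.
Fuse the two modality embeddings with a multilayer perceptron (MLP) into a single 128-dimensional representation per pixel.
Expand the fused representation to 16,384 dimensions with a large projector network.
Train with a modified Barlow Twins loss (L_BT + L_MIX) on the cross-correlation of the projected features, using two augmented views obtained by sparsely sampling each pixel's annual observations in time.
At inference, run the frozen encoders to produce annual 10 m representations for 2017–2024 and output global representation maps.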
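The DOY temporal encoding, attention pooling, and MLP fusion described above might look roughly like the NumPy sketch below. All of it is illustrative and hypothetical: the sinusoidal form of `doy_encoding`, the single-query `attention_pool`, and the two-layer ReLU `fuse` are assumed designs, the Transformer layers themselves are omitted, and every weight is a random placeholder rather than a trained parameter.

```python
import numpy as np


def doy_encoding(doy, dim):
    """Transformer-style sinusoidal encoding indexed by day of year.

    doy: sequence of day-of-year values (1..366); dim must be even.
    Returns an array of shape (len(doy), dim) with values in [-1, 1].
    """
    assert dim % 2 == 0, "dim must be even"
    doy = np.asarray(doy, dtype=float)[:, None]  # (T, 1)
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = doy * freqs[None, :]  # (T, dim/2)
    enc = np.empty((doy.shape[0], dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


def attention_pool(h, query, mask=None):
    """Collapse a (T, d) sequence into one d-vector with a single query.

    mask: optional (T,) boolean array; False steps get zero weight.
    Returns the pooled vector and the attention weights over time.
    """
    scores = h @ query / np.sqrt(h.shape[1])  # (T,)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return w @ h, w


def fuse(e_s1, e_s2, w1, b1, w2, b2):
    """Two-layer ReLU MLP mapping concatenated modality embeddings to one vector."""
    x = np.concatenate([e_s1, e_s2])
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2
```

For a pixel with 128-dimensional encoder outputs per modality, `fuse` takes a 256-dimensional concatenation and returns a 128-dimensional fused vector, matching the dimensions stated in the method summary. Masking inside `attention_pool` is one simple way to ignore missing or cloudy dates without imputing them.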

EXPERIMENTAL RESULTS

Research Questions

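RQ1: Can self-supervised multimodal time-series embeddings from Sentinel-1 and Sentinel-2 outperform traditional feature engineering and existing foundation models across diverse EO tasks?
RQ2: Do global 10 m annual representations generalize to crop classification, canopy height, burned-area, and biomass estimation, especially in low-label settings?
RQ3: Without explicit preprocessing, how well do the learned representations capture temporal dynamics and disturbances such as fire?
RQ4: Does an open, precomputed representation-map approach promote broader adoption and reproducibility in EO research?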

Key Findings

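TESSERA representations achieve state-of-the-art performance on downstream tasks compared with traditional baselines and other foundation models.
For crop-type classification on the Austrian INVEKOS dataset, TESSERA with a simple MLP head outperforms a random-forest baseline and PRESTO embeddings across data regimes, down to one-shot learning.
For canopy height estimation in tropical Danum Valley, TESSERA reaches R^2 = 0.66, RMSE = 8.88 m, and bias = -0.62 m, outperforming global and regional canopy height model (CHM) products.
In burned-area analysis, TESSERA embeddings separate burned from unburned areas in a UMAP projection and also distinguish fire timing and severity.
Across multiple tasks, TESSERA remains robust when labeled data are limited, often matching or outperforming task-specific models.
The model supports a "model-as-data" paradigm: precomputed 10 m representations reduce the preprocessing burden on end users.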
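The canopy-height figures above (R^2, RMSE, bias) are standard regression metrics. Below is a minimal sketch of how they are typically computed. It assumes bias = mean(prediction - reference), so a negative value such as -0.62 m indicates underestimation, and it computes R^2 as the coefficient of determination. The summary does not say which conventions the paper uses, and `regression_metrics` is a hypothetical helper.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """R^2 (coefficient of determination), RMSE, and mean bias.

    Bias is defined here as mean(y_pred - y_true); this sign
    convention is an assumption, not taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_pred - y_true
    ss_res = np.sum(resid ** 2)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "bias": float(np.mean(resid)),
    }
```

For reference heights [10, 20, 30, 40] m and predictions [12, 18, 29, 37] m, the residuals are [2, -2, -1, -3]. This gives RMSE = sqrt(4.5) ≈ 2.12 m, bias = -1.0 m, and R^2 = 1 - 18/500 = 0.964.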

This summary was generated by AI and reviewed by human editors.