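[Paper Explained] BigEarthNet v2
reBEN is a refined multimodal remote sensing dataset built from Sentinel-1 and Sentinel-2 imagery. It provides pixel-level and scene-level labels, improved atmospheric correction, and a geography-based data split that reduces spatial leakage.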
BigEarthNet v2.0
The BigEarthNet v2.0 dataset was constructed by the Remote Sensing Image Analysis (RSiM) Group and the Database Systems and Information Management (DIMA) Group at the Technische Universität Berlin (TU Berlin). This work is supported by the European Research Council under the ERC Starting Grant BigEarth and by the Berlin Institute for the Foundations of Learning and Data (BIFOLD).

BigEarthNet v2.0 is a benchmark dataset consisting of 549,488 pairs of Sentinel-1 and Sentinel-2 image patches.

To construct BigEarthNet v2.0 with Sentinel-2 image patches (called BigEarthNet-S2), 115 Sentinel-2 tiles acquired between June 2017 and May 2018 over 10 European countries (Austria, Belgium, Finland, Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia, and Switzerland) were initially selected. All the tiles were atmospherically corrected by the Sentinel-2 Level 2A product generation and formatting tool (sen2cor v2.11). Then, they were divided into 549,488 image patches. Each image patch was associated with a pixel-level reference map and multiple land-cover class labels (i.e., multi-labels) that were derived from the most recent CORINE Land Cover database of the year 2018 (CLC2018 v2020_u1).

To construct BigEarthNet v2.0 with Sentinel-1 image patches (called BigEarthNet-S1), 312 Sentinel-1 scenes acquired between June 2017 and May 2018 that jointly cover the area of all original 115 Sentinel-2 tiles with close temporal proximity were selected and processed. BigEarthNet-S1 consists of 549,488 preprocessed Sentinel-1 image patches, one for each Sentinel-2 patch.

The BigEarthNet v2.0 dataset includes several significant improvements compared to the previous 1.0 version. These changes include the application of the latest atmospheric correction tool (sen2cor), which results in higher-quality patches. Additionally, the most recent version of the CLC2018 database was utilized to extract label information, overcoming label noise present in BigEarthNet v1.0. Apart from providing patch-level labels, v2.0 additionally includes pixel-level reference maps, making the dataset suitable for pixel- and scene-based learning tasks. Furthermore, BigEarthNet v2.0 introduces a new geography-based split assignment algorithm, which significantly reduces spatial correlation among the train, validation, and test sets compared to v1.0.

If you use this work, please cite: K. Clasen, L. Hackel, T. Burgert, G. Sumbul, B. Demir, V. Markl, “reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis”, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2025.
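Research Motivation and Objectives
- Address the label noise and outdated atmospheric correction of BigEarthNet, using refined labels derived from the most recent CORINE Land Cover map (CLC2018) and the latest sen2cor release.
- Provide pixel-level reference maps to support both pixel-level and scene-level learning tasks.
- Reduce spatial correlation among the train, validation, and test splits with a geography-based split assignment algorithm.
- Provide software tooling for efficient deep learning training (rico-hdl) and pretrained models for state-of-the-art architectures.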
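Proposed Method
- Process 125 Sentinel-2 L1C tiles with the latest sen2cor (v2.11) to obtain Sentinel-2 Level-2A data, then divide them into 1200 m × 1200 m patches.
- Overlay the patches with the most recent CLC2018 map to derive pixel-level reference maps and multi-labels over 19 LULC classes.
- Introduce a geography-based split assignment algorithm that separates train/validation/test with reduced spatial correlation.
- Provide a converter to a DL-optimized data format (rico-hdl), which stores data in LMDB and uses safetensors for fast loading.
- Release pretrained model weights and training scripts for multimodal remote sensing learning on S1 and S2 data.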
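The patching and labeling steps in the list above can be illustrated with a small sketch. It is not the authors' pipeline. It assumes a 10 m pixel grid for the reference map and assumes that every class present in a patch's pixel-level map becomes a scene-level label; the source states neither rule. The function names and the toy class IDs are hypothetical.

```python
from typing import FrozenSet, Iterator, List, Set, Tuple

Grid = List[List[int]]  # rows of integer class IDs (toy stand-in for a reference map)


def patch_size_px(patch_size_m: int = 1200, pixel_size_m: int = 10) -> int:
    """Edge length of a square patch in pixels (the pixel size is an assumption)."""
    if patch_size_m % pixel_size_m:
        raise ValueError("patch size must be a multiple of the pixel size")
    return patch_size_m // pixel_size_m


def iter_patches(reference_map: Grid, size: int) -> Iterator[Tuple[int, int, Grid]]:
    """Yield (patch_row, patch_col, patch) for each full, non-overlapping patch."""
    n_rows, n_cols = len(reference_map), len(reference_map[0])
    for r in range(0, n_rows - size + 1, size):
        for c in range(0, n_cols - size + 1, size):
            patch = [row[c:c + size] for row in reference_map[r:r + size]]
            yield r // size, c // size, patch


def scene_labels(patch: Grid, ignore: FrozenSet[int] = frozenset()) -> Set[int]:
    """Assumed rule: every class ID present in the pixel map becomes a label."""
    return {cls for row in patch for cls in row} - ignore


# Toy 4 x 4 map cut into 2 x 2 patches; class 0 plays the role of "unlabeled".
toy_map = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [4, 4, 0, 5],
    [4, 4, 5, 5],
]
labels = {
    (r, c): scene_labels(patch, ignore=frozenset({0}))
    for r, c, patch in iter_patches(toy_map, size=2)
}
print(patch_size_px())  # 1200 m divided by 10 m per pixel
print(labels)
```

Under the 10 m assumption a 1200 m patch is 120 × 120 pixels. In the toy map, the top-left patch receives the labels {1, 3} because both classes occur in its pixel-level map.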

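Experimental Results
Research Questions
- RQ1: How much do label quality and DL performance improve over the BigEarthNet baseline when the latest atmospheric correction (sen2cor 2.11) is used?
- RQ2: Can CORINE-based pixel-level labels reduce label noise and improve pixel-level and scene-level learning tasks?
- RQ3: Does the geography-based split make DL model evaluation more reliable by reducing spatial leakage among train/validation/test?
- RQ4: How does multimodal (S1 and S2) data fusion affect multi-label remote sensing image classification performance?
- RQ5: Are efficient tools and pretrained models available to accelerate DL experiments on reBEN?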
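To make the spatial leakage in RQ3 concrete, here is a toy comparison of a purely random split with a split that assigns whole blocks of neighbouring patches to one subset. This is only an illustration of the idea and not the split assignment algorithm of reBEN, which the text above does not describe. The block pattern, the leakage measure, and all names are hypothetical, and the validation set is omitted for brevity.

```python
import random
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int]  # (x, y) index of a patch on a regular grid


def random_split(coords: Iterable[Coord], test_fraction: float = 0.2,
                 seed: int = 0) -> Dict[Coord, str]:
    """Assign each patch independently, ignoring where it lies."""
    rng = random.Random(seed)
    return {xy: "test" if rng.random() < test_fraction else "train" for xy in coords}


def block_split(coords: Iterable[Coord], block: int = 10,
                period: int = 5) -> Dict[Coord, str]:
    """Assign whole block x block groups of neighbouring patches to one split."""
    split = {}
    for x, y in coords:
        bx, by = x // block, y // block
        # Hypothetical deterministic pattern that selects some blocks for testing.
        split[(x, y)] = "test" if (bx + 2 * by) % period == 0 else "train"
    return split


def leakage(split: Dict[Coord, str]) -> float:
    """Fraction of test patches with at least one directly adjacent training patch."""
    test = [xy for xy, name in split.items() if name == "test"]
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    touching = sum(
        any(split.get((x + dx, y + dy)) == "train" for dx, dy in neighbours)
        for x, y in test
    )
    return touching / len(test)


grid = [(x, y) for x in range(40) for y in range(40)]
random_leak = leakage(random_split(grid))
block_leak = leakage(block_split(grid))
print(f"random split: {random_leak:.2f}, block split: {block_leak:.2f}")
```

In the block split only patches on the border of a test block touch training data, so the leakage fraction is much lower than for the random split, where almost every test patch has a training neighbour.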
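Key Findings
- reBEN achieves higher-quality patches through sen2cor 2.11 atmospheric correction and by excluding tiles that fail quality checks.
- Pixel-level reference maps derived from the most recent CLC2018 map address the label noise present in BigEarthNet.
- The geography-based split reduces spatial correlation between splits compared with BigEarthNet's grid-based split, which makes evaluation more reliable.
- Using Sentinel-1 and Sentinel-2 jointly generally yields higher performance than either modality alone across models.
- Among the evaluated models, ResNet variants and multimodal (S1+S2) configurations generally achieve the best AP and F1 scores on reBEN.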
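For readers unfamiliar with the AP and F1 scores mentioned above, the following sketch shows how both can be computed for multi-label predictions. The text does not say which averaging scheme the paper reports, so macro averaging over classes is an assumption here, as are the 0.5 decision threshold, the toy data, and the function names. Ties in scores are broken by input order, which may differ from library implementations.

```python
from typing import Callable, List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at each rank where a positive is retrieved."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro(metric: Callable, y_true: List[List[int]], y_other: List[List[float]]) -> float:
    """Apply a per-class metric to each label column and average the results."""
    per_class = [metric(t, o) for t, o in zip(zip(*y_true), zip(*y_other))]
    return sum(per_class) / len(per_class)


# Toy data: 4 patches (rows) x 2 classes (columns).
Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.2], [0.8, 0.6], [0.7, 0.9], [0.1, 0.4]]
Y_pred = [[int(s >= 0.5) for s in row] for row in Y_score]  # assumed threshold

macro_ap = macro(average_precision, Y_true, Y_score)
macro_f1 = macro(binary_f1, Y_true, Y_pred)
print(f"macro AP = {macro_ap:.3f}, macro F1 = {macro_f1:.3f}")
```

AP is computed from the ranking of scores and needs no threshold, while F1 depends on the chosen cut-off, which is one reason the two metrics can rank models differently.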

This explanation was generated by AI and reviewed by a human editor.