QUICK REVIEW

[Paper Review] High-Resolution Building and Road Detection from Sentinel-2

Wojciech Sirko, Emmanuel Asiedu Brempong | arXiv (Cornell University) | Oct 17, 2023

Automated Road and Building Extraction | Cited by 14

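ONE-SENTENCE SUMMARY

The paper trains a student model to mimic a high-resolution teacher model, predicting 50 cm building and road segmentation from stacks of 10 m resolution Sentinel-2 images; it reaches 78.3% mIoU on buildings (teacher: 85.3%) and counts buildings with R^2 = 0.91 against true counts.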

ABSTRACT

Mapping buildings and roads automatically with remote sensing typically requires high-resolution imagery, which is expensive to obtain and often sparsely available. In this work we demonstrate how multiple 10 m resolution Sentinel-2 images can be used to generate 50 cm resolution building and road segmentation masks. This is done by training a 'student' model with access to Sentinel-2 images to reproduce the predictions of a 'teacher' model which has access to corresponding high-resolution imagery. While the predictions do not have all the fine detail of the teacher model, we find that we are able to retain much of the performance: for building segmentation we achieve 79.0% mIoU, compared to the high-resolution teacher model accuracy of 85.5% mIoU. We also describe two related methods that work on Sentinel-2 imagery: one for counting individual buildings which achieves $R^2 = 0.91$ against true counts and one for predicting building height with 1.5 meter mean absolute error. This work opens up new possibilities for using freely available Sentinel-2 imagery for a range of tasks that previously could only be done with high-resolution satellite imagery.

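RESEARCH MOTIVATION AND GOALS

Enable accessible, large-scale mapping of buildings and roads without relying on expensive high-resolution imagery.
Develop an end-to-end framework that learns to reproduce high-resolution predictions from low-resolution Sentinel-2 stacks.
Use a teacher model trained on 50 cm imagery to supervise a student model that takes Sentinel-2 input.
Count the buildings within a tile approximately by predicting building centroids.
Advance the Open Buildings dataset by extending what Sentinel-2-based analysis can do.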

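PROPOSED METHOD

Uses a teacher–student setup: the teacher operates on 50 cm imagery, while the student receives a stack of 10 m resolution Sentinel-2 frames and predicts high-resolution semantic masks.
Uses an HRNet-based encoder–decoder architecture, with the first block modified to preserve higher spatial resolution for low-resolution input.
Fuses temporal information from 32 Sentinel-2 frames with depthwise convolutions across time, arranged in a residual configuration.
Trains a multi-task model that outputs building segmentation, road segmentation, building centroids (for counting), and a super-resolved grayscale image used for registration.
Uses a per-pixel Kullback–Leibler divergence loss, with an alignment step that searches over translations to align labels with model outputs; an upsampling-based decoder reaches the 50 cm target resolution.
Counts buildings through centroids: the tile-level count is derived by summing the centroid channel output and rescaling it.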
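The alignment step and the centroid-based count listed above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the integer-pixel search window `max_shift`, the binary (single-class) form of the KL divergence, and the `mass_per_building` scale constant are all assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import math

EPS = 1e-7  # clamp probabilities away from 0 and 1 so the logs stay finite


def binary_kl(p, q):
    """KL(p || q) between two Bernoulli distributions given as probabilities."""
    p = min(max(p, EPS), 1.0 - EPS)
    q = min(max(q, EPS), 1.0 - EPS)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def shifted_kl(teacher, student, dy, dx):
    """Mean per-pixel KL with the teacher label shifted by (dy, dx) pixels.

    Only pixels where both grids overlap after the shift are counted.
    """
    h, w = len(student), len(student[0])
    total, n = 0.0, 0
    for y in range(h):
        for x in range(w):
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:
                total += binary_kl(teacher[ty][tx], student[y][x])
                n += 1
    return total / n if n else float("inf")


def aligned_kl_loss(teacher, student, max_shift=2):
    """Try every integer shift in the window; return (lowest loss, (dy, dx))."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            loss = shifted_kl(teacher, student, dy, dx)
            if best is None or loss < best[0]:
                best = (loss, (dy, dx))
    return best


def count_from_centroids(centroid_map, mass_per_building):
    """Estimate a tile count by summing the centroid channel and rescaling.

    mass_per_building is a hypothetical constant: the total value that one
    building contributes to the centroid channel.
    """
    if mass_per_building <= 0:
        raise ValueError("mass_per_building must be positive")
    return sum(sum(row) for row in centroid_map) / mass_per_building
```

Taking the minimum over shifts means a small registration error between the Sentinel-2 stack and the teacher label is not penalised as a segmentation error. A real training loop would do this on tensors and backpropagate through the selected shift.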

Figure 1: Example operation of our model, where multiple frames of low-resolution Sentinel-2 imagery are used to make a single frame of high-resolution predictions for a variety of output types. A high-resolution image of the same scene is shown for comparison.

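EXPERIMENTAL RESULTS

Research Questions

RQ1: Can stacks of 10 m resolution Sentinel-2 imagery be used to predict 50 cm semantic masks of buildings and roads?
RQ2: How closely can Sentinel-2-based predictions approach the high-resolution teacher model in mIoU and spatial detail?
RQ3: Is building counting from Sentinel-2 predictions feasible, and how does it compare with true counts?
RQ4: How do the number of temporal frames and the pairing strategy affect downstream segmentation performance?
RQ5: How do input, output, and label resolution affect performance, and how does Sentinel-2-based counting perform at different scales?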

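Key Findings

Building segmentation from Sentinel-2 input reaches 78.3% mIoU, against 85.3% mIoU for the high-resolution teacher.
Building counting achieves R^2 = 0.91 against true counts, close to the teacher baseline of R^2 = 0.95.
Performance is comparable to that of a single-frame high-resolution model trained on 4 m resolution data, and the best Sentinel-2-based model produces 50 cm output with notable accuracy.
Performance improves as the number of temporal frames grows; a 32-frame stack scores about 5 points higher in building mIoU than a single frame.
Pairing each temporal frame with frame 17 (the frame closest in time to the teacher label) gives a significant improvement over no pairing; cross-temporal fusion improves results further.
Training set size matters: increasing the training data from 1% to 100% raises building mIoU from 69.1 to 76.6 (32 frames).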
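For readers unfamiliar with the two metrics quoted in these findings, the sketch below shows one common way to compute them. It assumes mIoU is the mean of the per-class IoU over the background and building classes, and that R^2 is the coefficient of determination; the review does not state which exact definitions the paper uses, so treat both as assumptions. The function names are illustrative.

```python
def iou(pred, label, cls):
    """Intersection over union for one class, on flattened label sequences."""
    inter = sum(1 for p, l in zip(pred, label) if p == cls and l == cls)
    union = sum(1 for p, l in zip(pred, label) if p == cls or l == cls)
    return inter / union if union else float("nan")


def mean_iou(pred, label, classes=(0, 1)):
    """Mean IoU over the given classes, skipping classes absent from both."""
    scores = [iou(pred, label, c) for c in classes]
    scores = [s for s in scores if s == s]  # NaN != NaN, so this drops NaNs
    if not scores:
        raise ValueError("none of the classes occur in pred or label")
    return sum(scores) / len(scores)


def r_squared(true_counts, pred_counts):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(true_counts) / len(true_counts)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts))
    ss_tot = sum((t - mean) ** 2 for t in true_counts)
    if ss_tot == 0:
        raise ValueError("true counts have zero variance")
    return 1.0 - ss_res / ss_tot
```

With `pred = [1, 1, 0, 0]` and `label = [1, 0, 0, 0]`, the building class has IoU 1/2 and the background class has IoU 2/3, so `mean_iou` returns 7/12.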

Figure 2: Examples of building and road detection from Sentinel-2 imagery, each covering an area of $192^{2}$ m$^2$. The panels on the left show high-resolution satellite imagery of the scene for comparison; although Sentinel-2 imagery has much lower level of detail in each frame, we are able to pred

This review was generated by AI and checked by human editors.