QUICK REVIEW

[Paper Review] Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights

Ibrahim Fayad, Philippe Ciais | arXiv (Cornell University) | Apr 22, 2023

Remote Sensing and LiDAR Applications | References: 80 | Cited by: 8

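ONE-SENTENCE SUMMARY

The paper introduces a vision transformer (ViT) model that maps canopy heights across Ghana at 10 m resolution; trained jointly with a discrete (classification) and a continuous (regression) loss, it improves height estimates for tall trees and outperforms a ConvNet baseline (RMSE 3.12 m vs 4.3 m).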

ABSTRACT

Accurate and timely monitoring of forest canopy heights is critical for assessing forest dynamics, biodiversity, carbon sequestration as well as forest degradation and deforestation. Recent advances in deep learning techniques, coupled with the vast amount of spaceborne remote sensing data offer an unprecedented opportunity to map canopy height at high spatial and temporal resolutions. Current techniques for wall-to-wall canopy height mapping correlate remotely sensed 2D information from optical and radar sensors to the vertical structure of trees using LiDAR measurements. While studies using deep learning algorithms have shown promising performances for the accurate mapping of canopy heights, they have limitations due to the type of architectures and loss functions employed. Moreover, mapping canopy heights over tropical forests remains poorly studied, and the accurate height estimation of tall canopies is a challenge due to signal saturation from optical and radar sensors, persistent cloud covers and sometimes the limited penetration capabilities of LiDARs. Here, we map heights at 10 m resolution across the diverse landscape of Ghana with a new vision transformer (ViT) model optimized concurrently with a classification (discrete) and a regression (continuous) loss function. This model achieves better accuracy than previously used convolutional based approaches (ConvNets) optimized with only a continuous loss function. The ViT model results show that our proposed discrete/continuous loss significantly increases the sensitivity for very tall trees (i.e., > 35m), for which other approaches show saturation effects. The height maps generated by the ViT also have better ground sampling distance and better sensitivity to sparse vegetation in comparison to a convolutional model. Our ViT model has a RMSE of 3.12m in comparison to a reference dataset while the ConvNet model has a RMSE of 4.3m.

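RESEARCH MOTIVATION AND OBJECTIVES

Map canopy heights wall-to-wall at high spatial resolution across diverse tropical landscapes using a vision transformer (ViT).
Address the limitations of ConvNets in capturing very tall canopies and sparse vegetation.
Evaluate a joint discrete (classification) and continuous (regression) loss for improving height estimation.
Demonstrate performance gains over conventional convolutional architectures on a tropical dataset.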

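PROPOSED METHOD

Develop a vision transformer model optimized concurrently with a discrete and a continuous loss function.
Apply the model to canopy height mapping across Ghana at 10 m resolution.
Compare the ViT results against a convolutional neural network (ConvNet) baseline.
Evaluate sensitivity to very tall trees (> 35 m) and to sparse vegetation.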
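
The joint objective described in the list above can be illustrated with a minimal sketch. It is not the authors' formulation: this summary does not state the bin width, the number of height classes, the form of the regression term, or how the two terms are weighted. The names `height_to_bin`, `joint_loss`, `bin_width_m`, `num_bins`, and `weight`, as well as the choice of cross-entropy plus an absolute-error term, are assumptions made for illustration only.

```python
import math


def height_to_bin(height_m, bin_width_m=1.0, num_bins=60):
    """Map a continuous height in metres to a class index.

    Assumption: equal-width bins; heights outside the range are clamped
    to the first or last bin.
    """
    index = int(height_m // bin_width_m)
    return max(0, min(index, num_bins - 1))


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(logits, predicted_height_m, target_height_m,
               bin_width_m=1.0, weight=1.0):
    """Discrete (cross-entropy) plus continuous (absolute error) loss for one pixel.

    logits: class scores from a hypothetical classification head.
    predicted_height_m: output of a hypothetical regression head.
    """
    probs = softmax(logits)
    target_bin = height_to_bin(target_height_m, bin_width_m, len(logits))
    classification_term = -math.log(probs[target_bin])
    regression_term = abs(predicted_height_m - target_height_m)
    return classification_term + weight * regression_term


if __name__ == "__main__":
    # Uninformative logits over 60 one-metre bins, regression head off by 2.5 m.
    print(joint_loss([0.0] * 60, predicted_height_m=30.0, target_height_m=32.5))
```

In this sketch the classification term penalizes placing the pixel in the wrong height class, while the regression term penalizes the metric error directly; `weight` controls the balance between the two and would need tuning in practice.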

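EXPERIMENTAL RESULTS

Research questions

RQ1: Does a vision transformer trained with a discrete/continuous loss improve high-resolution canopy height mapping compared with ConvNets?
RQ2: Does the joint loss function improve accuracy for very tall canopies (> 35 m) and sparse vegetation?
RQ3: How does the ViT perform in tropical landscapes in terms of ground sampling distance and sensitivity?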

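Key findings

The ViT with the discrete/continuous loss achieves an RMSE of 3.12 m against the reference dataset.
The ConvNet baseline has an RMSE of 4.3 m.
The ViT height maps have better ground sampling distance and better sensitivity to sparse vegetation than those of the ConvNet.
The combined discrete/continuous loss increases sensitivity to very tall trees (> 35 m), for which other approaches show saturation effects.
For high-resolution canopy height mapping in Ghana, the ViT outperforms the previously used convolutional approaches.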
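
To make the reported metric concrete, the sketch below shows how an RMSE, and an RMSE restricted to tall trees, could be computed from paired predictions and reference heights. The function names and the `threshold_m` parameter are hypothetical, and the paper's actual evaluation protocol is not described in this summary. The only figures taken from the source are the two reported RMSE values (3.12 m and 4.3 m) and the 35 m threshold.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences of heights (m)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_for_tall_trees(predicted, reference, threshold_m=35.0):
    """RMSE over samples whose reference height exceeds threshold_m.

    Returns None when no sample exceeds the threshold.
    """
    pairs = [(p, r) for p, r in zip(predicted, reference) if r > threshold_m]
    if not pairs:
        return None
    return rmse([p for p, _ in pairs], [r for _, r in pairs])


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Reported values from the abstract: ConvNet 4.3 m, ViT 3.12 m.
    print(f"{relative_reduction(4.3, 3.12):.1%}")
```

With the two reported values, the ViT's RMSE is roughly 27% lower than the ConvNet's. A per-stratum metric such as `rmse_for_tall_trees` is one way to check for saturation on tall canopies, since an overall RMSE can hide errors concentrated above 35 m.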

This review was generated by AI and reviewed by a human editor.