QUICK REVIEW

[Paper Review] Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights

Ibrahim Fayad, Philippe Ciais | arXiv (Cornell University) | Apr 22, 2023

Remote Sensing and LiDAR Applications | References: 80 | Citations: 8

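One-line summary

Key point: This paper proposes a vision transformer (ViT) model that maps canopy height across Ghana at 10 m resolution. It is trained with a combined discrete/continuous loss that improves height estimates for tall trees, and it outperforms a ConvNet baseline (RMSE 3.12 m versus 4.3 m).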

ABSTRACT

Accurate and timely monitoring of forest canopy heights is critical for assessing forest dynamics, biodiversity, carbon sequestration as well as forest degradation and deforestation. Recent advances in deep learning techniques, coupled with the vast amount of spaceborne remote sensing data offer an unprecedented opportunity to map canopy height at high spatial and temporal resolutions. Current techniques for wall-to-wall canopy height mapping correlate remotely sensed 2D information from optical and radar sensors to the vertical structure of trees using LiDAR measurements. While studies using deep learning algorithms have shown promising performances for the accurate mapping of canopy heights, they have limitations due to the type of architectures and loss functions employed. Moreover, mapping canopy heights over tropical forests remains poorly studied, and the accurate height estimation of tall canopies is a challenge due to signal saturation from optical and radar sensors, persistent cloud covers and sometimes the limited penetration capabilities of LiDARs. Here, we map heights at 10 m resolution across the diverse landscape of Ghana with a new vision transformer (ViT) model optimized concurrently with a classification (discrete) and a regression (continuous) loss function. This model achieves better accuracy than previously used convolutional based approaches (ConvNets) optimized with only a continuous loss function. The ViT model results show that our proposed discrete/continuous loss significantly increases the sensitivity for very tall trees (i.e., > 35m), for which other approaches show saturation effects. The height maps generated by the ViT also have better ground sampling distance and better sensitivity to sparse vegetation in comparison to a convolutional model. Our ViT model has a RMSE of 3.12m in comparison to a reference dataset while the ConvNet model has a RMSE of 4.3m.

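Motivation and objectives

Apply a vision transformer (ViT) to wall-to-wall canopy height mapping at high spatial resolution across a diverse tropical landscape.
Address the limitations of ConvNets in capturing very tall canopies and sparse vegetation.
Evaluate a joint discrete (classification) and continuous (regression) loss for improving height estimation.
Demonstrate a performance gain over conventional convolutional architectures on a tropical dataset.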

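Proposed method

Develop a vision transformer model optimized concurrently with a discrete (classification) and a continuous (regression) loss function.
Apply the model to map canopy height at 10 m resolution across all of Ghana.
Compare the ViT results against a convolutional neural network (ConvNet) baseline optimized with only a continuous loss.
Evaluate sensitivity to very tall trees (> 35 m) and to sparse vegetation.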
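
To make the idea of a joint discrete/continuous loss concrete, here is a minimal per-pixel sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the review does not give the bin width, the number of height classes, the form of the regression term, or how the two terms are weighted. The 5 m bins, the 12 classes, the absolute-error term, and the `weight` parameter are all hypothetical choices.

```python
import math

def height_to_bin(height_m, bin_width_m=5.0, num_bins=12):
    """Map a continuous height to a class index (assumed 5 m bins).

    The last bin collects every height above the binned range.
    """
    idx = int(height_m // bin_width_m)
    return max(0, min(idx, num_bins - 1))

def cross_entropy(logits, target_idx):
    """Numerically stable softmax cross-entropy for a single pixel."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_idx]

def joint_loss(logits, predicted_height_m, reference_height_m,
               weight=1.0, bin_width_m=5.0):
    """Discrete (classification) term plus a weighted continuous term.

    Assumption: the continuous term is the absolute height error and the
    two terms are combined by a simple weighted sum.
    """
    target = height_to_bin(reference_height_m, bin_width_m, len(logits))
    discrete = cross_entropy(logits, target)
    continuous = abs(predicted_height_m - reference_height_m)
    return discrete + weight * continuous
```

In this sketch the classification term penalizes a prediction that falls in the wrong height class even when the regression error is moderate. That is one plausible way such a loss could counteract saturation for tall canopies.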

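Experimental results

Research questions

RQ1: Can a vision transformer with a discrete/continuous loss improve high-resolution canopy height mapping compared with a ConvNet?
RQ2: Does the joint loss function improve accuracy for tall trees (> 35 m) and sparse vegetation?
RQ3: How does the ViT perform in terms of ground sampling distance and sensitivity across tropical landscapes?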

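Key findings

The ViT trained with the discrete/continuous loss achieves an RMSE of 3.12 m against the reference dataset.
The ConvNet baseline achieves an RMSE of 4.3 m.
The ViT height maps have better ground sampling distance and higher sensitivity to sparse vegetation than those of the ConvNet.
The discrete/continuous loss increases sensitivity to very tall trees (> 35 m), where other approaches show saturation effects.
For high-resolution canopy height mapping in Ghana, the ViT outperforms the previously used convolutional approaches.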
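
The reported figures can be put in perspective with a short calculation. The sketch below computes RMSE, an RMSE restricted to reference heights above a threshold (for example 35 m), and the relative error reduction between two models. The function names are hypothetical. The only numbers taken from the paper are the two RMSE values, and the stratified metric is an illustrative way to probe tall-tree behaviour, not a result the review reports.

```python
import math

def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences (metres)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))

def rmse_above(predicted, reference, min_height_m):
    """RMSE over samples whose reference height exceeds min_height_m.

    Returns None when no sample passes the threshold.
    """
    pairs = [(p, r) for p, r in zip(predicted, reference) if r > min_height_m]
    if not pairs:
        return None
    return rmse([p for p, _ in pairs], [r for _, r in pairs])

def relative_reduction(baseline_error, improved_error):
    """Fraction by which the error drops relative to the baseline."""
    return (baseline_error - improved_error) / baseline_error

# RMSE values reported in the paper: ConvNet 4.3 m, ViT 3.12 m.
reduction = relative_reduction(4.3, 3.12)
```

With the reported values, `reduction` is about 0.27, so the ViT's RMSE is roughly 27% lower than the ConvNet's.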


This review was generated by AI and checked by a human editor.