QUICK REVIEW

[Paper Review] Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery

Jamie Sherrah|arXiv (Cornell University)|Jun 8, 2016
Remote Sensing and LiDAR Applications | References: 18 | Citations: 270
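One-Sentence Summary

This paper applies fully convolutional networks (FCNs) to dense semantic labelling of high-resolution aerial imagery, introduces an FCN with no downsampling that preserves full resolution, and, using a hybrid architecture that combines pre-trained image features with DSM input, reports state-of-the-art results on the ISPRS Vaihingen and Potsdam data sets.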

ABSTRACT

The trend towards higher resolution remote sensing imagery facilitates a transition from land-use classification to object-level scene understanding. Rather than relying purely on spectral content, appearance-based image features come into play. In this work, deep convolutional neural networks (CNNs) are applied to semantic labelling of high-resolution remote sensing data. Recent advances in fully convolutional networks (FCNs) are adapted to overhead data and shown to be as effective as in other domains. A full-resolution labelling is inferred using a deep FCN with no downsampling, obviating the need for deconvolution or interpolation. To make better use of image features, a pre-trained CNN is fine-tuned on remote sensing data in a hybrid network context, resulting in superior results compared to a network trained from scratch. The proposed approach is applied to the problem of labelling high-resolution aerial imagery, where fine boundary detail is important. The dense labelling yields state-of-the-art accuracy for the ISPRS Vaihingen and Potsdam benchmark data sets.

Motivation and Objectives

  • Demonstrate the effectiveness of fully convolutional networks for dense semantic labelling of high-resolution overhead imagery.
  • Preserve full spatial resolution without downsampling to improve boundary accuracy.
  • Leverage pre-trained CNN features and elevation data to boost labeling performance on aerial datasets.

Proposed Method

  • Transform fully-connected layers into convolutional layers to create an FCN that operates as an image filter.
  • Introduce a no-downsampling FCN by using atrous (dilated) convolutions to expand receptive field without reducing resolution.
  • Propose a hybrid network that combines pre-trained CNN features with DSM/elevation data trained from scratch.
  • Train FCNs on tiles to handle large overhead images and enable full-resolution output.
  • Compare patch-based training to FCN training and analyze the impact on boundary accuracy and training efficiency.
  • Evaluate on the ISPRS Vaihingen and Potsdam datasets with rotation augmentation, and report results on the ISPRS benchmark leaderboard.
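
The no-downsampling idea in the list above can be illustrated with a little receptive-field arithmetic. The following is a minimal sketch, not the paper's architecture: the five-layer stack, the `(kernel, stride, dilation)` tuple format, and the function names are all assumptions made for illustration. It sets every stride to 1 and dilates each layer by the stride accumulated before it, then checks that the receptive field is unchanged while the output stays at full resolution.

```python
def effective_kernel(kernel, dilation):
    """Spatial extent of a kernel whose taps are spaced `dilation` pixels apart."""
    return dilation * (kernel - 1) + 1


def receptive_field(layers):
    """Return (receptive_field, total_stride) for a stack of layers.

    Each layer is a hypothetical (kernel, stride, dilation) tuple.
    """
    rf, jump = 1, 1  # jump = spacing of this layer's inputs in image pixels
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf, jump


def remove_downsampling(layers):
    """Set every stride to 1 and dilate each layer by the stride removed before it."""
    converted = []
    rate = 1  # product of the strides dropped so far
    for kernel, stride, dilation in layers:
        converted.append((kernel, 1, dilation * rate))
        rate *= stride
    return converted


# Toy stack (assumed, not from the paper): conv3, pool2, conv3, pool2, conv3.
downsampling = [(3, 1, 1), (2, 2, 1), (3, 1, 1), (2, 2, 1), (3, 1, 1)]
full_res = remove_downsampling(downsampling)

print(receptive_field(downsampling))  # receptive field and stride of the original stack
print(receptive_field(full_res))      # same receptive field, total stride of 1
```

In this toy stack both variants see the same context per output pixel, but the converted one produces a label for every input pixel, so no deconvolution or interpolation step is needed to restore resolution. The price is that every layer now runs at full resolution, which costs more memory and compute.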

Experimental Results

Research Questions

  • RQ1: Can FCNs provide dense, full-resolution semantic labelling for very-high-resolution aerial imagery without downsampling?
  • RQ2: Does a no-downsampling FCN improve boundary delineation and overall accuracy compared with traditional downsampling FCNs?
  • RQ3: Do pre-trained visual features plus elevation/DSM data improve semantic labelling in aerial datasets?
  • RQ4: What is the impact of data augmentation and network depth on FCN performance for aerial imagery?

Key Findings

  • FCN training significantly improves accuracy over patch-based training (e.g., Vaihingen: Overall Acc. up to 87.17% with 36-rotation augmentation).
  • No-downsampling FCN training yields gains over downsampling variants, notably improving car and boundary delineation (Vaihingen: car F1/Acc. up to 66.54%/76.77%).
  • Hybrid architectures combining pre-trained image features with DSM features further boost results on the high-resolution Potsdam data, including improved accuracy for the car class.
  • On Vaihingen, no-downsampling with DST_2 (RF+CRF) achieves 87.90% overall accuracy on validation, and DST_2 reaches 89.1% on the ISPRS leaderboard for test data.
  • For Potsdam, no-downsampling improves accuracy, with car class showing substantial gains (e.g., 90.28% Unknown/Car metrics in validation).
  • The no-downsampling approach reduces boundary artefacts and improves pixel-wise labelling at full resolution compared to conventional interpolation-based restoration.
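
The findings above are stated in terms of overall accuracy and per-class F1. As a minimal sketch of how such pixel-wise scores are commonly derived from a confusion matrix, consider the following; the matrix layout (rows are reference labels, columns are predictions), the function names, and the toy counts are assumptions for illustration and are not taken from the paper or the benchmark's evaluation code.

```python
def overall_accuracy(confusion):
    """Fraction of pixels whose predicted label equals the reference label."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_f1(confusion):
    """F1 score per class; rows are reference labels, columns are predictions."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class c pixels that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other pixels labelled as c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Made-up pixel counts for three hypothetical classes; not results from the paper.
toy_confusion = [
    [50, 5, 0],
    [10, 30, 0],
    [0, 5, 0],
]

print(overall_accuracy(toy_confusion))
print(per_class_f1(toy_confusion))
```

The toy matrix also shows why both numbers are reported: a small class such as cars can be missed entirely (the third class scores an F1 of zero) while overall accuracy remains high.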

This review was generated by AI and checked by a human editor.