QUICK REVIEW

[Paper Review] Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery

Jamie Sherrah|arXiv (Cornell University)|Jun 8, 2016

Remote Sensing and LiDAR Applications | References: 18 | Citations: 270

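One-Line Summary

This paper applies fully convolutional networks (FCNs) to dense semantic labelling of high-resolution aerial imagery. It introduces an FCN without downsampling that preserves full resolution and, using a hybrid architecture that combines pre-trained features with DSM input, reports state-of-the-art results on the ISPRS Vaihingen and Potsdam datasets.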

ABSTRACT

The trend towards higher resolution remote sensing imagery facilitates a transition from land-use classification to object-level scene understanding. Rather than relying purely on spectral content, appearance-based image features come into play. In this work, deep convolutional neural networks (CNNs) are applied to semantic labelling of high-resolution remote sensing data. Recent advances in fully convolutional networks (FCNs) are adapted to overhead data and shown to be as effective as in other domains. A full-resolution labelling is inferred using a deep FCN with no downsampling, obviating the need for deconvolution or interpolation. To make better use of image features, a pre-trained CNN is fine-tuned on remote sensing data in a hybrid network context, resulting in superior results compared to a network trained from scratch. The proposed approach is applied to the problem of labelling high-resolution aerial imagery, where fine boundary detail is important. The dense labelling yields state-of-the-art accuracy for the ISPRS Vaihingen and Potsdam benchmark data sets.

Motivation and Objectives

Demonstrate the effectiveness of fully convolutional networks for dense semantic labelling of high-resolution overhead imagery.
Preserve full spatial resolution without downsampling to improve boundary accuracy.
Leverage pre-trained CNN features and elevation data to boost labeling performance on aerial datasets.

Proposed Method

Transform fully-connected layers into convolutional layers to create an FCN that operates as an image filter.
Introduce a no-downsampling FCN by using atrous (dilated) convolutions to expand receptive field without reducing resolution.
Propose a hybrid network that combines pre-trained CNN image features with a DSM/elevation branch trained from scratch.
Train FCNs on tiles to handle large overhead images and enable full-resolution output.
Compare patch-based training to FCN training and analyze the impact on boundary accuracy and training efficiency.
Evaluate on the ISPRS Vaihingen and Potsdam datasets with rotation augmentation and report leaderboard results.
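The no-downsampling idea in the list above can be checked with simple receptive-field arithmetic. The sketch below is a minimal illustration, not the paper's architecture: the two layer stacks are hypothetical toy networks, and the dilation pattern (each layer dilated by the stride that was removed before it) is the usual atrous construction, assumed here for illustration. It shows that replacing stride-2 pooling with stride-1 pooling plus dilation keeps the receptive field while leaving the output at full resolution.

```python
from typing import List, Tuple

# Each layer is (kernel_size, stride, dilation). These stacks are illustrative
# toy networks (an assumption), not the architecture from the paper.
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> Tuple[int, int]:
    """Return (receptive field, output stride) of a stack of conv/pool layers."""
    rf, jump = 1, 1  # jump = spacing in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


# Conventional stack: 3x3 conv followed by 2x2 pooling with stride 2, repeated.
downsampling = [(3, 1, 1), (2, 2, 1), (3, 1, 1), (2, 2, 1), (3, 1, 1)]

# Same stack with pooling stride set to 1 and later layers dilated instead.
no_downsampling = [(3, 1, 1), (2, 1, 1), (3, 1, 2), (2, 1, 2), (3, 1, 4)]

print(receptive_field(downsampling))     # (18, 4)
print(receptive_field(no_downsampling))  # (18, 1)
```

Both stacks see an 18-pixel context, but only the dilated one produces one output per input pixel, so no deconvolution or interpolation step is needed to recover full resolution.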

Experimental Results

Research Questions

RQ1: Can FCNs provide dense, full-resolution semantic labelling for very-high-resolution aerial imagery without downsampling?
RQ2: Does a no-downsampling FCN improve boundary delineation and overall accuracy compared with traditional downsampling FCNs?
RQ3: Do pre-trained visual features plus elevation/DSM data improve semantic labelling in aerial datasets?
RQ4: What is the impact of data augmentation and network depth on FCN performance for aerial imagery?

Key Findings

FCN training significantly improves accuracy over patch-based training (e.g., Vaihingen: Overall Acc. up to 87.17% with 36-rotation augmentation).
No-downsampling FCN training yields gains over downsampling variants, notably improving car and boundary delineation (Vaihingen: car F1/Acc. up to 66.54%/76.77%).
Hybrid architectures combining pre-trained image features with DSM features further boost results on the high-resolution Potsdam data, with improved accuracy for the car class.
On Vaihingen, no-downsampling with DST_2 (RF+CRF) achieves 87.90% overall accuracy on validation, and DST_2 reaches 89.1% on the ISPRS leaderboard for test data.
On Potsdam, the no-downsampling FCN improves accuracy, with the car class showing substantial gains (e.g., 90.28% Unknown/Car metrics in validation).
The no-downsampling approach reduces boundary artefacts and improves pixel-wise labelling at full resolution compared to conventional interpolation-based restoration.
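For readers unfamiliar with how figures such as overall accuracy and per-class F1 are derived, the sketch below computes both from a confusion matrix. It is a minimal illustration under stated assumptions: the two-class matrix is made-up toy data, not results from the paper, and the function names are hypothetical. Any benchmark-specific conventions, such as how pixels near class boundaries are treated, are not modelled.

```python
from typing import List

Matrix = List[List[int]]  # rows = true class, columns = predicted class


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of all pixels whose predicted class matches the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_f1(cm: Matrix) -> List[float]:
    """F1 score for each class: 2*TP / (2*TP + FP + FN)."""
    scores = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # true class c, predicted other
        fp = sum(row[c] for row in cm) - tp   # predicted c, true class other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy pixel counts for two classes (say, "building" and "car"); made-up data.
toy_cm = [[8, 2],
          [1, 9]]

print(overall_accuracy(toy_cm))                      # 0.85
print([round(f, 4) for f in per_class_f1(toy_cm)])   # [0.8421, 0.8571]
```

Overall accuracy is dominated by large classes, which is why a small class such as cars is usually reported separately through its F1 score.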

This review was generated by AI and checked by a human editor.