QUICK REVIEW

[Paper Review] IM2HEIGHT: Height Estimation from Single Monocular Imagery via Fully Residual Convolutional-Deconvolutional Network

Lichao Mou, Xiao Xiang Zhu|arXiv (Cornell University)|2018. 02. 28.

Remote Sensing and LiDAR Applications | References: 33 | Citations: 101

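One-line summary

This paper proposes a fully residual convolutional-deconvolutional network that predicts a height map from a single monocular remote sensing image, using DSMs (digital surface models) as ground truth and a skip connection to preserve edges. On a Berlin data set it reports quantitative improvements over baselines and demonstrates an application to building instance segmentation.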

ABSTRACT

In this paper we tackle a very novel problem, namely height estimation from a single monocular remote sensing image, which is inherently ambiguous, and a technically ill-posed problem, with a large source of uncertainty coming from the overall scale. We propose a fully convolutional-deconvolutional network architecture being trained end-to-end, encompassing residual learning, to model the ambiguous mapping between monocular remote sensing images and height maps. Specifically, it is composed of two parts, i.e., convolutional sub-network and deconvolutional sub-network. The former corresponds to feature extractor that transforms the input remote sensing image to high-level multidimensional feature representation, whereas the latter plays the role of a height generator that produces height map from the feature extracted from the convolutional sub-network. Moreover, to preserve fine edge details of estimated height maps, we introduce a skip connection to the network, which is able to shuttle low-level visual information, e.g., object boundaries and edges, directly across the network. To demonstrate the usefulness of single-view height prediction, we show a practical example of instance segmentation of buildings using estimated height map. This paper, for the first time in the remote sensing community, attempts to estimate height from monocular vision. The proposed network is validated using a large-scale high resolution aerial image data set covered an area of Berlin. Both visual and quantitative analysis of the experimental results demonstrate the effectiveness of our approach.

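Motivation and objectives

Addresses height estimation from a single monocular remote sensing image, a problem that is inherently ill-posed.
Develops a fully end-to-end residual conv-deconv network that maps RGB images to height maps, using DSMs as ground truth.
Preserves edge detail in the height maps through a skip connection and evaluates practical usefulness (e.g., building instance segmentation).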

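Proposed method

The network has two parts: a convolutional sub-network for feature extraction and a deconvolutional sub-network that acts as the height generator.
Residual blocks are adopted for residual learning, which makes optimization easier.
A skip connection between the first block and the second-to-last block carries low-level edge information directly across the network.
In the deconvolutional path, unpooling with the max-pooling indices better preserves spatial information.
The network is trained end-to-end to predict a high-resolution height map from RGB input, with data augmentation and a very small batch size.
Eigen-Net and a plain conv-deconv network serve as baselines for assessing edge preservation and height map quality.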
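
The unpooling step listed above can be illustrated on a single 2D feature map. The following is a minimal NumPy sketch, not the authors' implementation: the function names `max_pool_with_indices` and `max_unpool`, the 2 x 2 window, and the toy input are all assumptions made for illustration. Pooling records where each maximum came from, and unpooling writes each value back to that exact position instead of spreading it over the window, which is why spatial detail survives the round trip.

```python
import numpy as np


def max_pool_with_indices(x, k=2):
    """k x k max pooling (stride k) that also returns, for every window,
    the flat index into x at which the maximum was found."""
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "sketch assumes sizes divisible by k"
    # Rearrange to (h//k, w//k, k*k): one row of k*k values per window.
    windows = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    windows = windows.reshape(h // k, w // k, k * k)
    local = windows.argmax(axis=-1)  # position of the max inside each window
    pooled = windows.max(axis=-1)
    # Convert the in-window position back to row/column coordinates in x.
    rows = np.arange(h // k)[:, None] * k + local // k
    cols = np.arange(w // k)[None, :] * k + local % k
    return pooled, rows * w + cols


def max_unpool(pooled, indices, out_shape):
    """Write each pooled value back to its recorded location; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out


# Toy 4 x 4 feature map (hypothetical values).
x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 5.],
              [0., 0., 9., 8.],
              [6., 0., 7., 0.]])
pooled, indices = max_pool_with_indices(x)
restored = max_unpool(pooled, indices, x.shape)
```

In a deep learning framework the same idea is usually available as a built-in pair of layers (for example, max pooling that returns indices plus a matching unpooling layer), so this hand-written version is only meant to show the mechanics.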

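Experimental results

Research questions

RQ1: Can an end-to-end deep network map a single monocular remote sensing image to a physically plausible height map?
RQ2: Does a convolutional-deconvolutional architecture with residual learning improve height map accuracy and edge preservation over non-residual or plain architectures?
RQ3: How does the proposed method compare with existing single-image height estimation baselines (e.g., Eigen-Net) on the high-resolution Berlin data set?
RQ4: Are the predicted height maps useful for downstream tasks such as building instance segmentation?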

Key results

Approach	MSE	MAE	SSIM
res. conv-deconv net	3.1e-03	2.7e-02	0.8060
net with skip connection	7.8e-04	1.7e-02	0.9366
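
For readers who want to see how the three columns are computed, here is a minimal NumPy sketch under stated assumptions. MSE and MAE follow their standard definitions. The SSIM function is a simplified single-window variant computed over the whole image. The standard metric averages SSIM over local windows, and the review does not say which variant or height normalization the paper used, so `global_ssim`, `data_range=1.0`, and the constants `k1` and `k2` (the commonly used defaults) are assumptions.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two height maps of equal shape."""
    return float(np.mean((pred - target) ** 2))


def mae(pred, target):
    """Mean absolute error between two height maps of equal shape."""
    return float(np.mean(np.abs(pred - target)))


def global_ssim(pred, target, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (a simplification: the
    standard metric averages this quantity over local windows)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(numerator / denominator)


# Hypothetical height maps, assumed to be scaled to [0, 1].
target = np.array([[0.0, 0.2], [0.4, 0.6]])
pred = target + 0.1
```

Lower is better for MSE and MAE, while SSIM is higher for structurally more similar maps and equals 1 for identical ones.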

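On the Berlin test scenes, the residual conv-deconv network with a skip connection achieves MSE 7.8e-04, MAE 1.7e-02, and SSIM 0.9366, outperforming the residual conv-deconv net without it (MSE 3.1e-03, MAE 2.7e-02, SSIM 0.8060).
A plain conv-deconv network trains poorly and fails to learn realistic height maps, whereas the residual network with a skip connection shows a marked improvement in edge preservation.
Compared with Eigen-Net, whose predictions are blurry and low-resolution, the proposed method yields higher-quality, edge-preserving height maps without post-processing.
Qualitative results show sharper object boundaries and better structural detail in the height maps across a variety of land-use scenes.
As an application demo, buildings are segmented from the predicted height map by thresholding and vegetation filtering, with no pixel-level supervision required.
Some failure cases remain for tall buildings, indicating that height estimation for high-rises is still an open challenge.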
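
The building segmentation demo mentioned above (thresholding plus vegetation filtering) might look roughly like the sketch below. The review does not give the paper's threshold or its vegetation criterion, so everything specific here is an assumption: the function name `segment_buildings`, the `min_height` and `max_greenness` values, the use of an excess-green index (2G - R - B) as the vegetation cue, and the 4-connected flood fill that turns the mask into instances.

```python
from collections import deque

import numpy as np


def segment_buildings(height, rgb, min_height=0.2, max_greenness=0.1):
    """Label building instances from a height map and its RGB image.

    Assumes `height` is scaled to [0, 1] and `rgb` is H x W x 3 in [0, 1].
    Both thresholds are hypothetical values chosen for illustration.
    """
    red, green, blue = (rgb[..., i].astype(float) for i in range(3))
    greenness = 2 * green - red - blue  # excess-green index (assumed cue)
    # Keep pixels that are elevated and do not look like vegetation.
    mask = (height > min_height) & (greenness < max_greenness)

    labels = np.zeros(height.shape, dtype=int)
    n_rows, n_cols = mask.shape
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r, c] or labels[r, c]:
                continue
            # Start a new instance and flood-fill its 4-connected region.
            count += 1
            labels[r, c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and mask[nr, nc] and not labels[nr, nc]):
                        labels[nr, nc] = count
                        queue.append((nr, nc))
    return labels, count


# Toy scene: two elevated gray blocks and one elevated green pixel (a "tree").
height = np.zeros((4, 6))
height[0:2, 0:2] = 0.8
height[2:4, 4:6] = 0.6
height[0, 4] = 0.5
rgb = np.full((4, 6, 3), 0.5)
rgb[0, 4] = (0.1, 0.9, 0.1)
labels, count = segment_buildings(height, rgb)
```

Because the only inputs are the predicted height map and the image itself, no pixel-level building annotations are needed, which matches the point made in the results above.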

This review was generated by AI and checked by a human editor.