QUICK REVIEW

[Paper Review] RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images

Lichao Mou, Xiao Xiang Zhu|arXiv (Cornell University)|2018. 05. 05.

Advanced Neural Network Applications | References: 40 | Citations: 66

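One-line summary

RiFCN introduces a bidirectional network that recursively fuses multi-level CNN features, improving pixel-wise semantic segmentation of high-resolution remote sensing images and scoring higher than FCN and SegNet on the ISPRS Potsdam and Inria datasets.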

ABSTRACT

Semantic segmentation in high resolution remote sensing images is a fundamental and challenging task. Convolutional neural networks (CNNs), such as fully convolutional network (FCN) and SegNet, have shown outstanding performance in many segmentation tasks. One key pillar of these successes is mining useful information from features in convolutional layers for producing high resolution segmentation maps. For example, FCN nonlinearly combines high-level features extracted from last convolutional layers; whereas SegNet utilizes a deconvolutional network which takes as input only coarse, high-level feature maps of the last convolutional layer. However, how to better fuse multi-level convolutional feature maps for semantic segmentation of remote sensing images is underexplored. In this work, we propose a novel bidirectional network called recurrent network in fully convolutional network (RiFCN), which is end-to-end trainable. It has a forward stream and a backward stream. The former is a classification CNN architecture for feature extraction, which takes an input image and produces multi-level convolutional feature maps from shallow to deep; while in the latter, to achieve accurate boundary inference and semantic segmentation, boundary-aware high resolution feature maps in shallower layers and high-level but low-resolution features are recursively embedded into the learning framework (from deep to shallow) to generate a fused feature representation that draws a holistic picture of not only high-level semantic information but also low-level fine-grained details. Experimental results on two widely-used high resolution remote sensing data sets for semantic segmentation tasks, ISPRS Potsdam and Inria Aerial Image Labeling Data Set, demonstrate competitive performance obtained by the proposed methodology compared to other studied approaches.

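Motivation and objectives

Motivated by the need for better fusion of multi-level CNN features to sharpen boundaries in high-resolution remote sensing images.
Propose RiFCN, a bidirectional architecture with a forward feature extractor and a backward recursive fusion stream.
Train the whole network end to end to improve pixel-wise semantic segmentation.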

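Proposed method

Forward stream: a five-block, VGG-16-style CNN with 3x3 convolutions and 2x2 max pooling that outputs multi-level feature maps; padding preserves spatial resolution within each block, and ReLU provides the nonlinearity.
Backward stream: a recursive, autoregressive fusion process that embeds high-level features into progressively shallower layers (from deep to shallow), using a deconvolution-like function Φ for upsampling and fusion.
Fusion rule: F_bwd^l = Φ(F_fwd^l, F_bwd^{l+1}), where Φ combines a convolution term from the forward path with a deconvolution term; during backpropagation, gradients accumulate across layers (Eq. 6) and parameters are updated with momentum (Eq. 7).
Loss: pixel-wise cross-entropy over M classes, conditioned on the forward- and backward-stream parameters (W, W_fwd, W_bwd).
Training: end to end in TensorFlow with Nesterov Adam, small batches, data augmentation, and early stopping, for 30 epochs.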
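
The deep-to-shallow recursion F_bwd^l = Φ(F_fwd^l, F_bwd^{l+1}) can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions, not the authors' implementation: feature maps are single-channel nested lists, nearest-neighbour upsampling stands in for the learned deconvolution, scalar weights `w_conv` and `w_deconv` stand in for the convolution and deconvolution kernels, and the recursion is started by setting the deepest backward map equal to the deepest forward map. The names `upsample2x`, `phi`, and `backward_stream` are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # single-channel 2-D map, for illustration only


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling (assumed stand-in for a learned deconvolution)."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def phi(f_fwd: FeatureMap, f_bwd_deeper: FeatureMap,
        w_conv: float = 1.0, w_deconv: float = 1.0) -> FeatureMap:
    """Toy fusion: ReLU(w_conv * F_fwd^l + w_deconv * upsample(F_bwd^{l+1}))."""
    up = upsample2x(f_bwd_deeper)
    return [[max(0.0, w_conv * a + w_deconv * b) for a, b in zip(row_f, row_u)]
            for row_f, row_u in zip(f_fwd, up)]


def backward_stream(forward_maps: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse from deep to shallow.

    forward_maps[0] is the shallowest (highest-resolution) level and
    forward_maps[-1] the deepest; each level halves the resolution.
    """
    fused: List[FeatureMap] = [[] for _ in forward_maps]
    fused[-1] = forward_maps[-1]  # assumption: the deepest level seeds the recursion
    for level in range(len(forward_maps) - 2, -1, -1):
        fused[level] = phi(forward_maps[level], fused[level + 1])
    return fused


if __name__ == "__main__":
    maps = [
        [[1.0] * 4 for _ in range(4)],  # shallow, 4x4
        [[1.0, 2.0], [3.0, 4.0]],       # middle, 2x2
        [[10.0]],                       # deep, 1x1
    ]
    for level, fmap in enumerate(backward_stream(maps)):
        print(level, fmap)
```

Because each fused map feeds the next shallower level, the shallowest output carries information from every deeper level, which is the property the backward stream relies on.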

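Experimental results

Research questions

RQ1: Does a bidirectional network that fuses features from all levels through a recursive backward stream improve boundary-preserving semantic segmentation of high-resolution remote sensing images?
RQ2: Does autoregressive feature fusion help preserve fine-grained detail while maintaining accurate high-level semantic interpretation?
RQ3: How does RiFCN compare with standard FCN and SegNet on benchmark high-resolution remote sensing datasets?
RQ4: Is the approach robust when ground truth with eroded boundaries is used for evaluation?
RQ5: What are the qualitative and quantitative gains for buildings, roads, and small-object classes in aerial imagery?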

Key results

RiFCN outperforms FCN and SegNet on the ISPRS Potsdam dataset in mean F1 and overall accuracy (OA: RiFCN 83.70, RiFCN[e] 86.05, FCN 80.76, SegNet 80.64).
RiFCN also scores higher per class, including on small objects such as cars (Cars average: RiFCN 88.91, RiFCN[e] 93.73).
RiFCN and RiFCN[e] improve consistently over FCN and SegNet in most land-cover categories (Impervious surfaces, Buildings, Low Veg, Trees, Cars, Clutter).
On the Inria Aerial Image Labeling Dataset, RiFCN is competitive with several baselines in IoU and overall accuracy and scores higher than the SegNet and FCN variants (RiFCN IoU/Acc: 74.00/95.82 overall; RiFCN[e]: ???).
The backward recursive fusion enables multi-path information flow from deep to shallow layers, which improves boundary delineation and semantic consistency, as the qualitative results show.
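
For readers who want to see how the reported quantities relate, the sketch below computes overall accuracy, per-class F1, and per-class IoU from a confusion matrix over flattened label arrays. It is a generic illustration, not the benchmarks' official evaluation code. The function names and the `ignore_label` value of 255 are assumptions; the ignore label is one common way to exclude pixels from scoring, for instance boundary pixels removed in an eroded ground truth.

```python
from typing import Dict, List, Optional, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int,
                     ignore_label: Optional[int] = None) -> List[List[int]]:
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:  # skip pixels excluded from evaluation
            continue
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Overall accuracy, per-class F1, mean F1, and per-class IoU."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    f1: List[float] = []
    iou: List[float] = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # true class i, predicted otherwise
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, true class otherwise
        f1.append(2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0)
        iou.append(tp / (tp + fp + fn) if (tp + fp + fn) else 0.0)
    return {
        "overall_accuracy": correct / total if total else 0.0,
        "f1": f1,
        "mean_f1": sum(f1) / n if n else 0.0,
        "iou": iou,
    }


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 255]  # 255 marks an ignored pixel (assumed convention)
    pred = [0, 1, 1, 1, 0]
    cm = confusion_matrix(truth, pred, num_classes=2, ignore_label=255)
    print(segmentation_metrics(cm))
```

F1 and IoU are both built from the same true-positive, false-positive, and false-negative counts, so a class with few pixels (such as cars) can move either score substantially with only a handful of errors.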


This review was generated by AI and checked by a human editor.