QUICK REVIEW

[Paper Review] Towards High-Resolution Salient Object Detection

Yi Zeng, Pingping Zhang|arXiv (Cornell University)|2019. 08. 20.

Visual Attention and Saliency Detection | References: 53 | Citations: 37

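One-Line Summary

This paper introduces the first high-resolution salient object detection dataset (HRSOD) and a three-branch network (GSN, LRN, GLFN) that detects salient objects directly in very high-resolution images without post-processing. It achieves state-of-the-art performance on HRSOD and competitive results on standard low-resolution benchmarks.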

ABSTRACT

Deep neural network based methods have made a significant breakthrough in salient object detection. However, they are typically limited to input images with low resolutions ($400\times400$ pixels or less). Little effort has been made to train deep neural networks to directly handle salient object detection in very high-resolution images. This paper pushes forward high-resolution saliency detection, and contributes a new dataset, named High-Resolution Salient Object Detection (HRSOD). To our best knowledge, HRSOD is the first high-resolution saliency detection dataset to date. As another contribution, we also propose a novel approach, which incorporates both global semantic information and local high-resolution details, to address this challenging task. More specifically, our approach consists of a Global Semantic Network (GSN), a Local Refinement Network (LRN) and a Global-Local Fusion Network (GLFN). GSN extracts the global semantic information based on down-sampled entire image. Guided by the results of GSN, LRN focuses on some local regions and progressively produces high-resolution predictions. GLFN is further proposed to enforce spatial consistency and boost performance. Experiments illustrate that our method outperforms existing state-of-the-art methods on high-resolution saliency datasets by a large margin, and achieves comparable or even better performance than them on widely-used saliency benchmarks. The HRSOD dataset is available at https://github.com/yi94code/HRSOD.

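Research Motivation and Goals

Close the gap in high-resolution salient object detection by enabling training and inference directly on very high-resolution images.
Provide a large, richly annotated high-resolution dataset (HRSOD) to promote research.
Propose a global-local architecture paradigm that exploits global context while preserving high-resolution detail.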

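Proposed Method

Introduces a three-branch architecture: a Global Semantic Network (GSN) for coarse global saliency, a Local Refinement Network (LRN) for high-resolution local refinement, and a Global-Local Fusion Network (GLFN) for high-resolution fusion and spatial consistency.
Feeds a down-sampled input to GSN to capture global semantics, and uses Attended Patch Sampling (APS) to select the uncertain regions that LRN refines.
Integrates GSN's semantic guidance into LRN by concatenating the corresponding GSN features with the LRN decoder path.
Trains a lightweight GLFN with densely connected convolutions to fuse the GSN/LRN outputs with the high-resolution input while preserving detail.
Proposes Attended Patch Sampling (APS) to focus LRN on uncertain regions, guided by the GSN output.
Provides an optional GSN+APS+LRN+CRF variant for comparison against post-processing refinement.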
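
The APS step described above can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation: the uncertainty measure (how close the coarse saliency value is to 0.5), the `threshold` parameter, the fixed square patch size, and the function names `uncertainty_map` and `attended_patch_sampling` are all assumptions made for this example.

```python
import random


def uncertainty_map(saliency):
    """Per-pixel uncertainty: 1.0 where saliency is 0.5, 0.0 where it is 0 or 1.

    Assumption for this sketch; the paper may define uncertainty differently.
    """
    return [[1.0 - abs(2.0 * p - 1.0) for p in row] for row in saliency]


def attended_patch_sampling(saliency, patch_size, num_patches,
                            threshold=0.5, seed=0):
    """Return boxes (top, left, bottom, right) centred on uncertain pixels.

    `saliency` is a coarse map (list of rows, values in [0, 1]) standing in
    for the GSN output. `patch_size` is assumed not to exceed the map size.
    """
    height, width = len(saliency), len(saliency[0])
    unc = uncertainty_map(saliency)
    candidates = [(y, x)
                  for y in range(height)
                  for x in range(width)
                  if unc[y][x] >= threshold]
    if not candidates:
        return []  # the coarse map is confident everywhere

    rng = random.Random(seed)
    half = patch_size // 2
    boxes = []
    for _ in range(num_patches):
        cy, cx = rng.choice(candidates)
        # Clamp the corner so the patch stays inside the map.
        top = min(max(cy - half, 0), height - patch_size)
        left = min(max(cx - half, 0), width - patch_size)
        boxes.append((top, left, top + patch_size, left + patch_size))
    return boxes


# Toy coarse map: confident background, an uncertain band, confident object.
coarse = [[0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0] for _ in range(8)]
print(attended_patch_sampling(coarse, patch_size=4, num_patches=3))
```

In a full pipeline, each sampled box would be scaled to full-resolution coordinates, cropped from the original image, and passed to LRN. Random sampling, the baseline mentioned in the results, would instead draw centres from every pixel rather than only the uncertain ones.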

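Experimental Results

Research Questions

RQ1: Can a network learn high-resolution saliency directly, without post-processing?
RQ2: Does global semantic guidance help high-resolution local refinement in saliency detection?
RQ3: Is focusing refinement on uncertain regions through APS more effective than uniform patch sampling?
RQ4: How well does the proposed Global-Local Fusion Network (GLFN) preserve high-resolution detail and spatial consistency?
RQ5: How well does the proposed method perform on the high-resolution dataset (HRSOD) and on standard low-resolution saliency benchmarks?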

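Key Results

The proposed method outperforms state-of-the-art methods by a large margin on the newly introduced high-resolution dataset HRSOD.
The approach matches or exceeds state-of-the-art methods on widely used low-resolution saliency benchmarks.
APS substantially improves refinement over random patch sampling and is robust to the number of patches.
GLFN delivers strong high-resolution fusion and fast inference on high-resolution inputs with a very small model size (11.9 KB).
Compared with CRF-based post-processing, LRN+APS+GLFN yields better boundary quality (lower boundary displacement error).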
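
To make the boundary-quality comparison concrete, here is a small sketch of a boundary displacement error (BDE) computation on binary masks. It assumes a common symmetric definition (the mean distance from each boundary pixel of one mask to the nearest boundary pixel of the other, averaged over both directions) and a 4-neighbour boundary rule. The review does not state the exact evaluation protocol used in the paper, so the function names and these choices are illustrative only.

```python
import math


def boundary_pixels(mask):
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    height, width = len(mask), len(mask[0])
    points = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    points.append((y, x))
                    break
    return points


def mean_nearest_distance(src, dst):
    """Mean Euclidean distance from each point in src to its nearest in dst."""
    total = 0.0
    for sy, sx in src:
        total += min(math.hypot(sy - dy, sx - dx) for dy, dx in dst)
    return total / len(src)


def boundary_displacement_error(pred, truth):
    """Symmetric BDE between two binary masks (assumed definition)."""
    bp, bt = boundary_pixels(pred), boundary_pixels(truth)
    if not bp or not bt:
        raise ValueError("both masks need at least one foreground pixel")
    return 0.5 * (mean_nearest_distance(bp, bt) + mean_nearest_distance(bt, bp))


# A one-pixel object shifted right by one pixel.
truth = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(boundary_displacement_error(pred, truth))
```

A lower value means the predicted object contour lies closer to the ground-truth contour. The brute-force nearest-neighbour search here is quadratic in the number of boundary pixels, so a distance transform would be a better fit for real high-resolution masks.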

This review was generated by AI and checked by a human editor.