QUICK REVIEW

[Paper Review] PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection

Nian Liu, Junwei Han|arXiv (Cornell University)|2017. 08. 21.

Visual Attention and Saliency Detection|References 11|Citations 51

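One-Line Summary

PiCANet learns pixel-wise contextual attention that selectively weights the contextual information available to each pixel; integrated into a CNN such as U-Net in its global and local forms, it improves saliency detection.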

ABSTRACT

Contexts play an important role in the saliency detection task. However, given a context region, not all contextual information is helpful for the final task. In this paper, we propose a novel pixel-wise contextual attention network, i.e., the PiCANet, to learn to selectively attend to informative context locations for each pixel. Specifically, for each pixel, it can generate an attention map in which each attention weight corresponds to the contextual relevance at each context location. An attended contextual feature can then be constructed by selectively aggregating the contextual information. We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively. Both models are fully differentiable and can be embedded into CNNs for joint training. We also incorporate the proposed models with the U-Net architecture to detect salient objects. Extensive experiments show that the proposed PiCANets can consistently improve saliency detection performance. The global and local PiCANets facilitate learning global contrast and homogeneousness, respectively. As a result, our saliency model can detect salient objects more accurately and uniformly, thus performing favorably against the state-of-the-art methods.

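Motivation and Objectives

Motivate the use of contextual information for saliency detection while recognizing that not all context is equally useful.
Introduce a pixel-wise contextual attention mechanism that learns, for each pixel, to attend to informative context locations.
Propose two variants: a global PiCANet that captures global context and a local PiCANet that captures local context.
Embed PiCANet into CNN architectures such as U-Net to enable end-to-end training.
Demonstrate consistent improvements in saliency detection and favorable performance against state-of-the-art methods.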

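Proposed Method

Define PiCANet as a module that outputs, for each pixel, an attention map over all of its context locations.
Compute the attended contextual feature as a weighted sum of the context features, using the pixel-wise attention weights.
Present two variants: a global PiCANet for global context and a local PiCANet for local context.
Keep both modules fully differentiable so that they can be embedded in CNNs for joint training.
Combine PiCANet with the U-Net architecture to detect salient objects.
Show through extensive experiments that PiCANet improves saliency detection and helps the network learn global contrast and homogeneity.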
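
The attention-and-aggregation step described above can be illustrated with a minimal NumPy sketch of the global form. It is not the authors' implementation: the function name `global_attend`, the tensor shapes, and the random `logits` are assumptions made for illustration, and the sketch takes the attention scores as given rather than producing them with a learned network. It only shows how a per-pixel softmax over all context locations turns into a weighted sum of context features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_attend(features, logits):
    """Aggregate global context for every pixel (illustrative only).

    features: (H, W, C) feature map.
    logits:   (H, W, H*W) hypothetical attention scores; for each pixel,
              one score per context location in the whole map.
    Returns the attended features (H, W, C) and the weights (H, W, H, W).
    """
    H, W, C = features.shape
    # One attention map per pixel; each row is normalized to sum to 1.
    weights = softmax(logits.reshape(H * W, H * W), axis=-1)
    context = features.reshape(H * W, C)
    # Weighted sum of all context features for every pixel.
    attended = weights @ context
    return attended.reshape(H, W, C), weights.reshape(H, W, H, W)

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 5, 8))
logits = rng.standard_normal((4, 5, 4 * 5))  # stand-in for learned scores
attended, weights = global_attend(features, logits)
```

With all-zero scores every context location gets equal weight, so each pixel simply receives the mean feature of the map; learned scores let each pixel emphasize the locations that matter for it.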

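Experimental Results

Research Questions

RQ1: Can pixel-wise contextual attention improve saliency detection by selectively attending to informative context locations?
RQ2: How do the global and local PiCANet variants each contribute to saliency performance?
RQ3: Does integrating PiCANet with U-Net improve saliency accuracy and the uniformity of saliency maps?
RQ4: Is PiCANet differentiable and trainable end to end within standard CNN backbones?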

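Key Findings

PiCANet learns attention weights that reflect contextual relevance at each pixel.
The global PiCANet facilitates learning global contrast, while the local PiCANet contributes to the homogeneity of saliency maps.
Models augmented with PiCANets consistently improve saliency detection performance.
Integrating PiCANet with U-Net yields results that compare favorably with state-of-the-art methods.
Both the global and local PiCANets are fully differentiable and trainable within an end-to-end CNN framework.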
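
To make the local form mentioned in the findings concrete, here is a rough NumPy sketch in which each pixel attends only to a small window around itself. The function name `local_attend`, the window size `k`, the zero padding at the borders, and the randomly generated scores are all assumptions for illustration; the review does not specify these details.

```python
import numpy as np

def local_attend(features, logits, k=3):
    """Aggregate local context for every pixel (illustrative only).

    features: (H, W, C) float feature map.
    logits:   (H, W, k*k) hypothetical attention scores over the k x k
              window centred on each pixel, in row-major window order.
    Borders are zero-padded, which is an assumption of this sketch.
    """
    H, W, C = features.shape
    r = k // 2
    padded = np.pad(features, ((r, r), (r, r), (0, 0)))
    # Softmax over the k*k window positions of each pixel.
    z = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(z)
    w = w / w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(features)
    idx = 0
    for dy in range(k):
        for dx in range(k):
            # Shifted view of the map: the neighbour at offset (dy-r, dx-r).
            neighbour = padded[dy:dy + H, dx:dx + W, :]
            out += w[:, :, idx:idx + 1] * neighbour
            idx += 1
    return out

rng = np.random.default_rng(0)
features = rng.standard_normal((6, 7, 3))
logits = rng.standard_normal((6, 7, 9))  # stand-in for learned scores
smoothed = local_attend(features, logits, k=3)
```

Because every output is a convex combination of nearby features, neighbouring pixels tend to end up with similar values, which gives some intuition for why a local variant would favor homogeneous saliency maps.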

This review was generated by AI and reviewed by a human editor.