QUICK REVIEW

[Paper Review] OCNet: Object Context Network for Scene Parsing

Yuhui Yuan, Jingdong Wang | arXiv (Cornell University) | September 4, 2018

Advanced Image and Video Retrieval Techniques | References: 70 | Citations: 516

One-Line Summary

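OCNet performs semantic segmentation with an object-centric context aggregation mechanism: it uses dense or interlaced sparse self-attention to emphasize pixels that belong to the same object category, and it adds pyramid extensions for multi-scale context.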

ABSTRACT

In this paper, we address the semantic segmentation task with a new context aggregation scheme named object context, which focuses on enhancing the role of object information. Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context for each pixel as the set of pixels that belong to the same category as the given pixel in the image. We use a binary relation matrix to represent the relationship between all pixels, where the value one indicates that the two selected pixels belong to the same category and zero otherwise. We propose to use a dense relation matrix to serve as a surrogate for the binary relation matrix. The dense relation matrix is capable of emphasizing the contribution of object information, as the relation scores tend to be larger on the object pixels than on the other pixels. Considering that the dense relation matrix estimation requires quadratic computation overhead and memory consumption w.r.t. the input size, we propose an efficient interlaced sparse self-attention scheme to model the dense relations between any two pixels via the combination of two sparse relation matrices. To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes, including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018). We empirically show the advantages of our approach with competitive performance on five challenging benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
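The binary relation matrix and the object context it defines can be made concrete with a small NumPy sketch. It assumes ground-truth integer labels flattened to N pixels and, as an illustrative choice not taken from the source, row-normalizes the binary matrix so that each pixel's context is the mean feature of its same-category pixels. The function names are hypothetical. Because it needs labels, this oracle version is unavailable at inference time, which is why a learned dense relation matrix serves as its surrogate.

```python
import numpy as np

def binary_relation_matrix(labels):
    """(N,) integer labels -> (N, N) matrix: 1 if two pixels share a category, else 0."""
    labels = np.asarray(labels).reshape(-1)
    return (labels[:, None] == labels[None, :]).astype(np.float64)

def ideal_object_context(features, labels):
    """Oracle object context (hypothetical helper): average features over same-category pixels.

    features: (N, C). Row normalization is an assumption made for illustration.
    The diagonal is always 1, so no row sum can be zero.
    """
    relation = binary_relation_matrix(labels)
    relation /= relation.sum(axis=1, keepdims=True)  # each row now sums to 1
    return relation @ np.asarray(features, dtype=np.float64)

labels = [0, 0, 1]                 # two pixels of class 0, one pixel of class 1
features = [[1.0], [3.0], [5.0]]   # one feature channel per pixel
# Rows become 2.0, 2.0, 5.0: class-0 pixels share the mean of 1 and 3.
print(ideal_object_context(features, labels))
```

The learned dense relation matrix plays the role of `relation` here, with soft scores in place of the 0/1 entries.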

Motivation and Goals

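Explicitly emphasizes object-level information to improve pixel labeling.
Proposes an object context scheme as an object-oriented alternative to conventional multi-scale context.
Develops an efficient interlaced sparse self-attention (ISA) scheme that approximates dense pixel relations with less computation.
Integrates object context into pyramid schemes, as Pyramid-OC and ASP-OC, to capture multi-scale information.
Demonstrates competitive performance on major segmentation benchmarks.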

Proposed Method

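Defines the object context of a pixel as the set of pixels that share its object category.
Replaces the binary object-context relation with a learnable dense relation matrix, or with a combination of two sparse relation matrices.
Introduces interlaced sparse self-attention (ISA), which factorizes the dense relation into two sparse matrices, Wg and Wl, capturing global and local context respectively and reducing the O(N^2) cost of dense relations.
Implements the dense relations with self-attention and the sparse relations with ISA, using the efficient approximation W = Wl^T Pg^T Wg P.
Extends OCNet to Pyramid-OC and ASP-OC by integrating object context pooling into the pyramid pooling and ASPP frameworks.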
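The dense and interlaced sparse relations described above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the projection weights are random placeholders, the relation scores use scaled dot-product attention (the scaling is an assumption here), and the output projection, residual connection, and normalization layers of a real module are omitted. The helper names (`self_attention`, `dense_attention`, `interlaced_sparse_attention`, `sparse_masks`) and the partition factors `ph`, `pw` are hypothetical. The permutations are expressed as reshapes and transposes, with the global stage applied before the local stage.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention applied independently to each group.

    x: (G, n, C), i.e. G groups of n pixels; relations are computed only inside a group.
    wq, wk: (C, d) and wv: (C, dv) are hypothetical projection weights.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (G, n, n) relation scores
    logits -= logits.max(axis=-1, keepdims=True)               # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=-1, keepdims=True)                         # each row sums to 1
    return a @ v                                               # (G, n, dv)

def dense_attention(x, wq, wk, wv):
    """Dense relation: a single group holding all N = H*W pixels, so an N x N matrix."""
    H, W, C = x.shape
    return self_attention(x.reshape(1, H * W, C), wq, wk, wv).reshape(H, W, -1)

def interlaced_sparse_attention(x, ph, pw, glob, loc):
    """ISA sketch: global (long-range) attention, then local (short-range) attention.

    Pixel (h, w) is indexed as h = i*ph + a, w = j*pw + b.
    glob, loc: (wq, wk, wv) tuples for the two stages.
    """
    H, W, C = x.shape
    qh, qw = H // ph, W // pw
    assert qh * ph == H and qw * pw == W, "H and W must be divisible by ph and pw"
    t = x.reshape(qh, ph, qw, pw, C)
    # Global stage: fix (a, b) and attend over all (i, j), i.e. pixels spaced ph/pw apart.
    g = self_attention(t.transpose(1, 3, 0, 2, 4).reshape(ph * pw, qh * qw, C), *glob)
    t = g.reshape(ph, pw, qh, qw, -1)
    # Local stage: fix (i, j) and attend over the contiguous ph x pw block.
    l = self_attention(t.transpose(2, 3, 0, 1, 4).reshape(qh * qw, ph * pw, -1), *loc)
    # Undo the permutation back to (H, W, channels).
    return l.reshape(qh, qw, ph, pw, -1).transpose(0, 2, 1, 3, 4).reshape(H, W, -1)

def sparse_masks(H, W, ph, pw):
    """Boolean N x N masks of which pixel pairs each ISA stage can relate."""
    h, w = np.divmod(np.arange(H * W), W)
    g_id = (h % ph) * pw + (w % pw)           # same offset (a, b) inside its block
    l_id = (h // ph) * (W // pw) + (w // pw)  # same block (i, j)
    return g_id[:, None] == g_id[None, :], l_id[:, None] == l_id[None, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 6, 8))
    params = lambda: tuple(rng.standard_normal((8, 8)) for _ in range(3))
    print(interlaced_sparse_attention(x, 2, 3, params(), params()).shape)
```

`sparse_masks` makes the point of the factorization concrete: each stage alone relates only a fraction of the pixel pairs, yet composing them (local after global, i.e. `Ml @ Mg` as 0/1 matrices) gives a nonzero entry for every pair, so every pixel can still receive information from every other pixel.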

Experimental Results

Research Questions

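RQ1: Compared with traditional multi-scale context methods (e.g., PPM, ASPP), can an object-centric context mechanism improve pixel-wise segmentation accuracy on challenging datasets?
RQ2: Does the proposed interlaced sparse self-attention offer a favorable accuracy-computation trade-off compared with standard self-attention on high-resolution feature maps?
RQ3: Do the pyramid extensions (Pyramid-OC, ASP-OC) provide additional benefits by combining object context with multi-scale context?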

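Key Findings

The object context scheme emphasizes object pixels: the learned dense relation values tend to be higher for pixel pairs of the same category.
Interlaced sparse self-attention greatly reduces memory and FLOPs compared with full self-attention while maintaining competitive accuracy.
The OCNet variants (Base-OC, Pyramid-OC, ASP-OC) achieve competitive results on Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
ASP-OC, which replaces the image-level pooling in ASPP with object context pooling, improves over standard ASPP.
Pyramid-OC integrates object context across multiple spatial partitions, making better use of multi-scale context.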
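To see why the savings are large, one can count relation-matrix entries, which is where the quadratic memory comes from. The sketch below is back-of-the-envelope arithmetic, not the paper's measured numbers: it ignores channels, projections, and constant factors, and the 128 x 128 feature map (e.g., a 1024 x 1024 crop at output stride 8) and the 8 x 8 partition are hypothetical choices. The function names are also hypothetical.

```python
def dense_relation_entries(H, W):
    """Entries in a dense N x N relation matrix, with N = H * W pixels."""
    n = H * W
    return n * n

def isa_relation_entries(H, W, ph, pw):
    """Entries across the two sparse ISA stages for a hypothetical ph x pw partition.

    Global stage: ph*pw groups of (H/ph)*(W/pw) pixels each.
    Local stage: (H/ph)*(W/pw) groups of ph*pw pixels each.
    """
    qh, qw = H // ph, W // pw
    return ph * pw * (qh * qw) ** 2 + qh * qw * (ph * pw) ** 2  # = N * (qh*qw + ph*pw)

dense = dense_relation_entries(128, 128)
sparse = isa_relation_entries(128, 128, 8, 8)
print(dense, sparse, dense / sparse)  # 268435456 5242880 51.2
```

Stored as float32, that is 1 GiB for one dense relation matrix versus 20 MiB for the two sparse ones under these assumptions.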


This review was generated by AI and reviewed by a human editor.