QUICK REVIEW

[Paper Review] OCNet: Object Context Network for Scene Parsing

Yuhui Yuan, Jingdong Wang|arXiv (Cornell University)|Sep 4, 2018

Advanced Image and Video Retrieval Techniques | References: 70 | Cited by: 516

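One-Sentence Summary

OCNet introduces an object-oriented context aggregation mechanism for semantic segmentation. It uses dense or interlaced sparse self-attention to emphasize pixels that belong to the same object category, and adds pyramid extensions to capture multi-scale context.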

ABSTRACT

In this paper, we address the semantic segmentation task with a new context aggregation scheme named object context, which focuses on enhancing the role of object information. Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context for each pixel as the set of pixels that belong to the same category as the given pixel in the image. We use a binary relation matrix to represent the relationship between all pixels, where the value one indicates the two selected pixels belong to the same category and zero otherwise. We propose to use a dense relation matrix to serve as a surrogate for the binary relation matrix. The dense relation matrix is capable of emphasizing the contribution of object information as the relation scores tend to be larger on the object pixels than the other pixels. Considering that the dense relation matrix estimation requires quadratic computation overhead and memory consumption w.r.t. the input size, we propose an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices. To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018). We empirically show the advantages of our approach with competitive performances on five challenging benchmarks including: Cityscapes, ADE20K, LIP, PASCAL-Context and COCO-Stuff.
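
The binary relation matrix described in the abstract can be illustrated on a toy label map. The following is a minimal sketch, assuming ground-truth labels are available; the function name `binary_relation_matrix` and the label values are hypothetical and chosen only for illustration.

```python
import numpy as np

def binary_relation_matrix(labels: np.ndarray) -> np.ndarray:
    """Return an (N, N) 0/1 matrix whose entry (i, j) is 1 when
    pixels i and j carry the same category label."""
    flat = labels.reshape(-1)  # flatten the H x W label map to N pixels
    return (flat[:, None] == flat[None, :]).astype(np.int64)

# Toy 2 x 3 label map with two categories (illustrative values only).
labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
B = binary_relation_matrix(labels)

# Object context of pixel 0: indices of all pixels in the same category.
object_context_of_pixel_0 = np.flatnonzero(B[0])
print(B)
print(object_context_of_pixel_0)  # [0 1 3]
```

Row i of the matrix marks the object context of pixel i. At inference time the labels are unknown, which is why the abstract replaces this matrix with a dense relation matrix estimated from features.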

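Motivation and Objectives

Improve pixel labeling by explicitly emphasizing object-level information.
Propose an object context scheme that uses object-oriented context in place of conventional multi-scale context.
Develop an efficient interlaced sparse self-attention (ISA) scheme that approximates dense pixel relations at reduced computational cost.
Combine object context with pyramid schemes (Pyramid-OC and ASP-OC) to capture multi-scale information.
Demonstrate competitive performance on major segmentation benchmarks.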

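Proposed Method

Define the object context of a pixel as the set of pixels that belong to the same object category as that pixel.
Replace the binary object-context relation with a learnable dense relation matrix, or with two sparse relation matrices.
Introduce interlaced sparse self-attention (ISA), which factorizes the dense relation into Wg for global context and Wl for local context, reducing the O(N^2) complexity.
Instantiate the dense and sparse relations with self-attention and ISA, including the formula W = Wl^T Pg^T Wg P (an efficient approximation).
Extend OCNet to Pyramid-OC and ASP-OC by integrating object context pooling into the pyramid pooling and ASPP frameworks.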
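
The dense relation and its interlaced sparse factorization listed above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it works on a flattened 1-D sequence of N = p * q positions, it uses the raw features as query, key, and value (a trained self-attention module would use learned projections), and the names `attend`, `dense_attention`, and `interlaced_sparse_attention` are hypothetical. The permutation is realized with a reshape and an axis swap.

```python
import numpy as np

def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attend(x: np.ndarray) -> np.ndarray:
    """Self-attention within each group. x has shape (..., n, c)."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    relation = softmax(scores, axis=-1)      # (..., n, n) relation matrix per group
    return relation @ x

def dense_attention(x: np.ndarray) -> np.ndarray:
    """One N x N relation matrix over all positions. x has shape (N, c)."""
    return attend(x)

def interlaced_sparse_attention(x: np.ndarray, p: int, q: int) -> np.ndarray:
    """Two sparse steps: global (strided groups), then local (contiguous blocks)."""
    n, c = x.shape
    assert n == p * q, "N must equal p * q"
    blocks = x.reshape(p, q, c)              # p blocks of q consecutive positions
    # Global step: group positions that share an offset across blocks
    # (stride q), giving q groups of p far-apart positions.
    strided = np.swapaxes(blocks, 0, 1)      # (q, p, c)
    strided = attend(strided)                # q relation matrices of size p x p
    blocks = np.swapaxes(strided, 0, 1)      # undo the permutation: (p, q, c)
    # Local step: attend within each block of q neighbouring positions.
    out = attend(blocks)                     # p relation matrices of size q x q
    return out.reshape(n, c)

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 4))
y_dense = dense_attention(x)
y_isa = interlaced_sparse_attention(x, p=3, q=4)
print(y_dense.shape, y_isa.shape)  # (12, 4) (12, 4)
```

With p = 3 and q = 4, the sketch builds four 3 x 3 and three 4 x 4 relation matrices instead of one 12 x 12 matrix, yet after both steps every output position has received information from every input position.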

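Experimental Results

Research Questions

RQ1: On challenging datasets, does the object-oriented context mechanism improve pixel-level segmentation accuracy over conventional multi-scale context methods (e.g., PPM, ASPP)?
RQ2: Does the proposed interlaced sparse self-attention offer a favorable accuracy/computation trade-off relative to standard self-attention on high-resolution feature maps?
RQ3: Do the pyramid extensions (Pyramid-OC, ASP-OC) bring additional gains by combining object context with multi-scale context?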

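Key Findings

The object context scheme consistently emphasizes object pixels: dense relation values are higher for pixel pairs of the same category.
Interlaced sparse self-attention significantly reduces memory and FLOPs while remaining competitive with full self-attention.
The OCNet variants (Base-OC, Pyramid-OC, ASP-OC) achieve competitive results on Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
Replacing image-level pooling in ASPP with object context pooling (ASP-OC) improves over standard ASPP.
Pyramid-OC integrates object context into multiple spatial partitions, improving the use of multi-scale context.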
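
The memory saving of interlaced sparse self-attention can be made concrete with a back-of-envelope count. The sketch below counts relation-matrix entries only; it is not the memory or FLOPs measurement reported in the paper. The feature-map size, the group sizes, and the function names are assumed values chosen for illustration.

```python
def dense_relation_entries(n: int) -> int:
    """Entries in one dense N x N relation matrix."""
    return n * n

def isa_relation_entries(p: int, q: int) -> int:
    """Entries in q matrices of size p x p plus p matrices of size q x q."""
    return q * p * p + p * q * q  # equals N * (p + q) with N = p * q

# Hypothetical 128 x 128 feature map, split with p = q = 128.
n = 128 * 128
p = q = 128
dense = dense_relation_entries(n)
sparse = isa_relation_entries(p, q)
print(dense, sparse, dense // sparse)  # 268435456 4194304 64
```

Because the sparse count is N * (p + q), it is smallest when p and q are both close to the square root of N, where it grows as N^1.5 rather than N^2.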

This review was generated by AI and reviewed by a human editor.