QUICK REVIEW

[Paper Review] OCNet: Object Context Network for Scene Parsing

Yuhui Yuan, Jingdong Wang|arXiv (Cornell University)|Sep 4, 2018

Advanced Image and Video Retrieval Techniques|References: 70|Citations: 516

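One-line summary

OCNet introduces an object-centric context aggregation mechanism for semantic segmentation. It uses dense self-attention, or interlaced sparse self-attention, to emphasize pixels that belong to the same object category, and adds pyramid extensions to capture multi-scale context.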

ABSTRACT

In this paper, we address the semantic segmentation task with a new context aggregation scheme named object context, which focuses on enhancing the role of object information. Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context for each pixel as the set of pixels that belong to the same category as the given pixel in the image. We use a binary relation matrix to represent the relationship between all pixels, where the value one indicates the two selected pixels belong to the same category and zero otherwise. We propose to use a dense relation matrix to serve as a surrogate for the binary relation matrix. The dense relation matrix is able to emphasize the contribution of object information as the relation scores tend to be larger on the object pixels than the other pixels. Considering that the dense relation matrix estimation requires quadratic computation overhead and memory consumption w.r.t. the input size, we propose an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices. To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018). We empirically show the advantages of our approach with competitive performances on five challenging benchmarks including: Cityscapes, ADE20K, LIP, PASCAL-Context and COCO-Stuff.
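
The binary relation matrix described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code: the function names `binary_relation_matrix` and `aggregate_object_context`, the toy label map, and the row-normalized averaging are all assumptions made for illustration. It also relies on ground-truth labels, which are not available at inference time; that is the reason the paper substitutes a learned dense relation matrix.

```python
import numpy as np

def binary_relation_matrix(labels):
    """Return an (N, N) matrix with 1 where two pixels share a category, else 0."""
    flat = labels.reshape(-1)
    return (flat[:, None] == flat[None, :]).astype(np.float64)

def aggregate_object_context(features, relation):
    """Average the features of each pixel's object context.

    Row normalization is an assumption of this sketch, not a detail from the paper.
    """
    weights = relation / relation.sum(axis=1, keepdims=True)
    return weights @ features

# Hypothetical 2x3 label map with two categories and 2-channel features.
labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
features = np.arange(12, dtype=np.float64).reshape(6, 2)

relation = binary_relation_matrix(labels)
context = aggregate_object_context(features, relation)
print(relation)
print(context)
```

In this toy case, every pixel of category 0 receives the same context vector: the mean feature of the three category-0 pixels.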

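Motivation and objectives

Improve pixel labeling by explicitly emphasizing object-level information.
Propose an object context scheme that replaces conventional multi-scale context with object-oriented context.
Develop an efficient interlaced sparse self-attention (ISA) that approximates dense pixel relations at reduced computational cost.
Integrate object context with pyramid schemes (Pyramid-OC and ASP-OC) to capture multi-scale information.
Demonstrate competitive performance on major segmentation benchmarks.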

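Proposed method

Define the object context of a pixel as the set of pixels that share its object category.
Replace the binary object context relation with a learnable dense relation matrix, or with two sparse relation matrices.
Introduce interlaced sparse self-attention (ISA), which factorizes the dense relation into two sparse matrices, Wg for global context and Wl for local context, to reduce the O(N^2) complexity.
Instantiate the dense and sparse relations through self-attention and ISA, respectively; the efficient approximation is written as W = Wl^T Pg^T Wg P.
Extend OCNet with Pyramid-OC and ASP-OC, which integrate object context pooling into the pyramid pooling and ASPP frameworks.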
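
The two-step factorization into a global matrix Wg and a local matrix Wl can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: positions are flattened to one dimension with N = p * q, the attention uses identity projections (no learned query, key, or value transforms), and the names `grouped_attention`, `scatter_blocks`, and `interlaced_sparse_attention` are hypothetical. The permutation is folded into the index arrays, so the product returned here is only loosely analogous to the review's expression for W.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(x):
    """Dot-product self-attention inside each group; x has shape (groups, n, c)."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    weights = softmax(scores, axis=-1)          # (groups, n, n)
    return weights @ x, weights

def scatter_blocks(weights, index_groups, n):
    """Place per-group attention blocks into a sparse (N, N) relation matrix."""
    full = np.zeros((n, n))
    for block, idx in zip(weights, index_groups):
        full[np.ix_(idx, idx)] = block
    return full

def interlaced_sparse_attention(x, p, q):
    """x: (N, C) with N = p * q. Returns the output and the effective (N, N) relation."""
    n, c = x.shape
    assert n == p * q
    grid = np.arange(n).reshape(p, q)           # grid[a, b] = a * q + b

    # Global step: q groups, each holding p positions spaced q apart.
    yg, wg = grouped_attention(x[grid.T])       # (q, p, c)
    y = np.empty_like(x)
    y[grid.T] = yg                              # undo the permutation

    # Local step: p groups, each holding q consecutive positions.
    yl, wl = grouped_attention(y[grid])         # (p, q, c)
    out = yl.reshape(n, c)

    w_global = scatter_blocks(wg, grid.T, n)
    w_local = scatter_blocks(wl, grid, n)
    return out, w_local @ w_global

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 8))
out, w = interlaced_sparse_attention(x, p=3, q=4)
print(out.shape, w.shape)
```

Each sparse matrix connects a position only to the members of its own group, yet their product has no zero entries: any position j reaches any position i through the single position that shares j's global group and i's local group.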

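Experimental results

Research questions

RQ1: Can an object-centric context mechanism improve per-pixel segmentation accuracy on challenging datasets compared with conventional multi-scale context (e.g., PPM, ASPP)?
RQ2: Does the proposed interlaced sparse self-attention offer a more favorable accuracy/computation trade-off than standard self-attention on high-resolution feature maps?
RQ3: Do the pyramid extensions, Pyramid-OC and ASP-OC, bring further gains by combining object context with multi-scale context?

Key findings

The object context scheme consistently emphasizes object pixels: dense relation values are higher for pixel pairs of the same category.
Interlaced sparse self-attention substantially reduces memory and FLOPs compared with dense self-attention while maintaining competitive performance.
The OCNet variants (Base-OC, Pyramid-OC, ASP-OC) achieve competitive results on Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
Replacing the image-level pooling in ASPP with object context pooling (ASP-OC) improves over standard ASPP.
Pyramid-OC integrates object context across multiple spatial partitions, strengthening the use of multi-scale context.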
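
The memory saving of interlaced sparse self-attention can be made concrete by counting relation-matrix entries. The sketch below is back-of-the-envelope arithmetic, not a measurement from the paper: it assumes N = p * q flattened positions, q global groups of p positions, and p local groups of q positions, and the 128 x 128 feature map is a hypothetical example. The helper names are likewise made up for illustration.

```python
def dense_relation_entries(n):
    """Entries in a full (N, N) relation matrix."""
    return n * n

def interlaced_relation_entries(p, q):
    """Nonzero entries in the two sparse matrices: q blocks of p*p plus p blocks of q*q."""
    return q * p * p + p * q * q                # equals N * (p + q) with N = p * q

# Hypothetical 128 x 128 feature map, split with p = q = 128.
n = 128 * 128
p = q = 128
dense = dense_relation_entries(n)
sparse = interlaced_relation_entries(p, q)
print(dense)            # 268435456
print(sparse)           # 4194304
print(dense // sparse)  # 64
```

Under these assumptions the count drops from N^2 to N * (p + q), which is smallest when p and q are both close to the square root of N.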


This review was generated by AI and checked by a human editor.