QUICK REVIEW

[Paper Review] UPSNet: A Unified Panoptic Segmentation Network

Yuwen Xiong, Renjie Liao | arXiv (Cornell University) | Jan 12, 2019

Advanced Neural Network Applications | 41 references | 18 citations

TL;DR

UPSNet is a unified panoptic segmentation network that jointly predicts semantic and instance segmentation using a shared backbone and two lightweight heads. A parameter-free panoptic head combines the logits of both heads, adds an unknown class to resolve conflicts between the two predictions, and keeps the whole network trainable end to end. The paper reports state-of-the-art performance with faster inference on Cityscapes, COCO, and an internal driving dataset.

ABSTRACT

In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. On top of a single backbone residual network, we first design a deformable convolution based semantic segmentation head and a Mask R-CNN style instance segmentation head which solve these two subtasks simultaneously. More importantly, we introduce a parameter-free panoptic head which solves the panoptic segmentation via pixel-wise classification. It first leverages the logits from the previous two heads and then innovatively expands the representation for enabling prediction of an extra unknown class which helps better resolve the conflicts between semantic and instance segmentation. Additionally, it handles the challenge caused by the varying number of instances and permits back propagation to the bottom modules in an end-to-end manner. Extensive experimental results on Cityscapes, COCO and our internal dataset demonstrate that our UPSNet achieves state-of-the-art performance with much faster inference. Code has been made available at: https://github.com/uber-research/UPSNet

Motivation & Objective

To unify semantic and instance segmentation into a single, end-to-end trainable framework for panoptic segmentation.
To address the conflict between semantic and instance segmentation predictions by introducing an unknown class in a parameter-free panoptic head.
To enable backpropagation through the entire network by handling variable numbers of instances per image.
To achieve state-of-the-art performance with faster inference than prior methods.

Proposed method

Uses a single residual backbone network to extract shared features for both semantic and instance segmentation.
Employs a deformable convolution-based semantic segmentation head with feature pyramid networks (FPN) for multi-scale context.
Deploys a Mask R-CNN-style instance segmentation head for bounding box, class, and mask prediction.
Introduces a parameter-free panoptic head that performs pixel-wise classification using logits from both semantic and instance heads, including an extra unknown class channel.
Enables end-to-end training by allowing backpropagation through the panoptic head despite varying numbers of instances.
Applies loss balancing and a novel RoI loss to improve training stability and performance.
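
The panoptic head described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation (which is at the repository linked in the abstract). The review only says that the head reuses logits from both heads and adds an unknown channel. The specific combination rule used here (adding the box-cropped semantic logit to the instance mask logit, and setting the unknown channel to the maximum thing logit minus the maximum logit already claimed by an instance) is an assumption of this sketch, as are the function name `panoptic_logits` and the dictionary layout of `instances`.

```python
import numpy as np

def panoptic_logits(sem_logits, num_stuff, instances):
    """Assemble per-pixel panoptic logits without any learned parameters.

    sem_logits: (num_stuff + num_thing, H, W) array from the semantic head.
    instances:  list of dicts (hypothetical layout) with
                "cls"  - thing-class index, 0-based among thing classes
                "box"  - (x0, y0, x1, y1) integer pixel box, end-exclusive
                "mask" - (y1 - y0, x1 - x0) mask logits already resized to the box
    Returns an array of shape (num_stuff + len(instances) + 1, H, W);
    the last channel is the unknown class.
    """
    _, H, W = sem_logits.shape
    stuff, thing = sem_logits[:num_stuff], sem_logits[num_stuff:]

    # One channel per detected instance, so the channel count varies per image.
    inst = np.zeros((len(instances), H, W), dtype=sem_logits.dtype)
    sem_in_box = np.zeros_like(inst)
    for i, d in enumerate(instances):
        x0, y0, x1, y1 = d["box"]
        # Semantic evidence for this instance's class, restricted to its box.
        sem_in_box[i, y0:y1, x0:x1] = thing[d["cls"], y0:y1, x0:x1]
        # Assumed combination rule: semantic logit plus instance mask logit.
        inst[i] = sem_in_box[i]
        inst[i, y0:y1, x0:x1] += d["mask"]

    # Assumed unknown channel: thing evidence that no instance accounts for.
    max_thing = thing.max(axis=0)
    if instances:
        max_claimed = sem_in_box.max(axis=0)
    else:
        max_claimed = np.zeros((H, W), dtype=sem_logits.dtype)
    unknown = (max_thing - max_claimed)[None]

    return np.concatenate([stuff, inst, unknown], axis=0)

def softmax(z):
    """Channel-wise softmax; every step is differentiable in an autodiff framework."""
    z = z - z.max(axis=0, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Toy input: 1 stuff class, 2 thing classes, one detected instance.
rng = np.random.default_rng(0)
sem = rng.normal(size=(3, 4, 4))
one = [{"cls": 0, "box": (1, 1, 3, 3), "mask": np.full((2, 2), 2.0)}]
z = panoptic_logits(sem, num_stuff=1, instances=one)
probs = softmax(z)
```

Because the head only slices, adds, and takes maxima of existing logits, it has no weights of its own, and a pixel-wise cross-entropy loss on `probs` would send gradients to both the semantic and instance heads regardless of how many instances an image contains.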

Experimental results

Research questions

RQ1: Can a unified network architecture effectively combine semantic and instance segmentation for panoptic segmentation with shared representation learning?
RQ2: How can conflicts between semantic and instance segmentation predictions be resolved in a differentiable, end-to-end manner?
RQ3: What is the impact of introducing an unknown class in the panoptic head on prediction consistency and performance?
RQ4: How does the parameter-free panoptic head compare to post-processing or two-stage approaches in terms of accuracy and inference speed?
RQ5: To what extent does end-to-end training with backpropagation through the panoptic head improve overall performance?

Key findings

On COCO, UPSNet achieves a PQ score of 46.7 with full training, significantly outperforming prior methods.
The ablation study shows that training the panoptic head improves PQ by 0.5 points compared to post-processing.
Introducing loss balancing increases PQ by a further 0.1 points, a small gain in this ablation.
Predicting the unknown class with RoI loss boosts PQ^St by 0.5 points, indicating improved handling of ambiguous regions.
Oracle experiments indicate that semantic segmentation is the largest bottleneck: substituting ground-truth semantic labels yields a gain of 29.5 PQ points, so this component has the most room for improvement.
The model achieves state-of-the-art performance with significantly faster inference than recent competitors across Cityscapes, COCO, and an internal large-scale driving dataset.
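
For readers unfamiliar with the metric behind these numbers, the sketch below computes panoptic quality (PQ) for a single class, following the standard definition of the panoptic segmentation task: a predicted and a ground-truth segment count as a match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by |TP| + 0.5·|FP| + 0.5·|FN|. The review itself does not define the metric, and the helper name `panoptic_quality` and its arguments are hypothetical. Reported scores average this quantity over classes, with PQ^St and PQ^Th restricted to stuff and thing classes respectively.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ for one class.

    matched_ious: IoU of each matched (true-positive) segment pair, each > 0.5.
    num_fp:       predicted segments with no match.
    num_fn:       ground-truth segments with no match.
    """
    num_tp = len(matched_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # No segments of this class in either prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denom

# Two matched segments, one spurious prediction, one missed segment.
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
```

Since every gain quoted above is a difference in this averaged score, a change of a few tenths of a point corresponds to a small shift in either match quality (the IoU sum) or the number of unmatched segments.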

This review was created by AI and reviewed by human editors.