[Paper Review] Panoptic Feature Pyramid Networks
Panoptic FPN adds a lightweight semantic segmentation branch to Mask R-CNN with an FPN backbone, so that a single network performs instance segmentation, semantic segmentation, and their joint task, panoptic segmentation, with competitive accuracy and reduced compute.
The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.
Motivation & Objective
- Aim to unify instance and semantic segmentation within a single network architecture.
- Evaluate a minimal extension of Mask R-CNN with FPN to support dense pixel labeling alongside region-based outputs.
- Assess performance for instance segmentation, semantic segmentation, and panoptic segmentation on COCO and Cityscapes.
- Investigate training dynamics and loss balancing for multi-task learning in a panoptic setup.
Proposed method
- Start from Mask R-CNN with FPN as the backbone.
- Attach a lightweight semantic segmentation branch that merges multi-scale FPN features into a dense per-pixel output.
- Train with a joint loss L = lambda_i * (classification + box + mask) + lambda_s * semantic_loss, where lambda_i and lambda_s weight the instance and semantic terms and are tuned to balance the two tasks.
- The semantic branch upsamples each FPN level to 1/4 scale and sums the resulting feature maps element-wise; per-pixel class scores are predicted from the summed map.
- At inference, a post-processing step resolves overlaps between instance and semantic predictions so that each pixel receives a single label, as panoptic segmentation requires.
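The overlap-resolution step in the last bullet can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. It assumes a simple heuristic: higher-scoring instances claim contested pixels first, instances take priority over stuff, and stuff regions below an area threshold are dropped. The function name `merge_panoptic`, the tuple layouts, and the `stuff_min_area` value are hypothetical choices for this example; `None` in the semantic map stands for a pixel with no stuff label.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[bool]]
Segment = Tuple[str, int, Optional[int]]  # (kind, class_id, instance_id)


def merge_panoptic(
    instances: List[Tuple[float, int, Mask]],   # (score, class_id, mask)
    semantic: List[List[Optional[int]]],        # stuff class id per pixel, or None
    stuff_min_area: int = 2,                    # hypothetical threshold
) -> List[List[Optional[Segment]]]:
    height, width = len(semantic), len(semantic[0])
    panoptic: List[List[Optional[Segment]]] = [[None] * width for _ in range(height)]

    # Step 1: visit instances from most to least confident, so a pixel
    # covered by several masks goes to the highest-scoring one.
    next_id = 0
    ranked = sorted(instances, key=lambda inst: inst[0], reverse=True)
    for _score, class_id, mask in ranked:
        claimed = False
        for y in range(height):
            for x in range(width):
                if mask[y][x] and panoptic[y][x] is None:
                    panoptic[y][x] = ("thing", class_id, next_id)
                    claimed = True
        if claimed:
            next_id += 1

    # Step 2: measure how much area each stuff class has on pixels
    # that no instance claimed (instances win over stuff).
    stuff_area: Dict[int, int] = {}
    for y in range(height):
        for x in range(width):
            label = semantic[y][x]
            if panoptic[y][x] is None and label is not None:
                stuff_area[label] = stuff_area.get(label, 0) + 1

    # Step 3: keep only stuff classes whose remaining area is large enough.
    for y in range(height):
        for x in range(width):
            label = semantic[y][x]
            if (
                panoptic[y][x] is None
                and label is not None
                and stuff_area[label] >= stuff_min_area
            ):
                panoptic[y][x] = ("stuff", label, None)
    return panoptic


# Toy 2x4 example: two overlapping instances and two stuff classes.
mask_a = [[True, True, False, False], [False, False, False, False]]
mask_b = [[False, True, True, False], [False, False, False, False]]
semantic_map = [[10, 10, 10, 11], [None, 10, 10, 10]]
result = merge_panoptic([(0.5, 2, mask_b), (0.9, 1, mask_a)], semantic_map)
```

In the toy example the pixel shared by both masks goes to the instance with score 0.9, the single pixel of stuff class 11 is dropped for being under the area threshold, and unlabeled pixels stay `None` (void).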
Experimental results
Research questions
- RQ1: Can a single, minimally extended Mask R-CNN with FPN achieve strong performance on both instance and semantic segmentation tasks?
- RQ2: Does joint training with a semantic branch improve or at least not harm instance segmentation accuracy and vice versa?
- RQ3: How does Panoptic FPN perform on panoptic segmentation compared with two separate networks under similar compute budgets?
- RQ4: What is the impact of architectural choices and loss weighting on multi-task training stability and performance?
Key findings
- Panoptic FPN achieves competitive or superior results for both instance and semantic segmentation when trained jointly, with about half the compute compared to two separate networks.
- Semantic segmentation with the lightweight dense-prediction branch on FPN yields competitive mIoU scores on COCO and Cityscapes without dilation-based backbones.
- Joint training with proper loss weighting can improve one task while maintaining or improving the other, enabling effective multi-task learning for stuff and thing segmentation.
- Panoptic segmentation with a single FPN backbone outperforms comparable single-model entries on COCO test-dev and Cityscapes when compared under similar budgets, establishing Panoptic FPN as a strong baseline.
- A simple aggregation (sum) of multi-scale features for the semantic branch is effective and more efficient than concatenation.
- Using a single network for panoptic segmentation can match the accuracy of dual-network approaches while significantly reducing compute; in some cases it outperforms them.
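The sum aggregation used by the semantic branch can be illustrated with a small sketch. It rests on several assumptions not stated in this review: FPN levels at strides 4, 8, 16, and 32 (the usual FPN layout), single-channel feature maps stored as nested lists, and nearest-neighbour 2x upsampling as a stand-in for the branch's learned upsampling stages. The helper names `upsample2x`, `stages_needed`, and `merge_levels` are hypothetical.

```python
from typing import Dict, List, Optional

Grid = List[List[float]]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling (stand-in for a learned stage)."""
    out: Grid = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def stages_needed(stride: int, target_stride: int = 4) -> int:
    """Number of 2x upsampling stages to bring a level to the target stride."""
    count = 0
    while stride > target_stride:
        stride //= 2
        count += 1
    return count


def merge_levels(levels: Dict[int, Grid]) -> Grid:
    """Upsample every level to 1/4 scale, then sum element-wise."""
    merged: Optional[Grid] = None
    for stride, grid in sorted(levels.items()):
        for _ in range(stages_needed(stride)):
            grid = upsample2x(grid)
        if merged is None:
            merged = [list(row) for row in grid]
        else:
            merged = [
                [a + b for a, b in zip(row_m, row_g)]
                for row_m, row_g in zip(merged, grid)
            ]
    assert merged is not None, "at least one level is required"
    return merged


def constant_map(size: int, value: float) -> Grid:
    return [[value] * size for _ in range(size)]


# Toy 32x32 input: strides 4/8/16/32 give 8x8, 4x4, 2x2, and 1x1 maps.
levels = {
    4: constant_map(8, 1.0),
    8: constant_map(4, 2.0),
    16: constant_map(2, 3.0),
    32: constant_map(1, 4.0),
}
merged = merge_levels(levels)
```

Summation keeps the merged map at the same channel count as each input level, whereas concatenating four levels would quadruple the channels fed to the final prediction layer.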
This review was created by AI and reviewed by human editors.