QUICK REVIEW

[Paper Review] PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment

Kaixin Wang, Jun Hao Liew | arXiv (Cornell University) | Aug 18, 2019

Domain Adaptation and Few-Shot Learning | 28 references | 132 citations

TL;DR

PANet uses a non-parametric prototype-based metric learning approach for few-shot segmentation and introduces a prototype alignment regularization to align support and query prototypes, achieving state-of-the-art results on PASCAL-5i and MS COCO.

ABSTRACT

Despite the great progress made by deep CNNs in image semantic segmentation, they typically require a large number of densely-annotated images for training and are difficult to generalize to unseen object categories. Few-shot segmentation has thus been developed to learn to perform segmentation from only a few annotated examples. In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set. Our PANet learns class-specific prototype representations from a few support images within an embedding space and then performs segmentation over the query images through matching each pixel to the learned prototypes. With non-parametric metric learning, PANet offers high-quality prototypes that are representative for each semantic class and meanwhile discriminative for different classes. Moreover, PANet introduces a prototype alignment regularization between support and query. With this, PANet fully exploits knowledge from the support and provides better generalization on few-shot segmentation. Significantly, our model achieves the mIoU score of 48.1% and 55.7% on PASCAL-5i for 1-shot and 5-shot settings respectively, surpassing the state-of-the-art method by 1.8% and 8.6%.

Motivation & Objective

Develop a few-shot segmentation framework based on class-specific prototypes learned from support images.
Improve generalization by separating prototype extraction from non-parametric metric learning.
Leverage a prototype alignment regularization to align prototypes from support and query during training.
Demonstrate robustness to weaker annotations (scribbles, bounding boxes) for the support set.

Proposed method

Embed support and query images with a shared backbone to obtain feature maps.
Compute class prototypes via masked average pooling on the support features for each class and background.
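Segment the query image by assigning each pixel to its nearest prototype in the embedding space, using cosine distance multiplied by a fixed scaling factor.
Apply prototype alignment regularization (PAR): compute prototypes from the query features and the predicted query mask, use them to segment the support images in the reverse direction, and compute a PAR loss against the support annotations.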
Train end-to-end with L_seg plus a PAR loss term (L = L_seg + lambda * L_PAR).
Optionally extend to weaker annotations (scribbles, bounding boxes) for the support set.

Experimental results

Research questions

RQ1: Can a non-parametric, prototype-based metric learning approach achieve competitive few-shot segmentation without heavy decoder modules?
RQ2: Does enforcing alignment between support and query prototypes during training improve generalization to unseen classes?
RQ3: How does PANet perform under 1-shot and 5-shot settings on standard benchmarks (PASCAL-5i, MS COCO) and under weaker annotations?

Key findings

PANet achieves 1-shot mean-IoU of 48.1% and 5-shot mean-IoU of 55.7% on PASCAL-5i, surpassing prior methods.
On PASCAL-5i, these scores exceed the previous state-of-the-art method by 1.8% in the 1-shot setting and 8.6% in the 5-shot setting.
Prototype Alignment Regularization (PAR) yields faster convergence and tighter alignment between support and query prototypes (lower Euclidean distance between prototypes).
PANet attains top performance on MS COCO with 1-shot and 5-shot settings, outperforming prior methods by notable margins.
PANet remains effective with weak annotations such as scribbles or bounding boxes for the support set.
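For readers unfamiliar with the metric behind these numbers, the following sketch shows one common way to compute mean-IoU: accumulate intersection and union per class over all test episodes, take the per-class ratio, then average over classes. The review does not spell out the evaluation protocol, so the accumulation scheme and the function name mean_iou are assumptions of this illustration.

```python
import numpy as np

def mean_iou(preds, targets, class_ids):
    """Mean of per-class IoU, with intersections and unions summed over episodes.

    preds, targets: sequences of binary (H, W) foreground masks.
    class_ids: the class evaluated in each episode.
    """
    inter, union = {}, {}
    for pred, target, c in zip(preds, targets, class_ids):
        pred = np.asarray(pred, dtype=bool)
        target = np.asarray(target, dtype=bool)
        inter[c] = inter.get(c, 0) + int(np.logical_and(pred, target).sum())
        union[c] = union.get(c, 0) + int(np.logical_or(pred, target).sum())
    # Skip classes whose union is empty to avoid dividing by zero.
    ious = [inter[c] / union[c] for c in inter if union[c] > 0]
    return float(np.mean(ious)) if ious else float("nan")

# Toy usage with two hypothetical classes, one episode each.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
score = mean_iou(preds, targets, ["cat", "dog"])
```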

This review was created by AI and reviewed by human editors.