[Paper Review] Constructing Self-motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach
PyCDA introduces a self-motivated pyramid curriculum for unsupervised domain adaptation in semantic segmentation, combining self-training and curriculum concepts to outperform adversarial methods without extra discriminators.
We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains. Our approach draws on an insight connecting two existing works: curriculum domain adaptation and self-training. Inspired by the former, PyCDA constructs a pyramid curriculum that encodes various properties of the target domain. These properties mainly concern the desired label distributions over the target domain images, image regions, and pixels. By requiring the segmentation network to respect those properties, we can improve its generalization capability to the target domain. Motivated by self-training, we infer this pyramid of properties by resorting to the semantic segmentation network itself. Unlike prior work, we do not need to maintain any additional models (e.g., logistic regression or discriminator networks) or to solve min-max problems, which are often difficult to optimize. We report state-of-the-art results for the adaptation from both GTAV and SYNTHIA to Cityscapes, two popular settings in unsupervised domain adaptation for semantic segmentation.
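The self-training idea in the abstract, reading the pyramid of target properties off the network's own predictions, can be illustrated with a minimal NumPy sketch. It assumes the network outputs per-pixel softmax probabilities of shape (C, H, W). The helper names (`infer_pyramid`, `square_distributions`), the 8x8 squares with stride 4, and the fixed 0.9 confidence threshold for pixel pseudo-labels are illustrative assumptions, not the paper's code or settings.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def square_distributions(probs, size=8, stride=4):
    """Mean class distribution inside each size x size pixel square.

    probs has shape (C, H, W); stride < size makes neighbouring squares overlap.
    Returns an array of shape (C, n_rows, n_cols).
    """
    windows = sliding_window_view(probs, (size, size), axis=(1, 2))
    return windows[:, ::stride, ::stride].mean(axis=(-2, -1))

def infer_pyramid(probs, size=8, stride=4, threshold=0.9):
    """Hypothetical helper: read a three-level pyramid of target properties
    off the network's own softmax output (self-training style)."""
    image_dist = probs.mean(axis=(1, 2))                     # top: whole image
    region_dist = square_distributions(probs, size, stride)  # middle: squares
    pseudo = probs.argmax(axis=0)                            # bottom: pixels
    # Assumption: a fixed confidence threshold; unconfident pixels get -1 (ignored).
    pseudo[probs.max(axis=0) < threshold] = -1
    return image_dist, region_dist, pseudo

# Usage on a fake 3-class, 16x16 prediction.
rng = np.random.default_rng(0)
probs = softmax(3.0 * rng.normal(size=(3, 16, 16)))
image_dist, region_dist, pseudo = infer_pyramid(probs)
print(image_dist.shape, region_dist.shape, pseudo.shape)  # (3,) (3, 3, 3) (16, 16)
```

Choosing `stride` smaller than `size` makes neighbouring squares overlap, and averaging probabilities inside a square avoids running a separate superpixel segmentation step.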
Motivation & Objective
- Improve cross-domain semantic segmentation when transferring from synthetic to real images.
- Develop a training framework that leverages target-domain properties without extra models.
- Introduce a pyramid curriculum over target-domain images, image regions, and pixels, whose properties are inferred by the segmentation network itself.
- Eliminate the need for adversarial min-max optimization while maintaining competitive performance.
Proposed method
- Construct a pyramid curriculum for each target image consisting of: full image (top), pixel squares (middle), and pixels (bottom).
- Infer target-domain properties (label distributions) from the segmentation network itself in a self-training fashion.
- Replace costly superpixels with small overlapping 4x4 or 8x8 pixel squares for efficiency.
- Use a cross-entropy loss on target image label distributions and pseudo-labels to update the network, avoiding extra discriminators.
- Combine target image-level distributions with region- and pixel-level pseudo-label supervision in a unified objective (Eq. 5).
- Leverage mean distributions from source images to represent the target image distribution when needed, and apply SGD-based optimization with tuned hyperparameters.
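A unified target-side objective of this kind can be sketched as a weighted sum of three cross-entropy terms: inferred vs. predicted image-level distributions, inferred vs. predicted square-level distributions, and pixel predictions vs. confident pseudo-labels. This NumPy sketch is not the paper's Eq. 5: the weights `lam_image`, `lam_region`, `lam_pixel`, the helper names, and the omission of the supervised source-domain loss are hypothetical choices. It computes only the loss value; actual training would use an autodiff framework with SGD and hold the inferred targets fixed while updating the network.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

EPS = 1e-8  # guards against log(0)

def square_means(probs, size=8, stride=4):
    # Mean class distribution in each (possibly overlapping) size x size square.
    windows = sliding_window_view(probs, (size, size), axis=(1, 2))
    return windows[:, ::stride, ::stride].mean(axis=(-2, -1))

def distribution_ce(target, pred):
    # Cross-entropy H(target, pred) over the class axis (axis 0),
    # averaged over any remaining spatial positions.
    return float(np.mean(-(target * np.log(pred + EPS)).sum(axis=0)))

def pixel_ce(probs, pseudo):
    # Pixel-level cross-entropy on pseudo-labelled pixels; -1 means "ignore".
    rows, cols = np.nonzero(pseudo >= 0)
    if rows.size == 0:
        return 0.0
    picked = probs[pseudo[rows, cols], rows, cols]
    return float(-np.log(picked + EPS).mean())

def pyramid_loss(probs, image_target, region_target, pseudo,
                 lam_image=1.0, lam_region=1.0, lam_pixel=1.0,
                 size=8, stride=4):
    """Hypothetical unified target loss over the top, middle, and bottom levels."""
    image_pred = probs.mean(axis=(1, 2))             # top level
    region_pred = square_means(probs, size, stride)  # middle level
    return (lam_image * distribution_ce(image_target, image_pred)
            + lam_region * distribution_ce(region_target, region_pred)
            + lam_pixel * pixel_ce(probs, pseudo))   # bottom level

# Usage: a uniform two-class prediction on a 16x16 image.
probs = np.full((2, 16, 16), 0.5)
image_target = np.array([0.5, 0.5])
region_target = square_means(probs)
pseudo = np.zeros((16, 16), dtype=int)
print(round(pyramid_loss(probs, image_target, region_target, pseudo), 4))  # 2.0794
```

With a uniform two-class prediction each term equals ln 2, so the total is 3 ln 2; setting one of the `lam_*` weights to zero removes that pyramid level from the objective.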
Experimental results
Research questions
- RQ1: Can a self-motivated pyramid curriculum, combining target-domain label distributions and pixel-level pseudo-labels, match or surpass adversarial domain adaptation methods?
- RQ2: Does replacing traditional superpixels with pixel squares retain performance while reducing computation?
- RQ3: How does unifying self-training with curriculum adaptation affect performance on the GTAV→Cityscapes and SYNTHIA→Cityscapes transfers?
- RQ4: What is the impact of the pyramid levels (top image, middle squares, bottom pixels) on learning signals and generalization?
- RQ5: Can this non-adversarial approach outperform existing CDA or ST baselines in semantic segmentation domain adaptation?
Key findings
- PyCDA achieves state-of-the-art results for unsupervised domain adaptation from GTAV and SYNTHIA to Cityscapes among non-adversarial methods.
- Replacing superpixels with 4x4/8x8 pixel squares yields comparable performance with lower computation.
- Jointly using top-level image distributions, middle-level region distributions, and bottom-level pixel pseudo-labels outperforms using CDA or ST alone.
- The approach performs well across different backbones and surpasses several competing methods that use adversarial training.
- Qualitative results show improved segmentation on dominant classes (e.g., road, building, vegetation) and better handling of smaller objects in some settings.
This review was created by AI and reviewed by human editors.