QUICK REVIEW

[Paper Review] Multi-source Domain Adaptation for Semantic Segmentation

Sicheng Zhao, Bo Li | arXiv (Cornell University) | Oct 27, 2019

Domain Adaptation and Few-Shot Learning | 80 citations

TL;DR

MADAN introduces a multi-source unsupervised domain adaptation framework for semantic segmentation that jointly performs pixel-level adaptation, domain aggregation of multiple adapted sources, and feature-level alignment to a target domain, achieving state-of-the-art results on GTA/SYNTHIA to Cityscapes/BDDS benchmarks.

ABSTRACT

Simulation-to-real domain adaptation for semantic segmentation has been actively studied for various applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle a more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel-level cycle-consistently towards the target. Second, we propose sub-domain aggregation discriminator and cross-domain cycle discriminator to make different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and target domain while training the segmentation network. Extensive experiments from synthetic GTA and SYNTHIA to real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.

Motivation & Objective

Address semantic segmentation under realistic multi-source domain shift, where several labeled source domains have different distributions.
Develop an end-to-end framework that combines pixel-level adaptation with semantic and cycle-consistency constraints.
Promote aggregation of multiple adapted domains into a unified domain to improve target-domain performance.
Enhance segmentation performance via feature-level alignment between the aggregated source domain and the real target domain.

Proposed method

For each source S_i, learn a generator G_{S_i→T} that maps source images toward the target domain T to produce adapted images, together with a reverse generator G_{T→S_i}, both trained with cycle-consistency losses.
Introduce dynamic semantic consistency (DSC) by aligning the adapted-domain predictions with a dynamically updated segmentation model to preserve semantics.
Use Sub-domain Aggregation Discriminator (SAD) and Cross-domain Cycle Discriminator (CCD) to aggregate multiple adapted domains into a unified domain.
Train a segmentation model F on the aggregated domain X' with a cross-entropy task loss, and perform feature-level alignment with a discriminator D_F that compares aggregated-domain features against target-domain features.
Optimization combines pixel-level GAN losses, cycle-consistency losses, DSC loss, SAD/CCD losses, and feature-level alignment loss into a unified MADAN objective.
Train in three stages (initial pixel-level adaptation, dynamic semantic consistency with aggregation, and final feature-aligned segmentation), refining iteratively.
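
To make the structure of the objective described above more concrete, here is a minimal Python sketch, not the authors' implementation (their code is at the GitHub link in the abstract). It assumes toy "images" stored as flat lists of floats, an L1 form for the cycle-consistency term, and an unweighted sum of the loss terms by default. The names `l1_distance`, `cycle_consistency_loss`, `SourceLosses`, and `unified_objective`, as well as the optional `weights` argument, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Callable, Dict, List, Optional, Sequence

Image = List[float]  # toy stand-in: a flattened image as a list of pixel values


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x_src: Image,
    x_tgt: Image,
    g_s2t: Callable[[Image], Image],
    g_t2s: Callable[[Image], Image],
) -> float:
    """L1 reconstruction error after a round trip in both directions (assumed form)."""
    src_round_trip = g_t2s(g_s2t(x_src))  # S_i -> T -> S_i
    tgt_round_trip = g_s2t(g_t2s(x_tgt))  # T -> S_i -> T
    return l1_distance(src_round_trip, x_src) + l1_distance(tgt_round_trip, x_tgt)


@dataclass
class SourceLosses:
    """Per-source loss terms named in the method list (values are placeholders)."""
    gan: float    # pixel-level GAN losses, already summed over both directions
    cycle: float  # cycle-consistency loss
    dsc: float    # dynamic semantic consistency loss
    sad: float    # sub-domain aggregation discriminator loss
    ccd: float    # cross-domain cycle discriminator loss


def unified_objective(
    per_source: Sequence[SourceLosses],
    task: float,
    feat: float,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum per-source terms over all sources, then add task and feature-alignment losses.

    `weights` is a hypothetical per-term weighting; missing keys default to 1.0.
    """
    w = weights or {}
    total = 0.0
    for losses in per_source:
        for f in fields(losses):
            total += w.get(f.name, 1.0) * getattr(losses, f.name)
    return total + w.get("task", 1.0) * task + w.get("feat", 1.0) * feat


if __name__ == "__main__":
    # Toy generators that are exact inverses, so the round trip is lossless.
    shift_up = lambda img: [p + 0.5 for p in img]
    shift_down = lambda img: [p - 0.5 for p in img]
    print(cycle_consistency_loss([0.0, 0.25, 1.0], [0.25, 1.0], shift_up, shift_down))

    two_sources = [SourceLosses(1.0, 2.0, 3.0, 4.0, 5.0)] * 2
    print(unified_objective(two_sources, task=1.5, feat=0.5))
```

In a real training loop each term would be a differentiable tensor produced by the corresponding network, and the generators and discriminators would be updated adversarially. The sketch only shows how the per-source terms and the shared task and feature-alignment terms fit into one scalar.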

Experimental results

Research questions

RQ1: Can multiple source domains be effectively aggregated to improve unsupervised domain adaptation for semantic segmentation?
RQ2: Does pixel-level adaptation complemented by semantic consistency and domain aggregation yield better target-domain performance than traditional single-source or naïve multi-source approaches?
RQ3: What is the impact of combining SAD and CCD discriminators with DSC on segmentation accuracy across GTA/SYNTHIA to Cityscapes/BDDS tasks?
RQ4: How much does feature-level alignment contribute when applied on top of pixel-level and domain-aggregated adaptation?

Key findings

MADAN outperforms state-of-the-art methods on GTA and SYNTHIA to Cityscapes and BDDS, showing strong gains from multi-source aggregation.
The DSC loss improves over the original SC loss, indicating better preservation of semantics during pixel-level adaptation.
Both SAD and CCD improve performance, with SAD providing more consistent gains across metrics.
Adding feature-level alignment further boosts performance, and the components are largely orthogonal, offering additive improvements.
Empirical ablations demonstrate the effectiveness of combining pixel-level translation, semantic guidance, domain aggregation, and feature alignment.

This review was created by AI and reviewed by human editors.