[Paper Review] Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions
Refign is a novel, plug-and-play extension for self-training-based unsupervised domain adaptation (UDA) in semantic segmentation that improves performance under adverse visual conditions by leveraging cross-condition image pairs. It first aligns normal-condition reference predictions to target images using an uncertainty-aware dense matching network (UAWarpC), then refines target predictions via adaptive label correction, achieving state-of-the-art mIoU of 65.6% on ACDC and 56.2% on Dark Zurich without additional training parameters.
Due to the scarcity of dense pixel-level semantic annotations for images recorded in adverse visual conditions, there has been a keen interest in unsupervised domain adaptation (UDA) for the semantic segmentation of such images. UDA adapts models trained on normal conditions to the target adverse-condition domains. Meanwhile, multiple datasets with driving scenes provide corresponding images of the same scenes across multiple conditions, which can serve as a form of weak supervision for domain adaptation. We propose Refign, a generic extension to self-training-based UDA methods which leverages these cross-domain correspondences. Refign consists of two steps: (1) aligning the normal-condition image to the corresponding adverse-condition image using an uncertainty-aware dense matching network, and (2) refining the adverse prediction with the normal prediction using an adaptive label correction mechanism. We design custom modules to streamline both steps and set the new state of the art for domain-adaptive semantic segmentation on several adverse-condition benchmarks, including ACDC and Dark Zurich. The approach introduces no extra training parameters and minimal computational overhead (incurred during training only), and can be used as a drop-in extension to improve any given self-training-based UDA method. Code is available at https://github.com/brdav/refign.
Motivation & Objective
- To improve semantic segmentation robustness under adverse visual conditions such as fog, rain, or night-time.
- To address the challenge of error propagation in self-training-based UDA due to noisy pseudo-labels.
- To leverage cross-condition image pairs—e.g., daytime and nighttime views of the same scene—as weak supervision to improve domain adaptation.
- To develop a generic, parameter-efficient method that enhances existing UDA frameworks without architectural changes.
Proposed method
- Proposes UAWarpC, a probabilistic extension of the WarpC geometric matching network, to estimate dense correspondence maps between reference and target images with uncertainty-aware confidence scores.
- Uses the predicted uncertainty to guide spatial alignment of reference predictions to the target image, so that the warp stays robust to occlusions and dynamic objects.
- Applies a non-parametric, adaptive label correction mechanism that fuses target and warped reference predictions using a confidence-weighted mixing strategy based on class-wise probabilities and uncertainty.
- Introduces a trust score mechanism that dynamically adjusts the influence of reference predictions based on their reliability, especially for challenging conditions like night or snow.
- Employs a two-stage refinement process: first warping reference predictions into the target frame, then refining the target prediction using a confidence-adaptive fusion scheme.
- Designs the method to be a drop-in extension to any self-training-based UDA method, adding minimal computational overhead during training.
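The align-then-refine idea in the list above can be illustrated with a small sketch. This is a simplified stand-in, not the authors' implementation: the function names `warp_reference` and `refine`, the `min_conf` threshold, the nearest-neighbour warp, and the plain convex combination are all assumptions made for illustration. The paper's actual trust score and mixing rule are more elaborate, and the correspondence field and confidence map would come from a matching network such as UAWarpC rather than being written by hand.

```python
from typing import List, Tuple

Probs = List[float]           # class probabilities for one pixel
ProbMap = List[List[Probs]]   # H x W x C, stored as nested lists


def warp_reference(ref_probs: ProbMap,
                   flow: List[List[Tuple[float, float]]],
                   confidence: List[List[float]],
                   min_conf: float = 0.5):
    """Nearest-neighbour warp of reference predictions into the target frame.

    flow[y][x] = (dx, dy) is a hypothetical correspondence: target pixel
    (x, y) matches reference pixel (x + dx, y + dy). Pixels whose match is
    out of bounds or below min_conf receive a trust weight of 0.
    """
    h, w = len(ref_probs), len(ref_probs[0])
    n_cls = len(ref_probs[0][0])
    warped = [[[0.0] * n_cls for _ in range(w)] for _ in range(h)]
    trust = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < w and 0 <= sy < h and confidence[y][x] >= min_conf:
                warped[y][x] = list(ref_probs[sy][sx])
                trust[y][x] = confidence[y][x]
    return warped, trust


def refine(target_probs: ProbMap, warped: ProbMap,
           trust: List[List[float]]) -> ProbMap:
    """Per-pixel convex combination: (1 - t) * target + t * warped."""
    return [
        [[(1.0 - t) * p + t * q for p, q in zip(pt, pw)]
         for pt, pw, t in zip(row_t, row_w, row_c)]
        for row_t, row_w, row_c in zip(target_probs, warped, trust)
    ]


# Toy 1 x 2 image with two classes.
target_probs = [[[0.6, 0.4], [0.6, 0.4]]]
ref_probs = [[[0.1, 0.9], [0.2, 0.8]]]
flow = [[(1, 0), (0, 0)]]     # pixel 0 matches reference pixel 1
confidence = [[0.5, 0.1]]     # pixel 1 falls below min_conf

warped, trust = warp_reference(ref_probs, flow, confidence)
refined = refine(target_probs, warped, trust)
labels = [[max(range(len(p)), key=p.__getitem__) for p in row]
          for row in refined]
```

In this toy case the first pixel is pulled toward the reference prediction and its label flips from class 0 to class 1, while the second pixel keeps the target prediction unchanged because its match is not trusted. Because the fusion only reweights existing predictions, it adds no trainable parameters, which is consistent with the drop-in claim above.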
Experimental results
Research questions
- RQ1: Can cross-condition image pairs (e.g., daytime and nighttime) improve semantic segmentation in adverse visual conditions when used as weak supervision?
- RQ2: How can uncertainty-aware dense matching improve the alignment of reference and target predictions in domain adaptation?
- RQ3: To what extent can adaptive label correction using aligned reference predictions mitigate error propagation in self-training-based UDA?
- RQ4: Does the method generalize across diverse adverse conditions such as fog, snow, and night, even when the reference and target images differ in viewpoint or content?
- RQ5: Can the proposed method be used as a plug-in extension to existing UDA frameworks without retraining or adding parameters?
Key findings
- Refign achieves a new state-of-the-art mIoU of 65.6% on the ACDC benchmark for semantic segmentation under adverse conditions.
- On the Dark Zurich dataset, Refign achieves 56.2% mIoU, a new state of the art for normal-to-adverse domain adaptation.
- The method improves the DAFormer baseline by 1.0% mIoU and the DACS baseline by 6.8% mIoU on ACDC, demonstrating strong generalization.
- The ablation study confirms that both alignment and adaptive refinement are essential, with the confidence-aware mixing scheme improving performance by 1.8% mIoU over naive averaging.
- UAWarpC achieves state-of-the-art performance in geometric matching on MegaDepth, RobotCar, and CMU datasets, with improved accuracy and uncertainty estimation.
- Qualitative results show that Refign effectively corrects common misclassifications—such as sky being predicted as road—by leveraging context from reference images.
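For readers unfamiliar with the metric behind the figures above, mIoU (mean intersection over union) averages the per-class ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. The following is a minimal sketch over flat label lists; the function name `mean_iou` and the `ignore_label=255` default (a Cityscapes-style convention) are assumptions, and official benchmark scores accumulate these counts over the whole test set rather than a single image.

```python
from typing import List


def mean_iou(pred: List[int], gt: List[int], num_classes: int,
             ignore_label: int = 255) -> float:
    """Mean IoU over classes that appear in the prediction or ground truth."""
    inter = [0] * num_classes   # true positives per class
    union = [0] * num_classes   # TP + FP + FN per class
    for p, g in zip(pred, gt):
        if g == ignore_label:
            continue            # unlabeled pixels do not count
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[g] += 1       # false negative for the true class
            if 0 <= p < num_classes:
                union[p] += 1   # false positive for the predicted class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, gt, num_classes=3)
```

Here the per-class IoUs are 1/2, 2/3, and 1, so `score` is roughly 0.722. The last pixel is skipped because its ground-truth label is the ignore value.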
This review was created by AI and reviewed by human editors.