[Paper Review] Transferring Landmark Annotations for Cross-Dataset Face Alignment
This paper proposes a transductive cascaded regression (TCR) method that transfers landmark annotations across face alignment datasets with differing annotation protocols, making it possible to fuse them. Using landmarks whose definitions are shared across datasets (e.g., eye and mouth corners), the method transfers dense annotations from a source dataset to a target dataset. The paper reports an average improvement of 16.6% over closed-world baselines and 11.4% over naive fusion in cross-dataset and unseen-domain face alignment.
Dataset bias is a well-known problem in the object recognition domain. This issue, nonetheless, is rarely explored in face alignment research. In this study, we show that the dataset plays an integral part in face alignment performance. Specifically, owing to face alignment dataset bias, training on one database and testing on another or on an unseen domain leads to poor performance. Creating an unbiased dataset by combining various existing databases, however, is non-trivial, as one has to exhaustively re-label the landmarks for standardisation. In this work, we propose a simple yet effective method to bridge the disparate annotation spaces between databases, making dataset fusion possible. We show extensive results on combining various popular databases (LFW, AFLW, LFPW, HELEN) for improved cross-dataset and unseen data alignment.
Motivation & Objective
- Address dataset bias in face alignment, where models trained on one dataset underperform on others due to distribution and annotation differences.
- Overcome the challenge of merging datasets with incompatible landmark annotation protocols, which traditionally requires exhaustive manual re-labelling.
- Enable the fusion of multiple face alignment datasets (e.g., LFW, AFLW, LFPW, HELEN) by standardizing their annotation spaces automatically.
- Improve model generalization on unseen domains, especially those with occlusions or challenging poses, by leveraging diverse training data.
- Release dense 68- and 194-point annotations on LFW via annotation transfer, enhancing its utility for future research.
Proposed method
- Identify common semantic landmarks (e.g., eye corners, mouth corners, pupil centers) across datasets that have consistent definitions despite differing total landmark counts.
- Use these shared landmarks as alignment anchors to establish a geometric correspondence between source and target datasets via a transductive alignment process.
- Apply a transductive cascaded regression (TCR) framework that jointly optimizes shape regression and annotation transfer, using the source dataset’s dense annotations to guide fitting on the target domain.
- Perform transductive learning by jointly training on source and target data, where the source annotations are transferred to the target domain using the shared landmark constraints.
- Leverage shape-dependent features and iterative refinement in the cascaded regression to improve landmark localization accuracy on the target dataset.
- Use the transferred annotations to enrich sparse target datasets, enabling high-quality dense annotation in the target domain without manual re-labelling.
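The role of the shared landmarks as alignment anchors can be illustrated with a small sketch. This is not the paper's TCR algorithm: it only shows one plausible first step, assuming a least-squares similarity (Procrustes/Umeyama) fit on the shared points is used to place a dense source shape onto a sparsely annotated target face, for example as an initialization for later regression stages. The function names `fit_similarity` and `transfer_dense_shape`, the toy coordinates, and the shared indices are all hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping 2-D points src onto dst, via the Umeyama/Procrustes solution."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s0, d0 = src - mu_s, dst - mu_d
    # Cross-covariance between target and source point sets.
    cov = d0.T @ s0 / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Guard against a reflection so that R is a proper rotation.
    sign = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    D = np.diag([1.0, sign])
    R = U @ D @ Vt
    var_s = (s0 ** 2).sum() / len(src)
    scale = (S * np.diag(D)).sum() / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t

def transfer_dense_shape(dense_src, shared_idx, sparse_dst):
    """Fit the transform on the shared landmarks only, then apply it to
    every point of the dense source shape (hypothetical helper)."""
    scale, R, t = fit_similarity(dense_src[shared_idx], sparse_dst)
    return scale * dense_src @ R.T + t

# Toy data (made up): a 6-point "dense" shape, of which 3 points are shared.
dense_src = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                      [0.0, 1.0], [1.0, 2.0], [2.0, 1.0]])
shared_idx = [0, 2, 4]

# Simulate the sparse target annotation as a rotated, scaled, shifted copy.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([10.0, -3.0])
sparse_dst = 1.5 * dense_src[shared_idx] @ R_true.T + t_true

dense_on_target = transfer_dense_shape(dense_src, shared_idx, sparse_dst)
```

A rigid fit like this only captures global pose and scale; according to the summary above, the finer per-landmark placement comes from the iterative cascaded regression with shape-dependent features.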
Experimental results
Research questions
- RQ1: Can annotation spaces across diverse face alignment datasets be standardized without manual re-labelling?
- RQ2: To what extent can transferring dense annotations from a source dataset improve performance on a target dataset with a different annotation protocol?
- RQ3: How does model generalization improve when training data is fused across datasets using the proposed annotation transfer method?
- RQ4: Can a model trained on combined datasets (e.g., LFW + AFLW) outperform models trained on individual datasets in cross-dataset and unseen-domain evaluations?
- RQ5: Does the proposed method enable effective transfer to challenging, occluded datasets like COFW without using COFW’s own training data?
Key findings
- The proposed Transductive Cascaded Regression (TCR) method achieves a 16.6% average improvement in cross-dataset evaluation over the 'closed-world' baseline (SDM) when training on one dataset and testing on another.
- The method improves average performance by 11.4% compared to naive fusion of training sets, demonstrating the effectiveness of annotation space standardization.
- On the COFW dataset with heavy occlusions, the TCR model trained on LFW and AFLW (without COFW data) outperforms a model trained on COFW itself, indicating superior generalization.
- The TCR method achieves state-of-the-art results on unseen domains, including challenging cases with severe occlusions and non-frontal poses.
- The method successfully transfers dense 68- and 194-point annotations to the LFW dataset, which previously only had 5-point annotations, significantly enriching its annotation quality.
- The relative improvement ranges from 8% to 39% across different source-target combinations, with the highest gains observed when transferring from HELEN to LFW and LFPW.
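For readers who want to check how percentages like those above are typically derived, the following sketch assumes they are relative reductions in mean alignment error with respect to a baseline. That reading is an assumption, since this review does not state the exact metric, and the error values below are placeholders chosen for illustration, not figures from the paper.

```python
def relative_improvement(baseline_error, method_error):
    """Percentage reduction in error relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - method_error) / baseline_error

# Placeholder errors (hypothetical, not taken from the paper).
baseline = 10.0
method = 8.34
gain = relative_improvement(baseline, method)  # about 16.6 percent
```

Under this definition, an average improvement would be the mean of the per-pair gains over all source-target combinations.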
This review was created by AI and reviewed by human editors.