QUICK REVIEW

[Paper Review] The Devil of Face Recognition is in the Noise

Fei Wang, Liren Chen | arXiv (Cornell University) | Jul 31, 2018

Face recognition and analysis | 21 references | 23 citations

TL;DR

This paper investigates the impact of label noise in large-scale face recognition datasets and proposes IMDb-Face, a new large-scale, noise-controlled dataset built from IMDb movie posters and screenshots. Using extensive manual cleaning and user studies, the authors show that models trained on clean data achieve significantly higher accuracy. For example, IMDb-Face enables state-of-the-art performance on LFW, MegaFace, and YTF, suggesting that data quality is as critical as model architecture in face recognition.

ABSTRACT

The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequence of label noise inherent in existing datasets. We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets and cleaned subsets, we profile and analyze label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders more samples are needed to achieve the same accuracy yielded by a clean subset. 3) We study the association between different types of noise, i.e., label flips and outliers, with the accuracy of face recognition models. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies to annotation accuracy. The IMDb-Face dataset has been released on https://github.com/fwang91/IMDb-Face.

Motivation & Objective

To understand the sources and consequences of label noise in large-scale face recognition datasets like MegaFace and MS-Celeb-1M.
To develop a systematic method for cleaning noisy face recognition datasets and improving annotation accuracy through user studies.
To create a new, large-scale, noise-controlled face recognition dataset (IMDb-Face) for benchmarking and model training.
To evaluate the impact of different noise types—label flips and outliers—on model performance and training efficiency.
To demonstrate that data cleanliness can yield performance gains comparable to architectural innovations in deep learning.

Proposed method

Manually cleaned subsets of MegaFace and MS-Celeb-1M by identifying and correcting mislabeled identities and redundant images.
Constructed IMDb-Face, a new dataset of 1.7 million images from 59,000 celebrities sourced from IMDb movie posters and screenshots, ensuring high visual diversity and reduced noise.
Conducted a comprehensive user study to analyze the relationship between annotation time and labeling accuracy, identifying time as a key factor in reducing errors.
Injected controlled noise into IMDb-Face to simulate real-world label corruption and evaluate model robustness under varying noise levels.
Trained and evaluated face recognition models using standard loss functions (Softmax, Center Loss, A-Softmax) on both original and cleaned datasets to compare performance.
Used benchmark datasets (LFW, MegaFace, YTF) under standard protocols to evaluate model generalization and state-of-the-art performance.
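
The controlled noise injection described above can be sketched in a few lines. This is a minimal illustration, not the authors' protocol: the function name `inject_noise`, the `(image_id, label)` sample format, and the `outlier_pool` of images from identities outside the dataset are all assumptions made for the example. A label flip keeps the image and assigns it a wrong identity; an outlier keeps the label and swaps in an image that belongs to none of the known identities.

```python
import random


def inject_noise(samples, num_classes, flip_rate, outlier_rate, outlier_pool, seed=0):
    """Return a noisy copy of `samples` plus the corrupted index sets.

    samples:      list of (image_id, label) pairs, labels in range(num_classes)
    flip_rate:    fraction of samples whose label is changed to a wrong identity
    outlier_rate: fraction of samples whose image is replaced by one drawn
                  from `outlier_pool` (hypothetical out-of-dataset images)
    """
    if num_classes < 2:
        raise ValueError("need at least two identities to flip labels")
    if flip_rate < 0 or outlier_rate < 0 or flip_rate + outlier_rate > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")

    rng = random.Random(seed)
    n = len(samples)
    n_flip = int(n * flip_rate)
    n_out = int(n * outlier_rate)
    if n_out > 0 and not outlier_pool:
        raise ValueError("outlier_pool must not be empty")

    # Draw disjoint index sets so no sample is corrupted twice.
    chosen = rng.sample(range(n), n_flip + n_out)
    flip_idx = chosen[:n_flip]
    outlier_idx = chosen[n_flip:]

    noisy = list(samples)
    for i in flip_idx:
        image_id, label = samples[i]
        # An offset in [1, num_classes - 1] guarantees a different label.
        wrong = (label + rng.randrange(1, num_classes)) % num_classes
        noisy[i] = (image_id, wrong)
    for i in outlier_idx:
        _, label = samples[i]
        # Keep the label, swap in an image of no known identity.
        noisy[i] = (rng.choice(outlier_pool), label)

    return noisy, set(flip_idx), set(outlier_idx)
```

Training the same model on outputs generated with increasing `flip_rate` or `outlier_rate` is one way to compare how the two noise types affect accuracy.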

Experimental results

Research questions

RQ1: How does label noise in large-scale face recognition datasets like MegaFace and MS-Celeb-1M affect model accuracy and training efficiency?
RQ2: What is the relationship between different types of noise (label flips and outliers) and the resulting performance degradation in face recognition models?
RQ3: How does data source (e.g., search engines vs. curated media like IMDb) influence the inherent noise level and quality of face recognition datasets?
RQ4: What annotation strategies maximize labeling accuracy while balancing cost and time?
RQ5: To what extent can data cleanliness alone improve model performance, even without architectural or loss function innovations?

Key findings

A model trained on only 32% of the cleaned MegaFace subset achieved performance comparable to a model trained on the full, noisy dataset.
A model trained on just 20% of the cleaned MS-Celeb-1M subset achieved accuracy comparable to a model trained on the full noisy version. The authors take this as evidence that noisy data requires far more samples ("a few orders more", per the abstract) to reach the same accuracy.
A model trained on IMDb-Face, despite the dataset's smaller size (1.7M images), achieved 1.1% higher Rank-1 accuracy on the MegaFace benchmark than one trained on the full MS-Celeb-1M dataset when using A-Softmax loss.
The state-of-the-art model trained on IMDb-Face achieved 99.79% accuracy on LFW, outperforming all published single-model methods, including private ones.
Label accuracy in annotation correlates strongly with time spent per image, suggesting that longer annotation time leads to fewer errors and higher data quality.
Face recognition models are more sensitive to label flips (incorrectly assigned identities) than to outliers (images not belonging to any target identity), with performance degrading nonlinearly as noise increases.
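
For readers unfamiliar with the Rank-1 metric cited above, the following is a simplified sketch of how Rank-1 identification accuracy can be computed from face embeddings. It is an illustration only: the function names, the `(embedding, identity)` data layout, and the use of cosine similarity are assumptions for the example, and the official MegaFace protocol (which adds a large set of distractor images to the gallery) is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose most similar gallery entry has the same identity.

    probes, gallery: lists of (embedding, identity) pairs.
    """
    if not probes or not gallery:
        raise ValueError("probes and gallery must be non-empty")
    hits = 0
    for embedding, identity in probes:
        # The top-ranked gallery entry is the one with the highest similarity.
        best = max(gallery, key=lambda entry: cosine(embedding, entry[0]))
        if best[1] == identity:
            hits += 1
    return hits / len(probes)
```

Under this definition, a 1.1% gain in Rank-1 accuracy means that about 11 more probes per thousand are matched to the correct identity at the top of the ranking.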

This review was created by AI and reviewed by human editors.