QUICK REVIEW

[Paper Review] Error-Bounded Correction of Noisy Labels

Songzhu Zheng, Pengxiang Wu | arXiv (Cornell University) | Nov 19, 2020

Machine Learning and Data Classification | 40 citations

TL;DR

The supplementary material validates error bounds for noisy label correction under Tsybakov conditions using synthetic mixture-of-Gaussians data, estimating constants C and lambda, and demonstrating LRT-Correction performance.

ABSTRACT

To collect large scale annotated data, it is inevitable to introduce label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy training data) to determine whether a label is trustworthy. However, it remains unknown why this heuristic works well in practice. In this paper, we provide the first theoretical explanation for these methods. We prove that the prediction of a noisy classifier can indeed be a good indicator of whether the label of a training data is clean. Based on the theoretical result, we propose a novel algorithm that corrects the labels based on the noisy classifier prediction. The corrected labels are consistent with the true Bayesian optimal classifier with high probability. We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.

Motivation & Objective

Motivate and validate the error-bound framework for correcting noisy labels under a multi-class Tsybakov condition.
Provide synthetic experiments in which eta, tau, and the noisy posterior tilde{eta} are known exactly, so that the bounds and the correction performance can be verified directly.
Estimate the Tsybakov constants and demonstrate tightness of the error bound and correction bound.
Show empirical validation that the LRT-Correction algorithm closely recovers clean labels under controlled noise patterns.
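The role of the two Tsybakov constants can be made concrete with a short derivation. The notation below is an assumption, since this review does not spell the condition out: Delta(x) stands for the gap between the largest and second-largest class posteriors, and p_t for the probability that this gap is at most t. Under that reading, the condition is a power-law bound whose logarithm is linear in log t, which is what makes a log-log regression a natural estimator.

```latex
% Assumed form of the multi-class Tsybakov condition (notation is illustrative):
%   \Delta(x) = \eta_{(1)}(x) - \eta_{(2)}(x), \qquad p_t = \Pr[\Delta(x) \le t]
\[
  p_t \;\le\; C\, t^{\lambda} \quad \text{for all } t \le t_0 .
\]
% If the bound is approximately tight, taking logarithms gives a line in \log t:
\[
  \log p_t \;\approx\; \log C + \lambda \log t ,
\]
% so the slope of the fit estimates \lambda and the intercept estimates \log C.
% Worked example with the constants reported in the key findings
% (C \approx 0.58, \lambda \approx 1.27) at t = 0.1:
\[
  C\, t^{\lambda} \;\approx\; 0.58 \times 0.1^{1.27}
  \;\approx\; 0.58 \times 0.0537 \;\approx\; 0.031 .
\]
```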

Proposed method

Construct a synthetic 10-dimensional mixture-of-Gaussians dataset with equal component probability and known Bayes labels.
Compute the true eta(x) and the noisy label distribution using predefined flip probabilities tau01 and tau10.
Estimate Tsybakov constants C and lambda by regressing log p_t against log t for t in [0, 0.9].
Use the perfect noisy classifier f = tilde{eta} to evaluate the upper bounds in Theorem 1 and Corollary 1.
Apply the LRT-Correction algorithm on the synthetic data and compare corrected labels to the clean labels to validate Corollary 1.
Discuss the impact of symmetric and asymmetric noise on correction performance and bound tightness.
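The first three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the two-component binary setup, the mean separation, the sample size, the flip probabilities `tau01` and `tau10`, and the threshold grid are all hypothetical choices, and the margin is taken to be the gap between the two class posteriors. The grid starts above zero because log t is undefined at t = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two unit-covariance Gaussians in 10 dimensions.
dim, n = 10, 20000
mu0 = np.zeros(dim)
mu1 = np.zeros(dim)
mu1[0] = 2.0  # assumed separation between the component means

# Equal component probability: pick a component, then sample around its mean.
component = rng.integers(0, 2, size=n)
x = rng.standard_normal((n, dim)) + np.where(component[:, None] == 1, mu1, mu0)

# True posterior eta(x) = P(y = 1 | x). With equal priors and identity
# covariance, the log-odds are linear in x.
log_odds = x @ (mu1 - mu0) - 0.5 * (mu1 @ mu1 - mu0 @ mu0)
eta = 1.0 / (1.0 + np.exp(-log_odds))
bayes_label = (eta > 0.5).astype(int)  # known Bayes-optimal labels

# Noisy posterior under class-conditional flips (assumed values):
# tau01 = P(noisy = 1 | clean = 0), tau10 = P(noisy = 0 | clean = 1).
tau01, tau10 = 0.2, 0.3
eta_noisy = (1.0 - tau10) * eta + tau01 * (1.0 - eta)

# Empirical p_t: fraction of points whose posterior gap is at most t.
margin = np.abs(2.0 * eta - 1.0)
t_grid = np.linspace(0.05, 0.9, 18)
p_t = np.array([(margin <= t).mean() for t in t_grid])

# Log-log least squares: slope estimates lambda, intercept estimates log C.
lam, log_c = np.polyfit(np.log(t_grid), np.log(p_t), deg=1)
C = float(np.exp(log_c))
print(f"lambda ~ {lam:.2f}, C ~ {C:.2f}")
```

The fitted values depend on the assumed means and grid, so they are not expected to reproduce the constants reported in the review.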

Experimental results

Research questions

RQ1: Can the Tsybakov condition constants C and lambda be accurately estimated on synthetic data to bound the error of noisy-label correction?
RQ2: Does the LRT-Correction algorithm achieve corrected labels that closely match clean labels under controlled symmetric and asymmetric noise regimes?
RQ3: How tight are the provided error and correction bounds when eta and f satisfy the assumed conditions?
RQ4: What is the impact of using a perfect noisy classifier (f = tilde{eta}) on the observed bounds and correction success rate?
RQ5: How do changes in noise structure (symmetric vs asymmetric) affect the probability of correct correction and bound behavior?

Key findings

The estimated Tsybakov constants are C ≈ 0.58 and lambda ≈ 1.27; the regression fit is strong and statistically significant (R^2 ≈ 0.904, p < 1e-4).
The observed bound on the error probability as a function of epsilon aligns with the form C[epsilon]^lambda under the synthetic setup.
The LRT-Correction algorithm, when given f = tilde{eta}, yields corrected labels very close to the clean labels, with performance limited by asymmetry in the noise pattern.
Corollary 1 provides a closed-form correction error bound that matches the empirical evaluation under the synthetic data.
Symmetric and asymmetric noise scenarios are explored, showing the bound remains valid and the correction performance tracks bound predictions under controlled conditions.
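The likelihood-ratio correction referred to in these findings can be illustrated with a small sketch. The rule used here is an assumed reading of a likelihood-ratio test, since this review does not state it: a label is replaced by the classifier's top prediction when the classifier's probability for the given label, divided by its probability for the top prediction, falls below a threshold. The function name `lrt_correct`, the threshold `delta`, and the toy probabilities are all hypothetical.

```python
import numpy as np

def lrt_correct(f_probs, noisy_labels, delta):
    """Flip a label to the classifier's prediction when the likelihood
    ratio f[label] / f[prediction] is below delta (assumed rule)."""
    f_probs = np.asarray(f_probs, dtype=float)
    noisy_labels = np.asarray(noisy_labels)
    rows = np.arange(len(noisy_labels))
    predicted = f_probs.argmax(axis=1)
    # Ratio is 1 when the label already agrees with the prediction.
    ratio = f_probs[rows, noisy_labels] / f_probs[rows, predicted]
    return np.where(ratio < delta, predicted, noisy_labels)

# Toy classifier outputs for three points and their (possibly noisy) labels.
f_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
noisy_labels = [1, 0, 1]
corrected = lrt_correct(f_probs, noisy_labels, delta=0.5)
print(corrected)  # prints [0 0 1]
```

Only the first label is flipped: its ratio is 0.1 / 0.9, well below the threshold. The second label disagrees with the prediction, but its ratio of 0.4 / 0.6 stays above 0.5, so it is kept. This shows how the threshold trades off aggressive correction against trusting the given labels.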

This review was created by AI and reviewed by human editors.