QUICK REVIEW

[Paper Review] Masking: A New Perspective of Noisy Supervision

Bo Han, Jiangchao Yao|arXiv (Cornell University)|May 21, 2018

Machine Learning and Data Classification | 43 references | 102 citations

TL;DR

Introduces Masking, a structure-aware probabilistic model that uses a human-provided structure prior to constrain the noise transition matrix, improving robustness to noisy labels in end-to-end learning.

ABSTRACT

It is important to learn various types of classifiers given training data with noisy labels. Noisy labels, in the most popular noise model hitherto, are corrupted from ground-truth labels by an unknown noise transition matrix. Thus, by estimating this matrix, classifiers can escape from overfitting those noisy labels. However, such estimation is practically difficult, due to either the indirect nature of two-step approaches, or not big enough data to afford end-to-end approaches. In this paper, we propose a human-assisted approach called Masking that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix. To this end, we derive a structure-aware probabilistic model incorporating a structure prior, and solve the challenges from structure extraction and structure alignment. Thanks to Masking, we only estimate unmasked noise transition probabilities and the burden of estimation is tremendously reduced. We conduct extensive experiments on CIFAR-10 and CIFAR-100 with three noise structures as well as the industrial-level Clothing1M with agnostic noise structure, and the results show that Masking can improve the robustness of classifiers significantly.

Motivation & Objective

Motivate learning from noisy labels by exploiting structure in the noise transition matrix.
Propose a structure-aware probabilistic model (MASKING) that incorporates a prior structure into end-to-end learning.
Reduce the estimation burden by estimating only the unmasked, plausible entries of the noise transition matrix.
Demonstrate robustness gains on CIFAR-10/100 with structured noise and Clothing1M with agnostic noise.
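
To make the reduced estimation burden concrete, the sketch below counts how many transition entries remain to be estimated under each structure for a 10-class problem. The mask definitions are assumptions made for illustration (in particular, "column-diagonal" is taken to mean the diagonal plus one full column, and the block size is arbitrary); they are not taken from the paper's code.

```python
import numpy as np

def column_diagonal_mask(k, col=0):
    # Assumed definition: each class keeps its label or flips to class `col`.
    mask = np.eye(k, dtype=bool)
    mask[:, col] = True
    return mask

def tri_diagonal_mask(k):
    # Each class may only flip to itself or an adjacent class index.
    idx = np.arange(k)
    return np.abs(idx[:, None] - idx[None, :]) <= 1

def block_diagonal_mask(k, block):
    # Classes may only flip within their own group of `block` classes.
    groups = np.arange(k) // block
    return groups[:, None] == groups[None, :]

k = 10
masks = {
    "full": np.ones((k, k), dtype=bool),
    "column-diagonal": column_diagonal_mask(k),
    "tri-diagonal": tri_diagonal_mask(k),
    "block-diagonal": block_diagonal_mask(k, block=2),  # hypothetical block size
}

# Number of unmasked entries, i.e. transitions that still need estimating.
free_entries = {name: int(m.sum()) for name, m in masks.items()}
for name, count in free_entries.items():
    print(f"{name}: {count} of {k * k} entries")
```

Under these assumed masks, the unconstrained matrix has 100 entries, while the structured variants leave 19, 28, and 20 entries respectively.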

Proposed method

Model the noise transition matrix with a latent variable s and a structure variable s_o = f(s).
Instantiate the structure prior P(s_o) and approximate the posterior via a variational distribution Q(s).
Use a tempered sigmoid f(s) to simulate human cognition in extracting structure (column-diagonal, tri-diagonal, block-diagonal).
Adopt a GAN-like scheme with a generator (to produce Q(s)), a discriminator (to enforce structure alignment to P(s_o)), and a reconstructor (to connect y, x, and the noisy label ỹ).
Derive an ELBO-based objective that combines data likelihood with a structure-alignment term (Eq. 1).
Provide an end-to-end training recipe that avoids manual hyperparameter tuning of regularizers.
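
The core idea of the steps above can be sketched in a few lines: a tempered sigmoid turns a latent matrix s into a near-binary structure s_o, and that structure masks a row-normalized transition matrix used to map clean class probabilities to noisy-label probabilities. This is a minimal sketch only. The exact form of f(s), the `shift` and `temperature` parameters, and the helper names are assumptions, and the GAN-like generator, discriminator, and reconstructor training is not reproduced here.

```python
import numpy as np

def tempered_sigmoid(s, shift=0.0, temperature=0.05):
    # Assumed form of f(s): a sigmoid sharpened by a small temperature,
    # so outputs are pushed toward 0 (masked) or 1 (unmasked).
    return 1.0 / (1.0 + np.exp(-(s - shift) / temperature))

def masked_transition(logits, structure):
    # Row-wise softmax weighted by the structure, so masked entries
    # receive (almost) zero transition probability.
    z = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(z) * structure
    return weights / weights.sum(axis=1, keepdims=True)

def noisy_posterior(clean_probs, transition):
    # P(noisy label | x) = sum over y of P(y | x) * T[y, noisy label]
    return clean_probs @ transition

k = 4
idx = np.arange(k)
# Hypothetical latent variable s encoding a tri-diagonal structure.
s = np.where(np.abs(idx[:, None] - idx[None, :]) <= 1, 1.0, -1.0)
structure = tempered_sigmoid(s)

logits = np.zeros((k, k))                   # untrained transition parameters
T = masked_transition(logits, structure)

clean_probs = np.array([[0.7, 0.1, 0.1, 0.1]])  # classifier output for one x
noisy = noisy_posterior(clean_probs, T)
print(np.round(T, 3))
print(np.round(noisy, 3))
```

In this toy setup only the entries allowed by the structure carry meaningful probability mass, which is the sense in which masked transitions drop out of the estimation problem.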

Experimental results

Research questions

RQ1: How can human cognition-inspired structure priors be integrated into learning with noisy labels?
RQ2: Does imposing a plausible structure on the noise transition matrix improve estimation and final classifier accuracy under finite data?
RQ3: Can a Bayesian/implicit modeling approach (MASKING) outperform traditional two-step or end-to-end noisy-label methods under structured noise?
RQ4: How well does MASKING perform with different noise structures (column-diagonal, tri-diagonal, block-diagonal) and with agnostic real-world noise (Clothing1M)?

Key findings

MASKING consistently outperforms forward correction and S-adaptation on benchmark datasets with structured noise.
On CIFAR-10/100, MASKING achieves performance close to the clean-data oracle on several noise structures.
On Clothing1M with agnostic noise, MASKING (71.1%) surpasses NOISY (68.9%), F-correction (69.8%), and S-adaptation (70.3%), approaching CLEAN (75.2%).
MASKING's estimates of the noise transition matrices align more closely with the true/desired structure than those of the baselines.
The approach shows consistent improvements across multiple noise patterns (column-diagonal, tri-diagonal, block-diagonal) and on real-world noisy data.
The framework provides a principled, hyperparameter-light way to incorporate structure priors via a GAN-like training scheme.

This review was created by AI and reviewed by human editors.