QUICK REVIEW

[Paper Review] Combating noisy labels by agreement: A joint training method with co-regularization

Hongxin Wei, Lei Feng|arXiv (Cornell University)|Mar 5, 2020
Machine Learning and Data Classification|44 references|67 citations
TL;DR

JoCoR trains two networks with a joint loss that combines supervised learning and co-regularization to maximize agreement, uses small-loss selection, and yields improved robustness to noisy labels across MNIST, CIFAR, and Clothing1M.

ABSTRACT

Deep Learning with noisy labels is a practically challenging problem in weakly supervised learning. The state-of-the-art approaches "Decoupling" and "Co-teaching+" claim that the "disagreement" strategy is crucial for alleviating the problem of learning with noisy labels. In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training. Specifically, we first use two networks to make predictions on the same mini-batch data and calculate a joint loss with Co-Regularization for each training example. Then we select small-loss examples to update the parameters of both two networks simultaneously. Trained by the joint loss, these two networks would be more and more similar due to the effect of Co-Regularization. Extensive experimental results on corrupted data from benchmark datasets including MNIST, CIFAR-10, CIFAR-100 and Clothing1M demonstrate that JoCoR is superior to many state-of-the-art approaches for learning with noisy labels.

Motivation & Objective

  • Motivate the need for robust learning when the training labels in supervised deep learning are noisy.
  • Propose a joint training paradigm that reduces divergence between two classifiers via co-regularization.
  • Demonstrate that agreement-based regularization plus small-loss sample selection improves performance on benchmark noisy-label datasets.
  • Show ablations to isolate the impact of co-regularization and joint training.

Proposed method

  • Two networks with different initializations are trained jointly using a single loss function that combines supervised loss and a co-regularization term.
  • Supervised loss is the sum of cross-entropy losses from both networks on the given (possibly noisy) labels.
  • Co-Regularization is implemented as a symmetric KL divergence between the two networks’ prediction distributions, used as a surrogate for the JS divergence.
  • Small-loss selection is performed on batches by choosing a subset of examples with the smallest joint loss.
  • The ratio of retained small-loss examples R(t) is scheduled over epochs to mitigate overfitting to noisy data.
  • The training follows a pseudo-siamese paradigm where both networks are updated jointly rather than via cross-updates.
  • The approach is evaluated on MNIST, CIFAR-10, CIFAR-100, and Clothing1M with both synthetic and real-world noisy labels.
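
The joint loss, the retention schedule, and the small-loss selection described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not say how the two loss terms are weighted or what form R(t) takes, so the convex weight `lam`, the linear schedule in `keep_ratio`, and the parameters `noise_rate` and `warmup_epochs` are all hypothetical choices. The logits are random stand-ins for the outputs of the two networks.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def joint_loss(logits1, logits2, labels, lam=0.5, eps=1e-12):
    """Per-example joint loss: supervised term plus co-regularization.

    `lam` is an assumed convex weight between the two terms.
    """
    p1, p2 = softmax(logits1), softmax(logits2)
    idx = np.arange(len(labels))
    # Supervised term: sum of both networks' cross-entropy on the given labels.
    ce = -np.log(p1[idx, labels] + eps) - np.log(p2[idx, labels] + eps)
    # Co-regularization: symmetric KL, KL(p1||p2) + KL(p2||p1),
    # which simplifies to sum((p1 - p2) * log(p1 / p2)).
    log_ratio = np.log(p1 + eps) - np.log(p2 + eps)
    sym_kl = ((p1 - p2) * log_ratio).sum(axis=1)
    return (1.0 - lam) * ce + lam * sym_kl


def keep_ratio(epoch, noise_rate, warmup_epochs=10):
    """Assumed linear schedule: keep all examples at first, then decay
    to 1 - noise_rate over `warmup_epochs` epochs."""
    return 1.0 - min(epoch / warmup_epochs * noise_rate, noise_rate)


def select_small_loss(losses, ratio):
    """Indices of the `ratio` fraction of examples with the smallest loss."""
    n_keep = max(1, int(ratio * len(losses)))
    return np.argsort(losses)[:n_keep]


# Toy mini-batch: 8 examples, 3 classes.
rng = np.random.default_rng(0)
logits1 = rng.normal(size=(8, 3))
logits2 = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)

losses = joint_loss(logits1, logits2, labels)
kept = select_small_loss(losses, keep_ratio(epoch=5, noise_rate=0.2))
# In training, this mean would be backpropagated through both networks.
batch_loss = losses[kept].mean()
```

Because the selection ranks examples by the joint loss rather than by either network's loss alone, an example is kept only when both networks fit its label well and also agree with each other, which is the cooperation the method relies on.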

Experimental results

Research questions

  • RQ1: Can agreement-based regularization between two classifiers eliminate the need for disagreement-based updates in noisy-label training?
  • RQ2: Does joint training with co-regularization improve robustness to noisy labels compared to existing disagreement-based methods?
  • RQ3: How effective is small-loss sample selection when guided by a joint loss that enforces cooperation between networks?
  • RQ4: How do the co-regularization and joint-training components, each ablated in turn, affect label precision and test accuracy?

Key findings

  • JoCoR achieves higher test accuracy than several state-of-the-art baselines on MNIST, CIFAR-10, CIFAR-100, and Clothing1M under various noise regimes.
  • The method yields higher label precision in mini-batches, indicating more effective selection of clean instances during training.
  • Ablation studies show that both Co-Regularization and Joint Training contribute significantly to performance, with Co-Regularization preventing memorization of noisy labels.
  • Compared to Co-teaching and Co-teaching+, JoCoR maintains or improves performance as noise increases, including in the hardest symmetric and asymmetric noise settings.
  • The approach demonstrates strong generalization by maintaining robustness across both synthetic noise and real-world noisy labels.
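
The label-precision metric mentioned above can be made concrete with a small example. The sketch assumes that label precision means the fraction of selected examples whose given label matches the true label, which can only be computed when the noise is synthetic and the clean labels are known. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def label_precision(selected_idx, noisy_labels, true_labels):
    """Fraction of selected examples whose given label equals the true label."""
    selected_idx = np.asarray(selected_idx)
    if selected_idx.size == 0:
        return 0.0
    clean = noisy_labels[selected_idx] == true_labels[selected_idx]
    return float(clean.mean())


true_labels = np.array([0, 1, 2, 0, 1, 2])
noisy_labels = np.array([0, 1, 0, 0, 2, 2])  # indices 2 and 4 are corrupted
selected = [0, 1, 2, 3]                      # hypothetical small-loss selection

# Three of the four selected examples carry a clean label.
precision = label_precision(selected, noisy_labels, true_labels)
```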

This review was created by AI and reviewed by human editors.