QUICK REVIEW

[Paper Review] Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks

Arash Vahdat | arXiv (Cornell University) | May 31, 2017

Infrastructure Maintenance and Monitoring | 119 citations

TL;DR

Proposes a semi-supervised CNN-CRF framework to train deep discriminative networks from noisy labels by modeling clean-noisy label relations with latent variables and auxiliary distributions, achieving robustness on image labeling tasks including CIFAR-10 and MS COCO.

ABSTRACT

Collecting large training datasets, annotated with high-quality labels, is costly and time-consuming. This paper proposes a novel framework for training deep convolutional neural networks from noisy labeled datasets that can be obtained cheaply. The problem is formulated using an undirected graphical model that represents the relationship between noisy and clean labels, trained in a semi-supervised setting. In our formulation, the inference over latent clean labels is tractable and is regularized during training using auxiliary sources of information. The proposed model is applied to the image labeling problem and is shown to be effective in labeling unseen images as well as reducing label noise in training on CIFAR-10 and MS COCO datasets.

Motivation & Objective

Address the challenge of training deep CNNs with cheaply collected, noisy labels.
Introduce a conditional random field (CRF) structure to couple clean and noisy labels with latent variables.
Provide a semi-supervised objective that leverages auxiliary information to regularize learning.
Demonstrate robustness and improved labeling on standard image datasets (CIFAR-10 and MS COCO).

Proposed method

Model clean labels as latent variables within a CRF that relates clean and noisy labels conditioned on input x.
Introduce hidden binary variables h to capture correlations among labels while keeping inference tractable.
Define a quadratic energy function whose bias terms come from a CNN and whose pairwise interactions, including those between y and ŷ, are parameterized by the weight matrices W and W' (CNN-CRF).
Establish a semi-supervised learning objective that combines data with clean labels and data with only noisy labels, optimized with an EM-like procedure that uses persistent contrastive divergence.
Incorporate an auxiliary distribution p_aux to regularize latent inference and guide q(ŷ,h|y,x) toward p_aux, controlled by a hyperparameter α.
Train end-to-end with an alternating E-step (updating q) and M-step (updating θ), with α scheduled to shift reliance from p_aux to p_θ over time.
Utilize a restricted Boltzmann machine (RBM) as the auxiliary model trained on clean data, with parameters fixed during CNN-CRF training.
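
The steps above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes binary (multilabel) vectors and an energy of the form E = -(a(x)·ŷ + b·y + c·h + ŷᵀWy + hᵀW'y), which is one reading of "biases from a CNN plus pairwise terms W and W'". Under that assumption ŷ and h are conditionally independent given y and x, so inference reduces to per-unit sigmoids. The mixing rule further assumes that q minimizes KL(q‖p_θ) + α·KL(q‖p_aux) and that p_aux also factorizes given y, in which case the logits of q are a weighted average of the two sets of logits. All function names and the linear α decay are hypothetical.

```python
import numpy as np

def energy(y_noisy, y_clean, h, a_x, b, c, W, W_prime):
    """Assumed quadratic energy; a_x stands in for the CNN output a(x)."""
    return -(a_x @ y_clean + b @ y_noisy + c @ h
             + y_clean @ W @ y_noisy + h @ W_prime @ y_noisy)

def posterior_logits(y_noisy, a_x, c, W, W_prime):
    """Logits of p(y_clean_i = 1 | y, x) and p(h_j = 1 | y).

    With no direct y_clean-h or within-layer terms in the energy,
    the conditional factorizes into independent Bernoulli units.
    """
    return a_x + W @ y_noisy, c + W_prime @ y_noisy

def mix_logits(logit_theta, logit_aux, alpha):
    """Logits of q proportional to (p_theta * p_aux**alpha) ** (1 / (1 + alpha)).

    For factorized Bernoulli distributions this geometric mean is a
    weighted average in logit space.
    """
    return (logit_theta + alpha * logit_aux) / (1.0 + alpha)

def alpha_schedule(step, total_steps, alpha_start=10.0):
    """Hypothetical linear decay of alpha from alpha_start to 0."""
    frac = min(step / total_steps, 1.0)
    return alpha_start * (1.0 - frac)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, H = 4, 3                      # number of labels, hidden units
    a_x = rng.normal(size=K)         # placeholder for CNN scores
    c = rng.normal(size=H)
    W = rng.normal(size=(K, K))
    W_prime = rng.normal(size=(H, K))
    y_noisy = np.array([1.0, 0.0, 1.0, 0.0])
    aux_logits = rng.normal(size=K)  # placeholder for the auxiliary model

    clean_logits, _ = posterior_logits(y_noisy, a_x, c, W, W_prime)
    for step in (0, 50, 100):
        alpha = alpha_schedule(step, total_steps=100)
        q_clean = sigmoid(mix_logits(clean_logits, aux_logits, alpha))
        print(step, alpha, np.round(q_clean, 3))
```

With a large α, q follows the auxiliary model; as α decays to 0, q coincides with the model's own conditional, which matches the shift in reliance from p_aux to p_θ described above.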

Experimental results

Research questions

RQ1: Can a CNN-CRF framework model the relationship between noisy and clean labels to improve robustness to label noise in deep networks?
RQ2: How can auxiliary information be incorporated to regularize latent clean-label inference in semi-supervised training?
RQ3: Does the proposed approach improve image labeling performance on datasets with noisy labels compared to baselines?
RQ4: How does the method perform in multiclass vs. multilabel settings and with different network architectures (e.g., VGG-16, ResNet-50)?

Key findings

The CNN-CRF model provides robustness to label noise by explicitly modeling clean-noisy label relations with latent variables.
Incorporating an auxiliary distribution via p_aux and scheduling α improves latent variable inference and training stability.
The approach yields improved labeling performance over several baselines on the Microsoft COCO dataset with noisy labels and the COCO Flickr-tag setup.
The method is adaptable to both multiclass and multilabel classification and can be integrated as a robust loss layer in existing networks.
Using either VGG-16 or ResNet-50 architectures, the model demonstrates gains over baseline training with noisy labels across evaluated configurations.


This review was created by AI and reviewed by human editors.