QUICK REVIEW

[Paper Review] Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach

Giorgio Patrini, Alessandro Rozza|arXiv (Cornell University)|Sep 13, 2016
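Machine Learning and Data Classification|References 38|Cited by 114
One-line summary

This paper proposes two loss-correction methods (backward and forward) that make deep networks robust to class-dependent label noise, together with a noise estimator for obtaining the required transition matrix T, and shows their effectiveness across a range of architectures and datasets.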

ABSTRACT

We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and thus providing an end-to-end framework. Extensive experiments on MNIST, IMDB, CIFAR-10, CIFAR-100 and a large scale dataset of clothing images employing a diversity of architectures --- stacking dense, convolutional, pooling, dropout, batch normalization, word embedding, LSTM and residual layers --- demonstrate the noise robustness of our proposals. Incidentally, we also prove that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise.
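
As a worked version of the "matrix inversion and multiplication" remark, the two corrections can be written out as below. The notation is an assumption of this sketch rather than a quotation: $T_{ij}$ is the probability that clean class $i$ is observed as class $j$, $\ell(\hat p)$ is the vector of losses over the $c$ possible labels, and $p$, $\tilde p$ are the clean and noisy class posteriors at an input $x$.

```latex
\begin{align*}
T_{ij} &= p(\tilde y = e^j \mid y = e^i), \qquad \tilde p = T^\top p \\
\ell^{\leftarrow}(\hat p) &= T^{-1}\,\ell(\hat p)
  && \text{(backward: one inversion, one product)} \\
\ell^{\rightarrow}(\hat p) &= \ell(T^\top \hat p)
  && \text{(forward: one product)} \\
\mathbb{E}_{\tilde y \mid x}\,\ell^{\leftarrow}(\tilde y, \hat p)
  &= \tilde p^\top T^{-1}\ell(\hat p)
   = p^\top T\,T^{-1}\ell(\hat p)
   = p^\top \ell(\hat p)
   = \mathbb{E}_{y \mid x}\,\ell(y, \hat p)
\end{align*}
```

The last line is the unbiasedness argument for the backward correction; it requires T to be non-singular.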

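Motivation and objectives

  • Motivate robust training of deep neural networks when labels are noisy because they come from low-cost labeling methods or crowdsourcing.
  • Introduce two loss correction methods (backward and forward) that compensate for class-dependent label noise using the noise transition matrix T.
  • Present robustness guarantees for the corrected losses under class-conditional noise within a theoretical framework.
  • Extend noise-rate estimation to the multi-class setting so that end-to-end training is possible without ground-truth labels.
  • Demonstrate empirical robustness across diverse architectures and data domains, including image and text tasks.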

Proposed method

  • Backward correction: define the corrected loss ℓ^{←} = T^{-1} ℓ, where ℓ is the vector of per-class losses; when T is non-singular this is an unbiased estimator of the clean-label loss.
  • Forward correction: define the corrected loss ℓ^{→} by multiplying the predicted class probabilities by T^{T} inside a proper composite loss, which preserves the clean-data minimizer without inverting T.
  • Prove robustness guarantees for both corrections: for appropriate losses, the minimizer of the corrected loss on noisy data matches the minimizer of the original loss on clean data.
  • Extend noise estimation to the multi-class setting by estimating T from network outputs on unlabeled or weakly labeled samples, enabling end-to-end training.
  • Prove that, when ReLU is the only non-linearity, the loss curvature (Hessian) does not depend on class-dependent label noise.
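
The two corrections listed above can be sketched in a few lines of NumPy for softmax cross-entropy. This is a minimal illustration, not the authors' implementation: the function names and the convention that row i of `T` holds p(noisy = j | clean = i) are assumptions made here.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def per_class_cross_entropy(probs, eps=1e-12):
    """(n, c) matrix: entry [k, j] is the loss of example k if its label were j."""
    return -np.log(np.clip(probs, eps, 1.0))


def backward_corrected_loss(logits, noisy_labels, T):
    """Mean of (T^{-1} l)[noisy_label] over the batch (hypothetical helper)."""
    losses = per_class_cross_entropy(softmax(logits))      # (n, c)
    # corrected[k, i] = sum_j inv(T)[i, j] * losses[k, j]
    corrected = losses @ np.linalg.inv(T).T
    rows = np.arange(len(noisy_labels))
    return corrected[rows, noisy_labels].mean()


def forward_corrected_loss(logits, noisy_labels, T):
    """Mean cross-entropy against T^T p, the predicted noisy-label distribution."""
    probs = softmax(logits)                                 # clean-class predictions
    noisy_probs = probs @ T                                 # row form of T^T p
    rows = np.arange(len(noisy_labels))
    return per_class_cross_entropy(noisy_probs)[rows, noisy_labels].mean()
```

With `T` set to the identity both functions reduce to ordinary cross-entropy. The backward loss can take negative values for individual examples because `inv(T)` generally has negative entries, whereas the forward loss never inverts `T`.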

Experimental results

Research questions

  • RQ1: Can loss correction techniques (backward and forward) provide unbiased or robust optimization in the presence of class-dependent label noise for multi-class classification?
  • RQ2: How can the noise transition matrix T be estimated in a multi-class setting without ground-truth labels, and how does this estimation affect robustness?
  • RQ3: Do the proposed corrections maintain theoretical robustness guarantees across architectures and domains (including CNNs, RNNs, LSTMs, and residual networks)?
  • RQ4: What is the impact of using ReLU activations on the Hessian under label noise for these corrections?
  • RQ5: How do the corrected losses compare to standard cross-entropy and other baselines on datasets with synthetic and real noise (MNIST, CIFAR, IMDB, Clothing1M)?

Key findings

  • Backward correction yields an unbiased estimator of the loss under noisy labels when T is non-singular, preserving the minimizer.
  • Forward correction preserves the minimizer under the clean distribution for proper composite losses, avoiding explicit matrix inversion in practice.
  • The noise transition matrix T can be estimated from network outputs on unlabeled data, enabling end-to-end learning without ground-truth labels.
  • For ReLU networks, the Hessian of the loss is independent of label noise, meaning curvature-based optimization properties are preserved under correction.
  • Empirical results show improved robustness over uncorrected losses across MNIST, CIFAR-10/100, IMDB, and Clothing1M, with forward correction often outperforming backward correction.
  • The approach is architecture- and domain-agnostic, demonstrated on dense nets, CNNs, ResNets, and LSTMs.
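
The finding that T can be estimated from network outputs can be illustrated with a small sketch. It assumes a model already trained on the noisy labels whose softmax outputs approximate p(noisy label | x), and it assumes that for each class the sample contains at least one example that almost surely belongs to that class. The function name and the toy data are hypothetical.

```python
import numpy as np


def estimate_transition_matrix(noisy_probs):
    """Estimate T from an (n, c) array of predicted noisy-label probabilities.

    For each class i, pick the example the model is most confident about
    (largest predicted probability of noisy label i) and use the model's
    full output on that example as row i of the estimate.
    """
    anchor_idx = noisy_probs.argmax(axis=0)   # one anchor example per class
    return noisy_probs[anchor_idx]            # (c, c), row i from anchor i


# Toy check on synthetic posteriors (illustrative values, not from the paper).
T_true = np.array([[0.8, 0.2, 0.0],
                   [0.0, 0.7, 0.3],
                   [0.1, 0.0, 0.9]])
clean_posteriors = np.vstack([
    np.eye(3),                 # one unambiguous example per class
    [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5]],         # ambiguous examples
])
noisy_posteriors = clean_posteriors @ T_true  # what an ideal noisy-label model outputs
T_hat = estimate_transition_matrix(noisy_posteriors)
```

In this idealized setting `T_hat` matches `T_true`; with a real network the estimate is only as good as the model's calibration and the presence of such unambiguous examples.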

This review was generated by AI and checked by a human editor.