[Paper Review] Does label smoothing mitigate label noise?
This paper investigates whether label smoothing mitigates label noise in deep learning, demonstrating that despite its apparent equivalence to injecting symmetric noise, label smoothing acts as a regularizer that improves generalization and performance under label noise. It shows label smoothing is competitive with established loss-correction techniques and significantly enhances knowledge distillation when applied to teacher models trained on noisy labels.
Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors. Empirically, smoothing has been shown to improve both predictive performance and model calibration. In this paper, we study whether label smoothing is also effective as a means of coping with label noise. While label smoothing apparently amplifies this problem (being equivalent to injecting symmetric noise to the labels), we show how it relates to a general family of loss-correction techniques from the label noise literature. Building on this connection, we show that label smoothing is competitive with loss-correction under label noise. Further, we show that when distilling models from noisy data, label smoothing of the teacher is beneficial; this is in contrast to recent findings for noise-free problems, and sheds further light on settings where label smoothing is beneficial.
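The mixing of one-hot and uniform labels described above, and its apparent equivalence to symmetric noise, can be illustrated with a minimal sketch. The function names and the specific noise model (with probability α the label is redrawn uniformly over all classes, otherwise kept) are assumptions made for illustration, not taken from the paper.

```python
def smooth_label(y, num_classes, alpha):
    """Mix the one-hot vector for class y with the uniform distribution."""
    uniform = alpha / num_classes
    return [(1.0 - alpha) * (1.0 if k == y else 0.0) + uniform
            for k in range(num_classes)]


def expected_noisy_label(y, num_classes, alpha):
    """Expected one-hot label under an assumed symmetric-noise model:
    with probability alpha the label is redrawn uniformly over all classes,
    otherwise it is kept unchanged."""
    expected = [0.0] * num_classes
    expected[y] += 1.0 - alpha              # case 1: label kept
    for k in range(num_classes):            # case 2: label redrawn uniformly
        expected[k] += alpha / num_classes
    return expected


# Illustrative values only: 4 classes, true class 2, alpha = 0.2.
smoothed = smooth_label(y=2, num_classes=4, alpha=0.2)
noisy_mean = expected_noisy_label(y=2, num_classes=4, alpha=0.2)
```

Under these assumptions both lists come out to roughly [0.05, 0.05, 0.85, 0.05]: the smoothed target equals the average label that symmetric noise would produce, which is why smoothing can look like it adds noise rather than removes it.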
Motivation & Objective
- To investigate whether label smoothing, commonly used for model calibration and generalization, is effective in mitigating label noise.
- To clarify the theoretical relationship between label smoothing and existing loss-correction techniques for label noise.
- To evaluate the impact of label smoothing on knowledge distillation when training data contains label noise.
- To reconcile the apparent contradiction between label smoothing's noise-injection effect and its observed denoising benefits.
Proposed method
- The authors connect label smoothing to a family of loss-correction techniques from the label noise literature, particularly those based on backward correction.
- They analyze label smoothing as a form of L2 regularization, showing it induces shrinkage of model predictions toward uniformity, which reduces overconfidence.
- Empirical evaluation is conducted on CIFAR-10 and CIFAR-100 with controlled label noise, comparing label smoothing against forward correction and standard training.
- Knowledge distillation is applied using a teacher model trained on noisy labels with and without label smoothing, and the student model's performance is evaluated.
- The study uses temperature-based distillation and measures accuracy across varying smoothing levels (α) to assess robustness.
- Theoretical analysis links label smoothing to regularization, explaining its denoising effect through shrinkage of logits.
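One way to make the connection to backward correction concrete is sketched below, assuming a symmetric-noise transition matrix of the form (1 − α)·I + (α/L)·J, where J is the all-ones matrix and L the number of classes. Backward correction applies the inverse of the transition matrix, and for this assumed form the inverse is again a smoothing matrix with a negative coefficient β = −α/(1 − α). The helper names `smoothing_matrix` and `matmul` are hypothetical and used only for this check.

```python
def smoothing_matrix(num_classes, alpha):
    """Return (1 - alpha) * I + (alpha / L) * J as a list of rows."""
    return [[(1.0 - alpha) * (1.0 if i == j else 0.0) + alpha / num_classes
             for j in range(num_classes)]
            for i in range(num_classes)]


def matmul(a, b):
    """Plain-Python matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


L, alpha = 4, 0.2                        # illustrative values
T = smoothing_matrix(L, alpha)           # assumed symmetric-noise transition matrix
beta = -alpha / (1.0 - alpha)            # negative smoothing coefficient
T_inv = smoothing_matrix(L, beta)        # candidate inverse of T
product = matmul(T, T_inv)               # should be close to the identity
```

In this reading, smoothing and backward correction use the same family of matrices but with coefficients of opposite sign, which is one way to understand why smoothing looks like noise injection while backward correction tries to undo it.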
Experimental results
Research questions
- RQ1: Does label smoothing mitigate label noise despite its apparent equivalence to injecting symmetric noise?
- RQ2: How is label smoothing related to established loss-correction techniques in the label noise literature?
- RQ3: Can label smoothing improve performance in knowledge distillation when the teacher is trained on noisy labels?
- RQ4: Why does label smoothing improve generalization under label noise, given its regularization effect?
Key findings
- Label smoothing is competitive with forward correction and other loss-correction techniques in reducing error under label noise on CIFAR-10 and CIFAR-100.
- Label smoothing improves distillation performance when applied to the teacher model trained on noisy labels, outperforming vanilla distillation.
- The benefit of label smoothing in distillation under noise is robust across different smoothing levels α, with higher α values yielding consistent improvements.
- Label smoothing acts as an implicit L2 regularizer, which explains its denoising effect by reducing model overconfidence and shrinking predictions toward uniformity.
- The results contrast with prior findings in noise-free settings, where label smoothing on the teacher harms distillation; here, it is beneficial under label noise.
- The study establishes that label smoothing can be a viable denoising technique, with theoretical and empirical support from its regularization interpretation.
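The distillation setup behind these findings can be pictured with a minimal sketch of a temperature-based distillation loss. The review does not give the paper's exact objective or temperature, so this uses the common soft-target formulation (cross-entropy between temperature-softened teacher and student distributions) as an assumption; the logits are made-up values.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


teacher_logits = [4.0, 1.0, 0.5]    # made-up values for illustration
student_logits = [2.0, 1.5, 0.2]
sharp = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)
loss = distillation_loss(teacher_logits, student_logits, temperature=4.0)
```

A higher temperature flattens the teacher's distribution, so the student also learns from the relative scores of the non-top classes. A teacher trained with label smoothing on noisy labels would supply different `teacher_logits` to the same loss.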
This review was created by AI and reviewed by human editors.