QUICK REVIEW

[Paper Summary] Error-Bounded Correction of Noisy Labels

Songzhu Zheng, Pengxiang Wu|arXiv (Cornell University)|Nov 19, 2020

Machine Learning and Data Classification | Cited by 40

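One-Sentence Summary

The supplementary material uses synthetic Gaussian mixture data under the Tsybakov condition to validate the error bounds for noisy-label correction, estimates the constants C and lambda, and demonstrates the performance of LRT-Correction.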

ABSTRACT

To collect large scale annotated data, it is inevitable to introduce label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy training data) to determine whether a label is trustworthy. However, it remains unknown why this heuristic works well in practice. In this paper, we provide the first theoretical explanation for these methods. We prove that the prediction of a noisy classifier can indeed be a good indicator of whether the label of a training data is clean. Based on the theoretical result, we propose a novel algorithm that corrects the labels based on the noisy classifier prediction. The corrected labels are consistent with the true Bayesian optimal classifier with high probability. We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.

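Motivation and Objectives

Motivate and validate an error-bound framework for correcting noisy labels under the multi-class Tsybakov condition.
Provide synthetic experiments in which eta, tau, and the noisy eta are known exactly, in order to validate the bounds and the correction performance.
Estimate the Tsybakov constants C and lambda by regressing log p_t on log t for t in [0, 0.9].
Show empirically that, under controlled noise patterns, the LRT-Correction algorithm approximately recovers the clean labels.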
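
The log-log regression named above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the bound has the form p_t <= C * t^lambda, so that log p_t is roughly log C + lambda * log t, and it reads p_t as the empirical fraction of points whose margin statistic (for example |eta(x) - 1/2| in the binary case) is at most t; the summary does not spell this out. The function name `estimate_tsybakov`, the grid, and the test margins are hypothetical. The grid starts above zero because log t is undefined at t = 0.

```python
import numpy as np

def estimate_tsybakov(margins, t_grid):
    """Fit log p_t = log C + lambda * log t by ordinary least squares."""
    margins = np.asarray(margins, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    # Empirical p_t: fraction of points whose margin is at most t.
    p_t = np.array([(margins <= t).mean() for t in t_grid])
    keep = p_t > 0  # log is undefined where no point falls below t
    slope, intercept = np.polyfit(np.log(t_grid[keep]), np.log(p_t[keep]), 1)
    return np.exp(intercept), slope  # (C, lambda)

# Hypothetical check: margins with P(margin <= t) = t**2, i.e. C = 1, lambda = 2.
rng = np.random.default_rng(0)
margins = rng.random(200_000) ** 0.5
t_grid = np.linspace(0.05, 0.9, 18)  # strictly positive so log t is finite
C_hat, lam_hat = estimate_tsybakov(margins, t_grid)
print(f"C ~ {C_hat:.3f}, lambda ~ {lam_hat:.3f}")
```

On real data the margins would come from the known eta(x) rather than from a synthetic power law.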

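Proposed Method

Construct a synthetic ten-dimensional Gaussian mixture dataset with equal component probabilities and known Bayes labels.
Compute the true eta(x) and the noisy label distribution using predefined flip probabilities tau01 and tau10.
Estimate the Tsybakov constants C and lambda by regressing log p_t on log t for t in [0, 0.9].
Use the perfect noisy classifier f = tilde{eta} to evaluate the upper bounds of Theorem 1 and Corollary 1.
Apply the LRT-Correction algorithm to the synthetic data and compare the corrected labels with the clean labels to verify Corollary 1.
Discuss how symmetric and asymmetric noise affect correction performance and the tightness of the bounds.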
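
The data-generation steps above can be sketched as follows, under assumptions the summary does not state: two components with identity covariance, means placed symmetrically about the origin, tau01 read as P(noisy = 1 | clean = 0) and tau10 as P(noisy = 0 | clean = 1). The function names, the `shift` parameter, and the sample size are hypothetical. With equal priors and a shared covariance, eta(x) is a logistic function of x, and the noisy posterior is tilde{eta}(x) = (1 - tau10) * eta(x) + tau01 * (1 - eta(x)).

```python
import numpy as np

def make_mixture(n, dim=10, shift=1.0, seed=0):
    """Sample from a two-component Gaussian mixture with equal priors."""
    rng = np.random.default_rng(seed)
    mu1 = np.full(dim, shift / np.sqrt(dim))  # hypothetical component mean
    mu0 = -mu1
    y = rng.integers(0, 2, size=n)            # component index = sampled label
    x = rng.standard_normal((n, dim)) + np.where(y[:, None] == 1, mu1, mu0)
    return x, y, mu0, mu1

def true_eta(x, mu0, mu1):
    """P(y = 1 | x) for equal priors and identity covariance."""
    logit = x @ (mu1 - mu0) - 0.5 * (mu1 @ mu1 - mu0 @ mu0)
    return 1.0 / (1.0 + np.exp(-logit))

def noisy_eta(eta, tau01, tau10):
    """P(noisy label = 1 | x) given the flip probabilities."""
    return (1.0 - tau10) * eta + tau01 * (1.0 - eta)

def flip_labels(y, tau01, tau10, seed=1):
    """Flip 0 -> 1 with probability tau01 and 1 -> 0 with probability tau10."""
    rng = np.random.default_rng(seed)
    u = rng.random(y.shape[0])
    flip = np.where(y == 0, u < tau01, u < tau10)
    return np.where(flip, 1 - y, y)

x, y, mu0, mu1 = make_mixture(50_000)
eta = true_eta(x, mu0, mu1)
bayes_label = (eta > 0.5).astype(int)         # Bayes-optimal label
eta_tilde = noisy_eta(eta, tau01=0.2, tau10=0.3)
y_noisy = flip_labels(y, tau01=0.2, tau10=0.3)
```

Because eta and tilde{eta} are available in closed form here, the classifier f can be set to tilde{eta} exactly, which is the setting the summary describes.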

Experimental Results

Research Questions

RQ1: Can the Tsybakov condition constants C and lambda be accurately estimated on synthetic data to bound the error of noisy-label correction?
RQ2: Does the LRT-Correction algorithm achieve corrected labels that closely match clean labels under controlled symmetric and asymmetric noise regimes?
RQ3: How tight are the provided error and correction bounds when eta and f satisfy the assumed conditions?
RQ4: What is the impact of using a perfect noisy classifier (f = tilde{eta}) on the observed bounds and correction success rate?
RQ5: How do changes in noise structure (symmetric vs asymmetric) affect the probability of correct correction and bound behavior?

Key Findings

The estimated Tsybakov constants are C ≈ 0.58 and lambda ≈ 1.27; the log-log regression fit has R^2 ≈ 0.904 and p < 1e-4.
The observed bound on the error probability as a function of epsilon aligns with the form C[epsilon]^lambda under the synthetic setup.
The LRT-Correction algorithm, when given f = tilde{eta}, yields corrected labels very close to the clean labels, with performance limited by asymmetry in the noise pattern.
Corollary 1 provides a closed-form correction error bound that matches the empirical evaluation on the synthetic data.
Symmetric and asymmetric noise scenarios are explored, showing the bound remains valid and the correction performance tracks bound predictions under controlled conditions.
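
The correction step with f = tilde{eta} can be illustrated with a small sketch. The summary does not give the rule itself, so the code assumes a likelihood-ratio test of the form f_label(x) / f_top(x) < delta, where f_top is the classifier's largest class probability, and flips the label to the top class when the test fires. The function name `lrt_correct`, the threshold `delta = 0.9`, the symmetric flip rate, and the mixture parameters are all hypothetical.

```python
import numpy as np

def lrt_correct(probs, noisy_labels, delta):
    """Assumed rule: flip to the top class when f_label / f_top < delta."""
    idx = np.arange(len(noisy_labels))
    top = probs.argmax(axis=1)
    ratio = probs[idx, noisy_labels] / probs[idx, top]
    return np.where(ratio < delta, top, noisy_labels)

rng = np.random.default_rng(0)
n, dim, tau = 20_000, 10, 0.2            # hypothetical sizes, symmetric flip rate
mu1 = np.full(dim, 1.0 / np.sqrt(dim))   # hypothetical mean; the other is -mu1
y = rng.integers(0, 2, size=n)
x = rng.standard_normal((n, dim)) + np.where(y[:, None] == 1, mu1, -mu1)

eta = 1.0 / (1.0 + np.exp(-2.0 * (x @ mu1)))  # P(y = 1 | x) for this mixture
bayes = (eta > 0.5).astype(int)               # Bayes-optimal (clean) label
eta_tilde = (1.0 - tau) * eta + tau * (1.0 - eta)
noisy = np.where(rng.random(n) < tau, 1 - y, y)

# Perfect noisy classifier: class probabilities equal to the noisy posterior.
probs = np.column_stack([1.0 - eta_tilde, eta_tilde])
corrected = lrt_correct(probs, noisy, delta=0.9)

before = (noisy == bayes).mean()
after = (corrected == bayes).mean()
print(f"agreement with Bayes labels: before={before:.3f}, after={after:.3f}")
```

Under symmetric noise the top class of tilde{eta} coincides with the Bayes label, so every flip moves a label toward the clean one. With tau01 != tau10, tilde{eta} crosses 0.5 at eta = (0.5 - tau01) / (1 - tau01 - tau10) rather than at 0.5, so the two can disagree near the decision boundary, which is consistent with the reported sensitivity to asymmetric noise.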

This summary was generated by AI and reviewed by a human editor.