QUICK REVIEW

[Paper Review] L_DMI: An Information-theoretic Noise-robust Loss Function

Yilun Xu, Peng Cao | arXiv (Cornell University) | Sep 8, 2019

Machine Learning and Data Classification | References: 43 | Cited by: 34

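One-Sentence Summary

Introduces L_DMI, a loss based on Determinant-based Mutual Information (DMI). It is provably robust to instance-independent label noise and can be applied to any classifier with no extra machinery. It performs strongly across multiple datasets and noise patterns.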

ABSTRACT

Accurately annotating large-scale datasets is notoriously expensive in both time and money. Acquiring low-quality annotated datasets can be much cheaper, but using such data without special treatment often badly damages the performance of trained models. Various methods have been proposed for learning with noisy labels. However, most methods handle only limited kinds of noise patterns, require auxiliary information or steps (e.g., knowing or estimating the noise transition matrix), or lack theoretical justification. In this paper, we propose a novel information-theoretic loss function, $\mathcal{L}_{DMI}$, for training deep neural networks that are robust to label noise. The core of $\mathcal{L}_{DMI}$ is a generalized version of mutual information, termed Determinant based Mutual Information (DMI), which is not only information-monotone but also relatively invariant. \emph{To the best of our knowledge, $\mathcal{L}_{DMI}$ is the first loss function that is provably robust to instance-independent label noise, regardless of noise pattern, and it can be applied to any existing classification neural network straightforwardly without any auxiliary information}. In addition to the theoretical justification, we also show empirically that $\mathcal{L}_{DMI}$ outperforms all other counterparts in classification on both image and natural language datasets. These include Fashion-MNIST, CIFAR-10, Dogs vs. Cats, and MR with a variety of synthesized noise patterns and noise amounts, as well as the real-world dataset Clothing1M. Code is available at https://github.com/Newbeeer/L_DMI.

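Motivation and Goals

Enable robust learning at scale under noisy labels without relying on clean data or information about the noise transition.
Define a generalized mutual-information measure (DMI) and prove the properties that make it robust to arbitrary noise patterns.
Propose a practical loss function, L_DMI, which minimizes the negative logarithm of the DMI between model outputs and noisy labels.
Provide a theoretical guarantee that L_DMI is robust to instance-independent label noise: training on noisy labels is equivalent to training on clean labels up to a constant shift in the loss.
Demonstrate the empirical advantages of L_DMI on image and language datasets with diverse noise patterns.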
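The following short derivation shows why the shift is a constant. It assumes the noise is described by a transition matrix $T$ with $T_{y\tilde{y}} = \Pr[\tilde{Y}=\tilde{y} \mid Y=y]$ that does not depend on $X$ (instance-independent noise). The symbol $T$ is introduced here for illustration. Because $\tilde{Y}$ is independent of $X$, and hence of $h(X)$, given $Y$, the noisy joint matrix factors through $T$:

$$
\begin{aligned}
Q_{h(X),\tilde{Y}}(i,\tilde{y}) &= \sum_{y} \Pr[h(X)=i,\, Y=y]\,\Pr[\tilde{Y}=\tilde{y} \mid Y=y] = \big(Q_{h(X),Y}\,T\big)_{i\tilde{y}},\\
\mathcal{L}_{DMI}(h;\tilde{Y}) &= -\log\big\lvert\det\big(Q_{h(X),Y}\,T\big)\big\rvert = -\log\big\lvert\det Q_{h(X),Y}\big\rvert - \log\lvert\det T\rvert = \mathcal{L}_{DMI}(h;Y) - \log\lvert\det T\rvert.
\end{aligned}
$$

The term $-\log\lvert\det T\rvert$ does not depend on $h$, so the noisy and clean losses rank any two classifiers identically whenever $\det T \neq 0$. If $\det T = 0$, the noisy loss is infinite for every classifier and the guarantee is vacuous.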

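Proposed Method

Define DMI as the absolute value of the determinant of the joint distribution matrix of two random variables: $\mathrm{DMI}(W, W') = \lvert\det(Q_{W,W'})\rvert$, applied here to the classifier output and the label.
Formulate L_DMI as the negative logarithm of DMI: $\mathcal{L}_{DMI} = -\log \mathrm{DMI}(h(X), \tilde{Y}) = -\log\lvert\det(Q_{h(X),\tilde{Y}})\rvert$.
Estimate the joint distribution $Q_{h(X),\tilde{Y}}$ from each mini-batch as $U = \frac{1}{N} O L$. Here $O$ is the $C \times N$ matrix of classifier outputs (softmax probabilities) for the $N$ samples, and $L$ is the $N \times C$ matrix of one-hot noisy labels.
Use the relative invariance of DMI to guarantee robustness to noise without any knowledge of the noise transition matrix.
Prove that under instance-independent noise the loss is shifted only by a constant, so the ranking of classifiers by quality is preserved.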
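The batch estimation step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' released code. The function names `softmax`, `joint_matrix`, `neg_log_abs_det`, and `dmi_loss` are hypothetical, and `eps` is an assumed optional stabilizer that is not part of the definition. A real training loop would use an autodiff framework so the loss can be backpropagated. The sketch builds $O$ as the $C \times N$ softmax-output matrix and $L$ as the $N \times C$ one-hot label matrix, forms $U = \frac{1}{N} O L$, and returns $-\log\lvert\det U\rvert$.

```python
import numpy as np


def softmax(logits):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def joint_matrix(probs, labels, num_classes):
    """Empirical estimate U = (1/N) O L of the joint matrix Q_{h(X), Y~}.

    probs:  (N, C) classifier output probabilities; O is its transpose (C x N).
    labels: (N,) integer (possibly noisy) labels; L is their one-hot form (N x C).
    """
    n = probs.shape[0]
    O = probs.T                          # C x N classifier-output matrix
    L = np.eye(num_classes)[labels]      # N x C one-hot label matrix
    return O @ L / n                     # C x C, entries sum to 1


def neg_log_abs_det(U, eps=0.0):
    # -log(|det U| + eps); eps = 0 gives the exact definition.
    return -np.log(np.abs(np.linalg.det(U)) + eps)


def dmi_loss(logits, labels, num_classes, eps=0.0):
    # Hypothetical helper: L_DMI estimated on one mini-batch.
    U = joint_matrix(softmax(logits), labels, num_classes)
    return neg_log_abs_det(U, eps)


# Usage: a toy 3-class batch whose logits are correlated with the labels.
rng = np.random.default_rng(0)
C, N = 3, 500
labels = rng.integers(0, C, size=N)
logits = 3.0 * np.eye(C)[labels] + rng.normal(size=(N, C))
U = joint_matrix(softmax(logits), labels, C)

# An invertible, diagonally non-dominant transition matrix (illustrative values).
T = np.array([[0.2, 0.8, 0.0],
              [0.0, 0.2, 0.8],
              [0.8, 0.0, 0.2]])
clean = neg_log_abs_det(U)
noisy = neg_log_abs_det(U @ T)
shift = -np.log(abs(np.linalg.det(T)))
print(noisy, clean + shift)  # equal up to floating-point error
```

The last lines check the constant-shift identity on the estimate itself. Multiplying $U$ by $T$ mimics the population relation $Q_{h(X),\tilde{Y}} = Q_{h(X),Y} T$. If labels were actually resampled through $T$, the relation would hold only approximately for a finite batch. Dropping the $\frac{1}{N}$ factor would change the loss only by the constant $-C \log N$, since $\det(N U) = N^{C} \det U$.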

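Experimental Results

Research Questions

RQ1: Can L_DMI be proven robust to instance-independent label noise without access to the noise transition matrix?
RQ2: Across noise patterns and noise levels, does optimizing L_DMI align with optimizing on clean labels?
RQ3: Can L_DMI be applied across architectures and modalities (image and text) without auxiliary data?
RQ4: How does L_DMI compare with existing robust losses under diagonally dominant, diagonally non-dominant, and real-world noisy labels?
RQ5: What empirical gains does L_DMI deliver on standard benchmarks with synthetic and real-world noisy labels?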

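Key Findings

Under the stated assumptions, L_DMI is theoretically robust to instance-independent label noise.
Training with L_DMI on noisy data is equivalent to training on clean data, up to a constant shift in the loss.
Empirically, L_DMI outperforms CE, FW, GCE, and LCCN across a variety of noise patterns and levels on Fashion-MNIST, CIFAR-10, Dogs vs. Cats, MR, and Clothing1M.
L_DMI retains its advantage on synthetic noise patterns (including diagonally non-dominant ones) and on a real-world noisy dataset.
On Clothing1M, L_DMI achieves the highest accuracy among the compared methods.
The method is agnostic to architecture and data domain and is demonstrated with ResNet-50, ResNet-34, VGG-16, and WordCNN.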

This review was generated by AI and checked by human editors.