QUICK REVIEW

[Paper Review] ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

Hyuck Lee, Seungjae Shin|arXiv (Cornell University)|Oct 20, 2021

Imbalanced Data Classification Techniques|References: 38|Cited by: 41

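One-Sentence Summary

This paper proposes ABC, a single-layer auxiliary balanced classifier attached to a backbone SSL model to mitigate class imbalance in semi-supervised learning; it is trained end to end with consistency regularization and a class-balanced loss, and achieves state-of-the-art results on several class-imbalanced SSL benchmarks.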

ABSTRACT

Existing semi-supervised learning (SSL) algorithms typically assume class-balanced datasets, although the class distributions of many real-world datasets are imbalanced. In general, classifiers trained on a class-imbalanced dataset are biased toward the majority classes. This issue becomes more problematic for SSL algorithms because they utilize the biased prediction of unlabeled data for training. However, traditional class-imbalanced learning techniques, which are designed for labeled data, cannot be readily combined with SSL algorithms. We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data, while mitigating class imbalance by introducing an auxiliary balanced classifier (ABC) of a single layer, which is attached to a representation layer of an existing SSL algorithm. The ABC is trained with a class-balanced loss of a minibatch, while using high-quality representations learned from all data points in the minibatch using the backbone SSL algorithm to avoid overfitting and information loss. Moreover, we use consistency regularization, a recent SSL technique for utilizing unlabeled data in a modified way, to train the ABC to be balanced among the classes by selecting unlabeled data with the same probability for each class. The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.

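Research Motivation and Goals

Motivation: Many real-world datasets are class-imbalanced, and SSL methods tend to be biased toward the majority classes.
Goal: Develop a scalable class-imbalanced SSL (CISSL) algorithm that uses unlabeled data while balancing predictions through an auxiliary classifier.
Contribution: Attach a single-layer auxiliary balanced classifier (ABC) to a backbone SSL model and train it end to end with a class-balanced loss and consistency regularization.
Impact: Achieves state-of-the-art results on several class-imbalanced SSL benchmarks with little overhead.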

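Proposed Method

Attach a single-layer ABC to the representation layer of the backbone network, so that balanced decisions are learned on top of high-quality backbone representations.
Train the ABC with a class-balanced loss by applying a 0/1 mask to the labeled data within each minibatch, giving balanced supervision without sacrificing the backbone representations.
During ABC training, use a Bernoulli-based mask M(x) to oversample minority-class labeled data while retaining information from the entire minibatch.
Apply consistency regularization with soft pseudo-labels to the unlabeled data, combined with a masked loss on the unlabeled samples, to keep the ABC's predictions balanced across classes.
Gradually adjust the mask applied to unlabeled data in the consistency regularization to prevent overfitting to minority classes and keep training stable.
End-to-end training: optimize the sum of the backbone loss, the ABC classification loss, and the consistency loss, and use the ABC for final predictions on new data.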
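The masked losses described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the keep probability N_min / N_k (smallest class size divided by the size of the sample's class) is an assumed inverse-frequency rule, the two unlabeled "views" (weak and strong) are an assumed way of producing inputs for consistency regularization, and all function names are hypothetical. The gradual adjustment of the unlabeled mask is not modeled here.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def keep_probability(classes, class_counts):
    """ASSUMPTION: a class-k sample is kept with probability N_min / N_k."""
    counts = np.asarray(class_counts, dtype=float)
    return counts.min() / counts[np.asarray(classes)]

def bernoulli_mask(classes, class_counts, rng):
    """Draw a 0/1 mask with one Bernoulli trial per sample."""
    p = keep_probability(classes, class_counts)
    return (rng.random(p.shape[0]) < p).astype(float)

def masked_cross_entropy(logits, targets, mask):
    """Cross-entropy against one-hot or soft targets, zeroed where mask == 0
    and averaged over the whole minibatch."""
    per_sample = -(targets * log_softmax(logits)).sum(axis=1)
    return float((mask * per_sample).mean())

def abc_losses(lab_logits, labels, weak_logits, strong_logits,
               class_counts, tau, rng):
    """Return (classification loss, consistency loss) for the auxiliary head."""
    num_classes = lab_logits.shape[1]
    one_hot = np.eye(num_classes)[labels]
    # Labeled part: class-balanced loss via the 0/1 mask.
    cls = masked_cross_entropy(
        lab_logits, one_hot, bernoulli_mask(labels, class_counts, rng))
    # Unlabeled part: soft pseudo-labels from one view supervise the other,
    # restricted to confident samples and masked by predicted class.
    q = np.exp(log_softmax(weak_logits))
    confident = (q.max(axis=1) >= tau).astype(float)
    mask_u = bernoulli_mask(q.argmax(axis=1), class_counts, rng)
    con = masked_cross_entropy(strong_logits, q, confident * mask_u)
    return cls, con

# Toy usage with random stand-ins for the ABC's outputs (hypothetical data).
rng = np.random.default_rng(0)
class_counts = [100, 50, 10]            # hypothetical labeled class sizes
labels = np.array([0, 0, 1, 2])
lab_logits = rng.normal(size=(4, 3))
weak_logits = rng.normal(size=(6, 3))
strong_logits = rng.normal(size=(6, 3))
cls_loss, con_loss = abc_losses(lab_logits, labels, weak_logits,
                                strong_logits, class_counts, tau=0.5, rng=rng)
backbone_loss = 0.0                     # placeholder for the backbone SSL loss
total_loss = backbone_loss + cls_loss + con_loss
print(total_loss)
```

Because the mask only zeroes out loss terms for the auxiliary head, every sample in the minibatch can still contribute to the backbone loss, which is how the summary's claim of "retaining information from the entire minibatch" is read here.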

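Experimental Results

Research Questions

RQ1: Can an auxiliary single-layer classifier attached to a backbone SSL model learn balanced predictions in a class-imbalanced semi-supervised setting?
RQ2: How does the masked, class-balanced training of the ABC interact with the high-quality representations learned by the backbone to reduce bias toward majority classes?
RQ3: Does integrating consistency regularization with the masked, class-balanced ABC improve minority-class performance without sacrificing overall accuracy?
RQ4: Is end-to-end training of the backbone and the ABC more effective for CISSL than decoupled training?
RQ5: Compared with training only the backbone on large-scale datasets, what is the computational overhead of adding the ABC?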

Key Findings

Overall / minority-class accuracy (%)
Algorithm	CIFAR-10-LT (gamma=100, beta=20%)	SVHN-LT (gamma=100, beta=20%)	CIFAR-100-LT (gamma=20, beta=40%)
Vanilla	55.3 ± 1.30 / 33.9 ± 1.88	77.0 ± 0.67 / 63.3 ± 1.25	40.1 ± 1.15 / 25.2 ± 0.95
w/ ABC	81.1 ± 0.82 / 72.0 ± 1.77	92.0 ± 0.38 / 87.9 ± 0.73	56.3 ± 0.19 / 43.4 ± 0.42
ReMixMatch	73.7 ± 0.39 / 55.9 ± 0.87	89.8 ± 0.42 / 82.8 ± 0.68	54.0 ± 0.29 / 37.1 ± 0.37
w/ ABC (ReMixMatch)	82.4 ± 0.45 / 75.7 ± 1.18	93.9 ± 0.16 / 92.5 ± 0.4	57.6 ± 0.26 / 46.7 ± 0.50

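On CIFAR-10-LT, SVHN-LT, and CIFAR-100-LT, the proposed ABC method achieves state-of-the-art performance under a range of imbalance and labeling conditions.
On CIFAR-10-LT with gamma=100 and beta=20%, ABC reaches 81.1% overall accuracy and 72.0% minority-class accuracy (example row from Table 1).
On SVHN-LT, ABC reaches 92.0% overall accuracy and 87.9% minority-class accuracy (example row from Table 1).
On CIFAR-100-LT, ABC reaches 56.3% overall accuracy and 43.4% minority-class accuracy (example row from Table 1).
Ablation studies show that removing the 0/1 mask, consistency regularization, or the confidence threshold τ degrades performance and biases the ABC toward majority classes.
Qualitative analysis (t-SNE, confusion matrices) shows that the ABC uses the backbone representations to produce more separable clusters and a more balanced class distribution.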
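As a quick arithmetic check on the table, the short script below computes the absolute gains (in percentage points) of the "w/ ABC (ReMixMatch)" row over the plain ReMixMatch row. The numbers are copied from the table; the variable and function names are only illustrative.

```python
# (overall, minority-class) accuracy in %, copied from the table rows.
remixmatch = {
    "CIFAR-10-LT": (73.7, 55.9),
    "SVHN-LT": (89.8, 82.8),
    "CIFAR-100-LT": (54.0, 37.1),
}
remixmatch_abc = {
    "CIFAR-10-LT": (82.4, 75.7),
    "SVHN-LT": (93.9, 92.5),
    "CIFAR-100-LT": (57.6, 46.7),
}

def gains(baseline, with_abc):
    """Per-dataset (overall, minority) gain in percentage points."""
    return {
        name: tuple(round(new - old, 1)
                    for new, old in zip(with_abc[name], baseline[name]))
        for name in baseline
    }

for name, (overall, minority) in gains(remixmatch, remixmatch_abc).items():
    print(f"{name}: overall +{overall:.1f} pts, minority +{minority:.1f} pts")
```

On all three datasets the minority-class gain is larger than the overall gain, which matches the summary's point that the ABC mainly reduces bias toward majority classes.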


This review was generated by AI and reviewed by human editors.