QUICK REVIEW

[Paper Review] Uncertainty-aware Self-training for Text Classification with Few Labels

Subhabrata Mukherjee, Ahmed Hassan Awadallah | arXiv (Cornell University) | Jun 27, 2020

Topic Modeling | References: 40 | Cited by: 41

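One-Sentence Summary

This paper proposes an uncertainty-aware self-training framework (UST) for text classification with few labels. It combines Bayesian uncertainty estimates obtained through MC dropout, BALD-based sample selection, and confident learning to improve pseudo-label quality without requiring additional resources.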

ABSTRACT

Recent success of large-scale pre-trained language models crucially hinges on fine-tuning them on large amounts of labeled data for the downstream task, which is typically expensive to acquire. In this work, we study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck by making use of large-scale unlabeled data for the target task. The standard self-training mechanism randomly samples instances from the unlabeled pool to pseudo-label and augment labeled data. In this work, we propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network leveraging recent advances in Bayesian deep learning. Specifically, we propose (i) acquisition functions to select instances from the unlabeled pool leveraging Monte Carlo (MC) Dropout, and (ii) a learning mechanism leveraging model confidence for self-training. As an application, we focus on text classification on five benchmark datasets. We show our methods leveraging only 20-30 labeled samples per class for each task for training and for validation can perform within 3% of fully supervised pre-trained language models fine-tuned on thousands of labeled instances with an aggregate accuracy of 91% and improving by up to 12% over baselines.

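Motivation and Objectives

Reduce the annotation bottleneck in text classification by exploiting unlabeled data.
Develop an uncertainty-aware self-training framework that uses Bayesian uncertainty to guide pseudo-labeling.
Study uncertainty-based sample selection strategies that reduce drift caused by noisy pseudo-labels.
Demonstrate effectiveness on five benchmark text classification datasets with very few labeled examples.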

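Proposed Method

Fine-tune a pre-trained language model (BERT) on the small labeled set to serve as the teacher.
Estimate uncertainty with MC dropout by running multiple stochastic forward passes over the unlabeled data.
Compute BALD-based acquisition scores to rank unlabeled instances by how uncertain the teacher is about them, and use the ranking for sample selection.
Augment the training data with hard pseudo-labels for the selected unlabeled instances and retrain the student model end to end.
Incorporate predictive variance into the loss on unlabeled data (confident learning) so that low-variance samples are emphasized.
Compare easy versus hard sampling, class-dependent selection, and the confident learning component through ablation experiments.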
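
The uncertainty steps above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: it assumes the T stochastic forward passes have already been run and their class probabilities collected, and the function names (`mc_dropout_stats`, `confidence_weighted_nll`) and the exact `log(1 / variance)` weighting are hypothetical choices made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mc_dropout_stats(passes):
    """Summarise T stochastic forward passes for one unlabeled instance.

    `passes` is a list of T class-probability vectors, each assumed to come
    from one forward pass of the teacher with dropout left switched on.
    """
    t = len(passes)
    n_classes = len(passes[0])
    mean = [sum(p[c] for p in passes) / t for c in range(n_classes)]
    # BALD: entropy of the mean prediction minus the mean per-pass entropy.
    bald = entropy(mean) - sum(entropy(p) for p in passes) / t
    # Hard pseudo-label: most probable class under the mean prediction.
    label = max(range(n_classes), key=lambda c: mean[c])
    # Variance across passes of the probability given to the pseudo-label.
    var = sum((p[label] - mean[label]) ** 2 for p in passes) / t
    return {"mean": mean, "bald": bald, "label": label, "var": var}


def confidence_weighted_nll(stats, student_probs, eps=1e-8):
    """Negative log-likelihood of the pseudo-label, scaled by log(1 / var).

    Assumption for illustration: low-variance pseudo-labels get a larger
    weight, so the student trusts them more.
    """
    weight = math.log(1.0 / (stats["var"] + eps))
    return -weight * math.log(student_probs[stats["label"]] + eps)


# Passes that agree give BALD near 0; passes that disagree give BALD > 0.
agree = mc_dropout_stats([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
disagree = mc_dropout_stats([[0.9, 0.1], [0.1, 0.9]])
print(agree["bald"], disagree["bald"])
```

In a real pipeline the `passes` lists would come from the fine-tuned teacher evaluated several times with dropout active; how many passes to use is a tuning choice that this summary does not specify.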

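Experimental Results

Research Questions

RQ1: Does uncertainty-aware sampling improve self-training for text classification when very few labels are available?
RQ2: Does BALD-based sample selection outperform uniform sampling or back-translation-based augmentation in this setting?
RQ3: How does incorporating predictive variance (confident learning) affect pseudo-label quality and final accuracy?
RQ4: How do class-balanced sampling and the component ablations affect performance stability across multiple runs?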

Key Findings

Dataset	Model	Labels per class (K)	Accuracy (%)
SST	UST (ours)	30	88.19
IMDB	UST (ours)	30	89.21
Elec	UST (ours)	30	91.27
AG News	UST (ours)	30	87.74
Dbpedia	UST (ours)	30	98.57
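
As a quick consistency check, the per-dataset accuracies in the table can be averaged and compared with the 91% aggregate accuracy quoted in the abstract. The snippet assumes the aggregate is the unweighted mean over the five datasets, which this summary does not state explicitly.

```python
# Accuracies (%) copied from the table above (UST, 30 labels per class).
accuracies = {
    "SST": 88.19,
    "IMDB": 89.21,
    "Elec": 91.27,
    "AG News": 87.74,
    "Dbpedia": 98.57,
}

# Assumption: "aggregate accuracy" means the unweighted mean across datasets.
aggregate = sum(accuracies.values()) / len(accuracies)
print(f"{aggregate:.2f}")
```

Under that assumption the mean rounds to 91, in line with the figure in the abstract.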

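With the same encoder (BERT-Base) and 30 labeled samples per class, UST outperforms the baselines, including standard self-training and back-translation-based UDA.
Across the five datasets, UST achieves higher aggregate accuracy and lower variance than the baselines (the paper reports the average improvement over the baselines).
Class-dependent sampling with exploration and confident learning both improve robustness and accuracy in the ablation experiments.
Comparing the easy and hard variants of uncertainty-based (BALD) sampling shows that easy sampling generally yields stronger improvements in this self-training setting.
With only 20-30 labeled samples per class and a large unlabeled pool, the method comes within 3% of fully supervised performance while requiring far fewer labels.
UST improves aggregate accuracy across tasks by up to 12% over the baseline models.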
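
The easy versus hard sampling and the class-dependent selection mentioned in the findings can be illustrated with a small sketch. Everything here is an assumption made for the example: the scaling of BALD scores by their maximum, the function names `selection_weights` and `sample_per_class`, and the toy scores are hypothetical and are not taken from the paper's code.

```python
import random


def selection_weights(bald_scores, mode="easy"):
    """Turn BALD scores into sampling probabilities.

    "easy" favours low-BALD (confident) instances, "hard" favours high-BALD
    ones. Scores are clipped at 0 and scaled by their maximum (an assumption).
    """
    clipped = [max(b, 0.0) for b in bald_scores]
    top = max(clipped)
    scaled = [b / top for b in clipped] if top > 0 else [0.0] * len(clipped)
    raw = [1.0 - s for s in scaled] if mode == "easy" else scaled
    total = sum(raw)
    if total == 0:  # all instances tie, so fall back to uniform sampling
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def sample_per_class(pseudo_labels, bald_scores, per_class, mode="easy", seed=0):
    """Draw up to `per_class` distinct indices for every pseudo-label class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(pseudo_labels):
        by_class.setdefault(label, []).append(idx)
    chosen = {}
    for label, members in by_class.items():
        weights = selection_weights([bald_scores[i] for i in members], mode)
        pool = list(zip(members, weights))
        picked = []
        while pool and len(picked) < per_class:
            pool_weights = [w for _, w in pool]
            if sum(pool_weights) <= 0:
                pos = rng.randrange(len(pool))  # only zero-weight items left
            else:
                # Weighted draw, so selection is stochastic (exploration)
                # rather than a deterministic top-k cut.
                pos = rng.choices(range(len(pool)), weights=pool_weights)[0]
            picked.append(pool.pop(pos)[0])
        chosen[label] = picked
    return chosen


# Toy pool: five unlabeled instances with pseudo-labels and BALD scores.
labels = [0, 0, 0, 1, 1]
scores = [0.0, 0.1, 0.5, 0.2, 0.2]
print(sample_per_class(labels, scores, per_class=2, mode="easy"))
```

Sampling the same number of instances from each pseudo-label class keeps the augmented set balanced, and drawing by probability instead of taking a fixed top-k leaves room for exploration.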

This review was generated by AI and reviewed by human editors.