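[Paper Explained] Unsupervised Meta-Learning for Few-Shot Image and Video Classification
This paper proposes UMTRA, an unsupervised meta-learning framework that generates synthetic tasks from unlabeled data to perform few-shot image and video classification without relying on labeled meta-training tasks. On five-way one-shot Omniglot classification, it retains 85% of the accuracy of MAML while reducing the number of required labels from 24,005 to 5.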
Few-shot or one-shot learning of classifiers for images or videos is an important next frontier in computer vision. The extreme paucity of training data means that the learning must start with a significant inductive bias towards the type of task to be learned. One way to acquire this is by meta-learning on tasks similar to the target task. However, if the meta-learning phase requires labeled data for a large number of tasks closely related to the target task, it not only increases the difficulty and cost, but also conceptually limits the approach to variations of well-understood domains. In this paper, we propose UMTRA, an algorithm that performs meta-learning on an unlabeled dataset in an unsupervised fashion, without putting any constraint on the classifier network architecture. The only requirements towards the dataset are: sufficient size, diversity and number of classes, and relevance of the domain to the one in the target task. Exploiting this information, UMTRA generates synthetic training tasks for the meta-learning phase. We evaluate UMTRA on few-shot and one-shot learning on both image and video domains. To the best of our knowledge, we are the first to evaluate meta-learning approaches on UCF-101. On the Omniglot and Mini-Imagenet few-shot learning benchmarks, UMTRA outperforms every tested approach based on unsupervised learning of representations, while alternating for the best performance with the recent CACTUs algorithm. Compared to supervised model-agnostic meta-learning approaches, UMTRA trades off some classification accuracy for a vast decrease in the number of labeled data needed. For instance, on the five-way one-shot classification on the Omniglot, we retain 85% of the accuracy of MAML, a recently proposed supervised meta-learning algorithm, while reducing the number of required labels from 24005 to 5.
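Motivation and Goals
- Address few-shot and one-shot classification of images and videos when very little labeled data is available.
- Remove the dependence on labeled meta-training tasks by meta-learning, without supervision, on a diverse unlabeled dataset.
- Develop a model-agnostic meta-learning method that places no constraint on the classifier architecture.
- Evaluate meta-learning on the UCF-101 video benchmark, which the authors believe is a first.
- Stay close to the accuracy of supervised meta-learning while sharply reducing the number of labels required, accepting some loss in accuracy as the trade-off.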
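Proposed Method
- UMTRA generates synthetic training tasks from an unlabeled dataset that is sufficiently large, diverse, rich in classes, and relevant to the domain of the target task.
- It builds support and query sets that mimic few-shot episodes: randomly sampled unlabeled images, each given its own pseudo-label, form the support set, and augmented copies of those images form the query set.
- The meta-training phase uses no labels at all; meta-learning runs entirely on these synthetic tasks.
- The method is model-agnostic and works with any classifier network architecture.
- Meta-learning on the synthetic tasks follows a MAML-style inner and outer update loop, so the synthetic tasks stand in for the labeled tasks that supervised meta-learning would need.
- Image classification is evaluated on Omniglot and Mini-ImageNet, and video classification on UCF-101.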
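The task-generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`random_augment`, `make_synthetic_task`, `prob_all_distinct`), the shift-and-noise augmentation, and the random stand-in image pool are all hypothetical. The last function estimates how often randomly drawn samples fall into different hidden classes, which is why the dataset needs many classes; the figures of 1,200 classes with 20 images each are an assumed Omniglot training split, not a number given in the summary.

```python
import numpy as np


def random_augment(image, rng):
    """Hypothetical augmentation: shift by up to 2 pixels, then add pixel noise."""
    dy, dx = rng.integers(-2, 3, size=2)
    shifted = np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))
    noisy = shifted + rng.normal(0.0, 0.05, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def make_synthetic_task(unlabeled, n_way, rng):
    """Build one N-way, one-shot task from unlabeled images.

    Support set: n_way images drawn at random, each given its own pseudo-label.
    Query set: an augmented copy of each support image with the same pseudo-label.
    """
    idx = rng.choice(len(unlabeled), size=n_way, replace=False)
    support_x = unlabeled[idx]
    pseudo_labels = np.arange(n_way)
    query_x = np.stack([random_augment(img, rng) for img in support_x])
    return (support_x, pseudo_labels), (query_x, pseudo_labels.copy())


def prob_all_distinct(num_classes, per_class, n_way):
    """Probability that n_way samples drawn without replacement from a
    class-balanced pool all come from different (hidden) classes."""
    total = num_classes * per_class
    p = 1.0
    for i in range(n_way):
        p *= (num_classes - i) * per_class / (total - i)
    return p


rng = np.random.default_rng(0)
pool = rng.random((1000, 28, 28))  # stand-in for an unlabeled image pool
(support_x, support_y), (query_x, query_y) = make_synthetic_task(pool, 5, rng)
print(support_x.shape, query_x.shape, support_y)

# Assumed Omniglot-like pool: 1,200 classes with 20 images each, 5-way task.
print(prob_all_distinct(1200, 20, 5))
```

With many classes and few samples per task, the chance that two support images share a hidden class stays small, so treating each random sample as its own class is usually a safe pseudo-labeling. The resulting support and query sets can then be fed to any episode-based meta-learner.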
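Experimental Results
Research Questions
- RQ1: Can unsupervised meta-learning on unlabeled data reach few-shot classification performance comparable to existing methods?
- RQ2: How does UMTRA compare with supervised meta-learning methods such as MAML in accuracy and label efficiency?
- RQ3: Does UMTRA generalize to video classification tasks such as those on UCF-101?
- RQ4: How do the diversity and domain relevance of the dataset affect UMTRA's performance?
- RQ5: How does UMTRA compare with other approaches based on unsupervised representation learning?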
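Key Findings
- On the five-way one-shot Omniglot benchmark, UMTRA retains 85% of the accuracy of MAML, a supervised meta-learning method.
- On the same task, UMTRA cuts the number of required labels from 24,005 to just 5, a reduction of about 99.98%.
- On Omniglot and Mini-ImageNet, UMTRA outperforms every tested approach based on unsupervised representation learning.
- Among unsupervised meta-learning methods, UMTRA alternates with CACTUs for the best result.
- To the authors' knowledge, this is the first work to evaluate meta-learning approaches on the UCF-101 video benchmark, showing that the method applies to video classification.
- The method performs well on both image and video domains, provided the unlabeled data is relevant to the domain of the target task.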
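The label-reduction figure can be checked with a short calculation. The breakdown of 24,005 in the second line is an assumption (an Omniglot meta-training split of 1,200 characters with 20 labeled images each, plus 5 labels for the target task); the summary itself gives only the totals.

```latex
\[
\frac{24005 - 5}{24005} = \frac{24000}{24005} \approx 0.99979 \approx 99.98\%
\]
% Assumed origin of the supervised label count (not stated in the summary):
\[
1200 \times 20 + 5 = 24005
\]
```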
This summary was generated by AI and reviewed by a human editor.