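[Paper Walkthrough] Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning
This paper proposes DN4, a deep nearest neighbor neural network built on local descriptors. Trained episodically, it achieves state-of-the-art results on several few-shot benchmarks.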
Few-shot learning in image classification aims to learn a classifier to classify images when only few training examples are available for each class. Recent work has achieved promising classification performance, where an image-level feature based measure is usually used. In this paper, we argue that a measure at such a level may not be effective enough in light of the scarcity of examples in few-shot learning. Instead, we think a local descriptor based image-to-class measure should be taken, inspired by its surprising success in the heyday of local invariant features. Specifically, building upon the recent episodic training mechanism, we propose a Deep Nearest Neighbor Neural Network (DN4 in short) and train it in an end-to-end manner. Its key difference from the literature is the replacement of the image-level feature based measure in the final layer by a local descriptor based image-to-class measure. This measure is conducted online via a k-nearest neighbor search over the deep local descriptors of convolutional feature maps. The proposed DN4 not only learns the optimal deep local descriptors for the image-to-class measure, but also utilizes the higher efficiency of such a measure in the case of example scarcity, thanks to the exchangeability of visual patterns across the images in the same class. Our work leads to a simple, effective, and computationally efficient framework for few-shot learning. Experimental study on benchmark datasets consistently shows its superiority over the related state-of-the-art, with the largest absolute improvement of 17% over the next best. The source code is available at https://github.com/WenbinLee/DN4.git.
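Motivation and Goals
- Rethink the final classification step in few-shot learning by moving from an image-level measure to a local descriptor based one.
- Exploit the exchangeability of local visual patterns across images of the same class.
- Propose an end-to-end trainable framework that combines deep local descriptors with a non-parametric image-to-class measure.
- Demonstrate empirical gains over state-of-the-art metric-learning and meta-learning methods on standard few-shot benchmarks.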
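Proposed Method
- Embed images with a convolutional neural network and take deep local descriptors from its convolutional feature maps.
- Build the image-to-class measure by running a k-nearest-neighbor search for each query descriptor in the class's descriptor pool.
- Sum the cosine similarities between every query descriptor and its k nearest neighbors to obtain a class score for classification.
- Train the embedding module together with the non-parametric measure end-to-end under episodic training (C-way K-shot tasks).
- Use Conv-64F as the embedding module (optionally a deeper backbone such as ResNet-256F).
- Tune the hyperparameter k and show robustness across settings.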
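The image-to-class measure described in the list above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' implementation: the function names (`cosine`, `image_to_class_score`, `classify`), the 2-D toy descriptors, and the class labels are all assumptions made for the example. In a real model the descriptors would be the channel vectors at each spatial position of a CNN feature map, and the search would run on batched tensors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def image_to_class_score(query_descriptors, class_pool, k=3):
    """Sum, over all query descriptors, of the cosine similarities
    to their k nearest neighbors in the class descriptor pool."""
    score = 0.0
    for x in query_descriptors:
        sims = sorted((cosine(x, p) for p in class_pool), reverse=True)
        score += sum(sims[:k])  # keep only the k most similar pool descriptors
    return score

def classify(query_descriptors, pools, k=3):
    """Return the label whose descriptor pool gives the highest score."""
    return max(pools, key=lambda c: image_to_class_score(query_descriptors, pools[c], k))

# Toy data (hypothetical): each pool holds the local descriptors of all
# support images of one class; the query image has two local descriptors.
pools = {
    "cat": [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]],
}
query = [[1.0, 0.1], [0.7, 0.3]]
predicted = classify(query, pools, k=2)
```

Because the score is a plain sum of similarities, the measure itself has no learnable parameters; under these assumptions, only the network that produces the descriptors would be trained.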
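Experimental Results
Research Questions
- RQ1: Does a local descriptor based image-to-class measure outperform image-level features in few-shot classification?
- RQ2: Can an end-to-end trained, non-parametric classifier built on local descriptors (DN4) surpass standard metric-learning and meta-learning methods on few-shot tasks?
- RQ3: How do the hyperparameters (k, the backbone network, over-/underfitting behavior) affect DN4's performance across datasets?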
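Main Findings
- DN4 achieves higher accuracy than several state-of-the-art metric-learning methods on 5-way 1-shot and 5-way 5-shot tasks (e.g., on miniImageNet, 51.24% vs 49.42% and 71.02% vs 68.20%, respectively).
- Replacing image-level features with deep local descriptors and using an image-to-class measure brings significant gains, especially on fine-grained datasets.
- The method is trained end-to-end from scratch and, apart from the embedding module, remains non-parametric at test time.
- A deeper backbone (ResNet-256F) improves performance further (e.g., 74.44% on 5-shot with ResNet-256F).
- Ablation studies show that the image-to-class measure outperforms the image-to-image variant, and that the method benefits from the exchangeability of local patterns within a class.
- DN4 remains competitive with meta-learning baselines and often outperforms them in the 5-shot setting.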
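The ablation finding on image-to-class versus image-to-image measures can be made concrete with a toy comparison. The sketch below is an illustration under stated assumptions, not the paper's ablation code: the image-to-image variant shown here (score each support image separately, keep the best) is a hypothetical stand-in, and all function names and descriptors are invented for the example. It shows why pooling helps: a query can match one local pattern in one support image and another pattern in a different support image of the same class.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_score(query_descriptors, candidates, k=1):
    """Sum of each query descriptor's top-k cosine similarities to the candidates."""
    total = 0.0
    for x in query_descriptors:
        sims = sorted((cosine(x, c) for c in candidates), reverse=True)
        total += sum(sims[:k])
    return total

def image_to_class(query_descriptors, support_images, k=1):
    """Pool the descriptors of all support images of the class, then search once."""
    pool = [d for image in support_images for d in image]
    return knn_score(query_descriptors, pool, k)

def image_to_image(query_descriptors, support_images, k=1):
    """Hypothetical variant: score each support image on its own, keep the best."""
    return max(knn_score(query_descriptors, image, k) for image in support_images)

# Toy class with two support images, each showing a different local pattern.
support_images = [
    [[1.0, 0.0], [1.0, 0.1]],  # image 1: pattern along the first axis
    [[0.0, 1.0], [0.1, 1.0]],  # image 2: pattern along the second axis
]
query = [[1.0, 0.0], [0.0, 1.0]]  # the query contains both patterns

pooled = image_to_class(query, support_images)
per_image = image_to_image(query, support_images)
```

With k = 1 the pooled score can never be lower than the best per-image score, since every query descriptor searches a superset of each individual image's descriptors. In this toy case it is strictly higher, because no single support image contains both patterns.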
This walkthrough was generated by AI and reviewed by a human editor.