QUICK REVIEW

[Paper Summary] Local Aggregation for Unsupervised Learning of Visual Embeddings

Chengxu Zhuang, Alex Zhai | arXiv (Cornell University) | Mar 29, 2019

Domain Adaptation and Few-Shot Learning | References: 73 | Cited by: 78

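One-Sentence Summary

This paper proposes Local Aggregation (LA), an unsupervised method that learns visual embeddings by dynamically forming soft local clusters, and reports state-of-the-art unsupervised transfer performance on ImageNet, Places 205, and PASCAL VOC.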

ABSTRACT

Unsupervised approaches to learning in neural networks are of substantial interest for furthering artificial intelligence, both because they would enable the training of networks without the need for large numbers of expensive annotations, and because they would be better models of the kind of general-purpose learning deployed by humans. However, unsupervised networks have long lagged behind the performance of their supervised counterparts, especially in the domain of large-scale visual recognition. Recent developments in training deep convolutional embeddings to maximize non-parametric instance separation and clustering objectives have shown promise in closing this gap. Here, we describe a method that trains an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate. This aggregation metric is dynamic, allowing soft clusters of different scales to emerge. We evaluate our procedure on several large-scale visual recognition datasets, achieving state-of-the-art unsupervised transfer learning performance on object recognition in ImageNet, scene recognition in Places 205, and object detection in PASCAL VOC.

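Research Motivation and Goals

Motivate and advance unsupervised learning of deep visual representations in order to narrow the gap with supervised methods.
Use local non-parametric aggregation so that similar data points move together in the embedding space while dissimilar points separate.
Show that a dynamic, multi-scale soft clustering structure improves transfer performance across tasks and architectures.
Demonstrate that deeper networks gain more from LA and reach competitive or even superior results without labels.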

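Proposed Method

A neural network embeds each input nonlinearly onto a D-dimensional unit sphere, producing the embedding v_i.
For each embedding, two neighbor sets are identified: close neighbors C_i and background neighbors B_i. C_i is computed by robust clustering of V, aggregated over multiple clusterings, while B_i consists of the k nearest neighbors in embedding space.
The local aggregation loss L(C_i, B_i | θ, x_i) is defined as the negative log-likelihood of v_i being recognized as a close neighbor (a member of C_i) given that it is recognized as a background neighbor (a member of B_i), using a non-parametric softmax over cosine similarities with temperature τ.
L is optimized with L2 regularization on θ to learn the embedding function; during training, a memory bank efficiently approximates V.
The memory bank V̄ stores running averages of the embeddings and is used to stabilize neighbor identification without recomputing all features.
Training is warm-started with the instance recognition loss and then switched to the local aggregation loss; hyperparameters include τ = 0.07, D = 128, k = 4096 for B_i, and H clusterings of m clusters each for C_i.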
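
The loss and memory-bank steps above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the four-vector toy bank, the hand-picked close-neighbor sets (the paper derives C_i from clustering), and the momentum value of 0.5 are all assumptions made for illustration. Because the loss is a ratio of two softmax sums over the same bank, the softmax normalizer cancels, so the sketch works with unnormalized weights exp(v_i · v_j / τ).

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def background_neighbors(v_i, bank, k):
    """B_i: indices of the k bank entries most similar to v_i."""
    order = sorted(range(len(bank)), key=lambda j: dot(v_i, bank[j]), reverse=True)
    return set(order[:k])

def la_loss(v_i, bank, close, background, tau=0.07):
    """-log( P(C_i ∩ B_i | v_i) / P(B_i | v_i) ).

    Assumes close and background overlap; otherwise the ratio is zero
    and the logarithm is undefined.
    """
    weights = [math.exp(dot(v_i, v_j) / tau) for v_j in bank]
    p_background = sum(weights[j] for j in background)
    p_both = sum(weights[j] for j in close & background)
    return -math.log(p_both / p_background)

def update_bank(bank, i, v_i, momentum=0.5):
    """Running-average update of bank entry i (momentum is an assumed value)."""
    mixed = [(1 - momentum) * old + momentum * new for old, new in zip(bank[i], v_i)]
    bank[i] = normalize(mixed)

# Toy memory bank of four 2-D unit vectors (hypothetical data).
bank = [normalize(v) for v in [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]]
v_i = normalize([1.0, 0.05])

background = background_neighbors(v_i, bank, k=3)

# Close neighbors that really are nearby give a small loss ...
loss_good = la_loss(v_i, bank, close={0, 1}, background=background)
# ... while a distant "close" neighbor gives a large one.
loss_bad = la_loss(v_i, bank, close={2}, background=background)

# Refresh the stored embedding for index 0 with the new embedding.
update_bank(bank, 0, v_i)
```

Minimizing this quantity pulls v_i toward the members of C_i that also lie in B_i, relative to the rest of B_i, which matches the aggregation behavior the summary describes. A practical version would operate on batched tensors and use an approximate nearest-neighbor search rather than a full sort.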

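Experimental Results

Research Questions

RQ1: Does local non-parametric aggregation in the embedding space yield high-quality unsupervised visual representations?
RQ2: How does dynamic, multi-scale clustering affect the geometry of the learned embedding space and downstream transfer performance?
RQ3: Do deeper networks benefit more from the LA objective than shallower ones?
RQ4: Is LA robust to the choice of clustering and neighbor definitions?
RQ5: Do LA-trained representations transfer effectively to image classification, scene recognition, and object detection?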

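Key Findings

LA achieves state-of-the-art unsupervised transfer learning results on ImageNet and Places 205 across multiple architectures.
A ResNet-50 whose embedding is trained with LA, without labels, reaches 60.2% top-1 accuracy on ImageNet, surpassing a supervised AlexNet.
After fine-tuning, LA improves object detection on PASCAL VOC 2007, reaching the state of the art for unsupervised transfer on that task.
LA benefits from deeper architectures and shows consistent performance gains from the conv1 through conv5 layers.
LA representations generalize well across visual tasks, with strong KNN and linear-readout transfer results on every dataset evaluated.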

This summary was generated by AI and reviewed by a human editor.