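[Paper Review] No Fuss Distance Metric Learning using Proxies
Proxy-NCA learns a distance metric by optimizing over a space of learnable proxy points. This gives faster convergence and state-of-the-art zero-shot results on standard datasets.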
We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship -- an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized. While the specifics of the optimization differ, in this work we collectively call this type of supervision Triplets and all methods that follow this pattern Triplet-Based methods. These methods are challenging to optimize. A main issue is the need for finding informative triplets, which is usually achieved by a variety of tricks such as increasing the batch size, hard or semi-hard triplet mining, etc. Even with these tricks, the convergence rate of such methods is slow. In this paper we propose to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss. This proxy-based loss is empirically better behaved. As a result, the proxy-loss improves on state-of-the-art results for three standard zero-shot learning datasets, by up to 15% points, while converging three times as fast as other triplet-based losses.
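The upper-bound idea in the abstract can be checked numerically. The sketch below is a simplified illustration, not the paper's derivation: it assumes plain (non-squared) Euclidean distance, a hinge triplet loss with margin `margin`, and dynamic assignment of every point to its nearest proxy. If ε is the largest distance from any point to its assigned proxy, the triangle inequality gives d(x, y) ≤ d(x, p(y)) + ε and d(x, z) ≥ d(x, p(z)) - ε, so L(x, y, z) ≤ L(x, p(y), p(z)) + 2ε. All function names are hypothetical.

```python
import math
import random

def dist(a, b):
    """Plain Euclidean distance (an assumption; the paper's bounds rely on norm constraints)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def triplet_loss(x, y, z, margin):
    """Hinge triplet loss [d(x, y) + margin - d(x, z)]_+."""
    return max(0.0, dist(x, y) + margin - dist(x, z))

def nearest_proxy(u, proxies):
    """Dynamic assignment p(u): the proxy closest to u."""
    return min(proxies, key=lambda p: dist(u, p))

def check_bound(points, proxies, margin=0.5):
    """True if L(x, y, z) <= L(x, p(y), p(z)) + 2 * eps for every triplet of points."""
    # eps: worst-case distance between a point and its assigned proxy
    eps = max(dist(u, nearest_proxy(u, proxies)) for u in points)
    for x in points:
        for y in points:
            for z in points:
                original = triplet_loss(x, y, z, margin)
                # The anchor x stays a data point; only y and z are replaced by proxies.
                proxy = triplet_loss(x, nearest_proxy(y, proxies),
                                     nearest_proxy(z, proxies), margin)
                if original > proxy + 2 * eps + 1e-12:  # small float tolerance
                    return False
    return True

random.seed(0)
points = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(20)]
proxies = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(4)]
print(check_bound(points, proxies))  # True
```

The bound tightens as ε shrinks, which is why the proxies are trained to stay close to the data they stand in for.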
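Motivation and Goals
- Address the inefficiency and poor convergence of triplet-based distance metric learning
- Introduce a small set of learnable proxies that represents the training data
- Propose a proxy-based ranking/NCA loss that upper-bounds the original triplet loss
- Show that training with proxies speeds up convergence while preserving ordinal relationships
- Demonstrate state-of-the-art performance on zero-shot learning benchmarks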
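Proposed Method
- Define a set of proxies P that approximates the training data, so that every data point y has a proxy p(y) as close to it as possible
- Replace each triplet (x, y, z) with the proxy triplet (x, p(y), p(z)) and optimize the proxy-based loss
- Prove that, under norm constraints, the proxy-based loss upper-bounds classic losses such as NCA and the triplet loss
- Train end to end with the proxies included as model parameters; the proxies are kept in memory and shared across batches, which avoids the complexity of triplet sampling
- Offer two proxy assignment strategies: static (one proxy per semantic label) and dynamic (nearest proxy)
- Use the Proxy-NCA loss, which resembles cross-entropy over a compact set of proxies, with an optional scaling factor used to establish the bound
- Run experiments with both fixed and dynamic proxy assignment, analyzing convergence as well as Recall and NMI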
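A minimal pure-Python sketch of the Proxy-NCA loss for a single embedding follows. It assumes the formulation -log(exp(-d(x, p(y))) / Σ exp(-d(x, p(z)))), where d is the squared Euclidean distance between L2-normalized vectors and the sum runs over every proxy except the positive one. `static_proxy` (label-indexed) and `dynamic_proxy` (nearest proxy) are hypothetical helpers illustrating the two assignment strategies. In a real model the proxies would be trainable parameters updated by gradient descent together with the embedding network; this sketch omits training.

```python
import math

def normalize(v):
    """Scale v to unit L2 norm (the norm constraint assumed in this sketch)."""
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v]

def sq_dist(a, b):
    """Squared Euclidean distance between the normalized versions of a and b."""
    a, b = normalize(a), normalize(b)
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def static_proxy(label, num_proxies):
    """Static assignment (hypothetical helper): one proxy per class, indexed by label."""
    if not 0 <= label < num_proxies:
        raise ValueError("label has no proxy")
    return label

def dynamic_proxy(y, proxies):
    """Dynamic assignment (hypothetical helper): index of the proxy nearest to y."""
    return min(range(len(proxies)), key=lambda i: sq_dist(y, proxies[i]))

def proxy_nca_loss(x, pos_idx, proxies):
    """d(x, p_pos) + log(sum over negative proxies of exp(-d(x, p_neg)))."""
    d_pos = sq_dist(x, proxies[pos_idx])
    neg = [-sq_dist(x, p) for i, p in enumerate(proxies) if i != pos_idx]
    # log-sum-exp over the negatives, shifted by the max for numerical stability
    m = max(neg)
    log_sum_neg = m + math.log(sum(math.exp(v - m) for v in neg))
    return d_pos + log_sum_neg

proxies = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [2.0, 0.0, 0.0]  # normalizes onto proxy 0
print(proxy_nca_loss(x, static_proxy(0, len(proxies)), proxies))  # log(2) - 2 ≈ -1.307
print(dynamic_proxy([0.1, 0.9, 0.0], proxies))  # 1
```

Because the positive proxy is excluded from the denominator, the loss can be negative, as in the first usage line. The loss is written as d_pos plus a log-sum-exp because -log(exp(-d_pos) / S) = d_pos + log S, which mirrors how cross-entropy is computed stably.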
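Experimental Results
Research Questions
- RQ1: Can a small set of learnable proxies faithfully upper-bound the original triplet/NCA loss?
- RQ2: Does proxy-based deep metric learning converge faster than traditional triplet methods?
- RQ3: Do proxies preserve the desired ordinal relationships between data points?
- RQ4: How does Proxy-NCA perform on zero-shot retrieval and clustering on standard datasets?
- RQ5: How much do static versus dynamic proxy assignment affect performance?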
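Key Findings
- Proxy-NCA converges about three times faster than the baseline triplet loss on Cars196
- Compared with the previous state of the art, Proxy-NCA improves Recall@1 on Cars196 by up to 15 percentage points
- Under reasonable norm constraints, the proxy-based loss gives a tight upper bound on traditional losses such as NCA and the triplet loss
- Proxy-NCA achieves state-of-the-art zero-shot results on the CUB200, Cars196, and Stanford Online Products datasets
- A small set of proxies (hundreds to a few thousand) delivers strong performance while keeping memory usage moderate
- Static proxy assignment matches or exceeds prior methods when the proxies are aligned with semantic labels; dynamic assignment remains effective when labels are unavailable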
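Recall@K on these retrieval benchmarks is usually computed as follows: a query counts as a hit if at least one of its K nearest neighbours in the test set (excluding the query itself) shares its class. The sketch below assumes Euclidean distance and brute-force search; `recall_at_k` is a hypothetical helper, not code from the paper.

```python
import math

def recall_at_k(embeddings, labels, k):
    """Fraction of queries whose k nearest neighbours (excluding the query) contain a same-class item."""
    hits = 0
    for i, q in enumerate(embeddings):
        # Rank every other item by Euclidean distance to the query.
        others = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: math.dist(q, embeddings[j]),
        )
        if any(labels[j] == labels[i] for j in others[:k]):
            hits += 1
    return hits / len(embeddings)

emb = [[0.0], [0.1], [5.0], [5.1], [10.0]]
lab = [0, 0, 1, 1, 0]
print(recall_at_k(emb, lab, 1))  # 0.8 (the last query's nearest neighbour is from class 1)
print(recall_at_k(emb, lab, 4))  # 1.0
```

`math.dist` requires Python 3.8 or later; for datasets the size of Stanford Online Products, a vectorized or approximate nearest-neighbour search would replace the quadratic loop.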
This review was generated by AI and checked by human editors.