QUICK REVIEW

[Paper Review] Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimising Global Loss Functions

Vijay Kumar B G, Gustavo Carneiro|arXiv (Cornell University)|Dec 31, 2015

Advanced Image and Video Retrieval Techniques | References: 28 | Cited by: 176

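One-Sentence Summary

This paper proposes using triplet and siamese convolutional networks together with a novel global loss function to improve the generalisation and performance of local image descriptors. The method achieves state-of-the-art (SOTA) results on the UBC benchmark: a triplet network trained with both the triplet loss and the global loss yields the best feature embedding, while a central-surround siamese network trained with the global loss nearly halves the FPR95 error of previous pairwise-similarity methods.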

ABSTRACT

Recent innovations in training deep convolutional neural network (ConvNet) models have motivated the design of new methods to automatically learn local image descriptors. The latest deep ConvNets proposed for this task consist of a siamese network that is trained by penalising misclassification of pairs of local image patches. Current results from machine learning show that replacing this siamese by a triplet network can improve the classification accuracy in several problems, but this has yet to be demonstrated for local image descriptor learning. Moreover, current siamese and triplet networks have been trained with stochastic gradient descent that computes the gradient from individual pairs or triplets of local image patches, which can make them prone to overfitting. In this paper, we first propose the use of triplet networks for the problem of local image descriptor learning. Furthermore, we also propose the use of a global loss that minimises the overall classification error in the training set, which can improve the generalisation capability of the model. Using the UBC benchmark dataset for comparing local image descriptors, we show that the triplet network produces a more accurate embedding than the siamese network in terms of the UBC dataset errors. Moreover, we also demonstrate that a combination of the triplet and global losses produces the best embedding in the field, using this triplet network. Finally, we also show that the use of the central-surround siamese network trained with the global loss produces the best result of the field on the UBC dataset. Pre-trained models are available online at https://github.com/vijaykbg/deep-patchmatch

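Motivation and Objectives

Improve the generalisation and robustness of learned local image descriptors by replacing standard siamese training with a triplet network.
Mitigate overfitting in siamese and triplet networks by introducing a global loss function that minimises the error over the whole training set.
Evaluate whether triplet networks and the global loss outperform existing methods for local descriptor learning.
Show that the global loss improves generalisation beyond what optimisation over individual pairs or triplets achieves.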

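Proposed Method

Proposes a triplet network architecture for local image descriptor learning, in which each training sample consists of a query patch, a positive patch (same 3D location) and a negative patch (different 3D location).
Introduces a global loss function that regularises training over the whole training set by minimising the variance of the distance distributions for matching and non-matching pairs while pushing their means apart.
Combines the triplet loss with the global loss, jointly optimising local per-triplet contrast and global distribution-level separation.
Adopts a central-surround siamese network trained with the global loss, which processes the central patch together with its surrounding context to make the features more discriminative.
Optimises with mini-batch stochastic gradient descent and initialises the triplet network with the weights of a pre-trained siamese model to speed up convergence.
Tunes hyperparameters by cross-validation, including the triplet-loss margin (m=0.01) and the parameters of the global loss function (γ=1, t=0.4, λ=0.8).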
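The loss terms described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the triplet term uses a common hinge form, the global term assumes a penalty on the variance of matching and non-matching distances plus a hinge on the gap between their means, and the way γ weights the two terms is a guess. The function names are hypothetical; only the default values m, t, λ and γ come from the summary.

```python
import numpy as np


def pairwise_distances(anchor, positive, negative):
    """Euclidean distances for a batch of (query, positive, negative) descriptors.

    Each argument is an array of shape (batch_size, descriptor_dim).
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return d_pos, d_neg


def triplet_loss(d_pos, d_neg, m=0.01):
    # Assumed hinge form: a triplet is penalised unless the positive is
    # closer than the negative by at least the margin m.
    # The paper's exact triplet formulation may differ.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + m)))


def global_loss(d_pos, d_neg, t=0.4, lam=0.8):
    # Assumed form: batch-level statistics instead of per-sample penalties.
    # 1) keep both distance distributions narrow (variance term);
    # 2) keep the mean matching distance below the mean non-matching
    #    distance by at least t (hinge on the means, weighted by lam).
    variance_term = np.var(d_pos) + np.var(d_neg)
    mean_term = max(0.0, float(np.mean(d_pos) - np.mean(d_neg) + t))
    return float(variance_term + lam * mean_term)


def combined_loss(d_pos, d_neg, m=0.01, t=0.4, lam=0.8, gamma=1.0):
    # Assumption: gamma scales the triplet term before adding the global term.
    return gamma * triplet_loss(d_pos, d_neg, m) + global_loss(d_pos, d_neg, t, lam)
```

In a training loop these functions would be applied to the descriptors of one mini-batch, so the means and variances are estimated per batch rather than over the full training set.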

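Experimental Results

Research Questions

RQ1: Does a triplet network improve local image descriptor learning compared with a siamese network?
RQ2: Does introducing the global loss function reduce overfitting and improve generalisation in descriptor learning?
RQ3: Does combining the triplet loss with the global loss outperform using either loss alone?
RQ4: Does the central-surround siamese network trained with the global loss outperform existing pairwise-similarity methods?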

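Key Findings

The triplet network trained with both the triplet loss and the global loss (TNet-TGLoss) achieves the best feature-embedding performance on the UBC benchmark, outperforming all previous methods.
The central-surround siamese network trained with the global loss (CS-SNet-GLoss) achieves an FPR95 roughly half that of the previous state-of-the-art method, 2ch-2stream.
TNet-TGLoss achieves the lowest mean FPR95 across all six UBC train-test combinations, demonstrating better robustness and generalisation.
The global loss markedly improves generalisation, reflected in faster convergence and better performance even with fewer training epochs.
No other loss configuration surpasses TNet-TGLoss, which combines the triplet and global losses, indicating that the global loss works best when paired with the triplet loss.
The proposed models reach the state of the art in both the embedding-learning and pairwise-similarity-learning settings, and the global loss lifts performance above standard pairwise training.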
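The findings above are reported in FPR95, the false positive rate at 95% recall. As a minimal sketch of how such a figure can be computed from descriptor distances, assuming smaller distance means "more likely a match" (the function name and the tie-handling choices are hypothetical, not taken from the paper or the UBC evaluation code):

```python
import numpy as np


def fpr_at_recall(distances, is_match, recall=0.95):
    """False positive rate at the threshold that accepts `recall` of true matches.

    distances: descriptor distance for each patch pair (smaller = more similar).
    is_match:  True where the pair shows the same 3D point, False otherwise.
    """
    distances = np.asarray(distances, dtype=float)
    is_match = np.asarray(is_match, dtype=bool)

    # Sort matching-pair distances and pick the smallest threshold that
    # accepts at least `recall` of them.
    pos = np.sort(distances[is_match])
    k = int(np.ceil(recall * len(pos))) - 1
    threshold = pos[k]

    # Fraction of non-matching pairs that fall at or under that threshold.
    neg = distances[~is_match]
    return float(np.mean(neg <= threshold))
```

A lower value is better: a perfectly separating descriptor, whose matching distances all lie below every non-matching distance, scores 0.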


This review was generated by AI and checked by a human editor.