QUICK REVIEW

[Paper Review] GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs

Yuan Liu, Zehong Shen|arXiv (Cornell University)|Nov 13, 2019

Advanced Image and Video Retrieval Techniques|References: 35|Cited by: 45

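One-Sentence Summary

GIFT is a dense visual descriptor that is invariant to a group of geometric transformations. It applies group convolutions to features extracted from transformed versions of an image, yielding descriptors that are discriminative for dense matching and provably invariant, and it improves relative pose estimation.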

ABSTRACT

Finding local correspondences between images with different viewpoints requires local descriptors that are robust against geometric transformations. An approach for transformation invariance is to integrate out the transformations by pooling the features extracted from transformed versions of an image. However, the feature pooling may sacrifice the distinctiveness of the resulting descriptors. In this paper, we introduce a novel visual descriptor named Group Invariant Feature Transform (GIFT), which is both discriminative and robust to geometric transformations. The key idea is that the features extracted from the transformed versions of an image can be viewed as a function defined on the group of the transformations. Instead of feature pooling, we use group convolutions to exploit underlying structures of the extracted features on the group, resulting in descriptors that are both discriminative and provably invariant to the group of transformations. Extensive experiments show that GIFT outperforms state-of-the-art methods on several benchmark datasets and practically improves the performance of relative pose estimation.

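Motivation and Objectives

Motivation: finding correspondences across viewpoints requires local descriptors that remain robust under geometric transformations.
Propose a descriptor that is invariant to a group of transformations while remaining discriminative.
Develop a pipeline that builds group features from transformed images and embeds them with group CNNs.
Demonstrate provable invariance through group convolutions and bilinear pooling.
Demonstrate state-of-the-art performance on standard datasets and on datasets with extreme scale and orientation changes.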

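Proposed Method

Warp the input image with a grid of transformations (rotations and scalings) sampled from a group G.
Extract features from each warped image with a vanilla CNN, so that each point carries a group feature f0(g) defined on G.
Process f0 with two group CNNs (alpha and beta), whose group convolution layers preserve equivariance, to obtain f_l,alpha and f_l,beta.
Apply bilinear pooling to the two group-CNN outputs to obtain the final GIFT descriptor d, then normalize it to unit length.
Train with a triplet loss with hard negative mining to encourage correct matches.
Use sampled group elements to keep the computation tractable, and pool over the discrete group to achieve invariance.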
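
The training step in the list above can be illustrated with a minimal NumPy sketch of a margin-based triplet loss that mines the hardest negative. This is an illustration under assumptions, not the authors' implementation: the function name `hardest_negative_triplet_loss`, the `candidates` array of known non-matching descriptors, and the margin value are all hypothetical.

```python
import numpy as np

def hardest_negative_triplet_loss(anchors, positives, candidates, margin=0.5):
    """Margin-based triplet loss using the closest non-matching descriptor.

    anchors, positives: (N, D) arrays of matching descriptor pairs.
    candidates: (M, D) array of descriptors assumed not to match any anchor.
    margin: hypothetical value, not taken from the paper.
    """
    # Distance between each anchor and its true match.
    pos_dist = np.linalg.norm(anchors - positives, axis=1)
    # All anchor-to-candidate distances, shape (N, M).
    neg_dist = np.linalg.norm(
        anchors[:, None, :] - candidates[None, :, :], axis=2
    )
    # Hard negative mining: keep only the closest (most confusing) negative.
    hardest = neg_dist.min(axis=1)
    # Hinge: penalize triplets whose negative is not at least `margin` farther
    # away than the positive.
    return float(np.maximum(pos_dist - hardest + margin, 0.0).mean())
```

Selecting the minimum over candidates focuses the gradient on the negatives most likely to cause a wrong match. A real training loop would compute this on descriptors produced by the network, inside an automatic-differentiation framework.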

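Experimental Results

Research Questions

RQ1: How can local descriptors be made invariant to a group of transformations without sacrificing discriminativeness?
RQ2: Can group convolutions on features defined over the transformation group preserve equivariance and yield invariant dense descriptors?
RQ3: Does GIFT improve dense and sparse matching and relative pose estimation under large viewpoint and appearance changes?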
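
The property behind RQ2 can be checked numerically on a toy case. The sketch below makes several assumptions: it replaces the rotation-and-scale group with a cyclic group Z_n (rotations only, with wrap-around), models a transformed input as an exact cyclic shift of the group feature at one pixel, and uses hypothetical filter shapes and function names. It checks that one group convolution layer is equivariant to the shift and that bilinear pooling of two branches is invariant to it.

```python
import numpy as np

def cyclic_group_conv(f, w):
    """One group convolution layer on the cyclic group Z_n, followed by ReLU.

    f: (n, c_in) feature sampled at n group elements (e.g. n rotations).
    w: (k, c_in, c_out) filter supported on k neighboring group elements.
    Computes out[g] = relu(sum_h f[(g + h) % n] @ w[h]).
    """
    out = np.zeros((f.shape[0], w.shape[2]))
    for h in range(w.shape[0]):
        # np.roll(f, -h, axis=0)[g] equals f[(g + h) % n].
        out += np.roll(f, -h, axis=0) @ w[h]
    return np.maximum(out, 0.0)

def pooled_descriptor(f0, w_alpha, w_beta):
    """Bilinear pooling of two group-conv branches, then L2 normalization."""
    f_alpha = cyclic_group_conv(f0, w_alpha)  # (n, c_a)
    f_beta = cyclic_group_conv(f0, w_beta)    # (n, c_b)
    # Sum over group elements of the outer products f_alpha[g] f_beta[g]^T.
    d = (f_alpha.T @ f_beta).ravel()
    return d / np.linalg.norm(d)

rng = np.random.default_rng(0)
f0 = rng.normal(size=(8, 4))          # toy group feature at one pixel
w_alpha = rng.normal(size=(3, 4, 5))  # hypothetical filter shapes
w_beta = rng.normal(size=(3, 4, 6))
f0_shifted = np.roll(f0, -2, axis=0)  # transformed input: elements permuted

# Equivariance: shifting the input shifts the layer output by the same amount.
equivariant = np.allclose(
    cyclic_group_conv(f0_shifted, w_alpha),
    np.roll(cyclic_group_conv(f0, w_alpha), -2, axis=0),
)
# Invariance: the pooled descriptor does not change under the shift.
invariant = np.allclose(
    pooled_descriptor(f0_shifted, w_alpha, w_beta),
    pooled_descriptor(f0, w_alpha, w_beta),
)
print(equivariant, invariant)
```

With real images, a rotation or scaling only approximately permutes the sampled group elements, and scale does not wrap around, so the invariance checked here is an idealization.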

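Key Findings

GIFT provides discriminative and provably invariant descriptors for the transformation group considered, and outperforms both traditional and learned descriptors on benchmark datasets.
Bilinear pooling of the two group-CNN outputs provides robust invariance and richer statistics than other pooling schemes.
Increasing the number of group convolution layers improves performance in the ablation study; GIFT-6, the variant used in the experiments, shows strong results.
GIFT is robust to extreme scale and orientation changes, and improves relative pose estimation when fine-tuned on real data (GIFT-F).
On a 480x360 image with 1024 interest points, the implementation takes about 65.2 ms on a GTX 1080 Ti, indicating practical speed.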
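
The invariance attributed to bilinear pooling in the findings above follows from a short argument, sketched here under assumptions. The index notation d_{i,j} and the choice of a left action by h are conventions picked for illustration. The sketch also assumes that transforming the image by an element h of G exactly permutes the sampled group elements, so that the equivariant group-CNN outputs become f(hg); for a finite sample of rotations and scales this holds only approximately.

```latex
\begin{align*}
d_{i,j}  &= \sum_{g \in G} f_{\alpha,i}(g)\, f_{\beta,j}(g), \\
d'_{i,j} &= \sum_{g \in G} f_{\alpha,i}(hg)\, f_{\beta,j}(hg)
          = \sum_{g' \in G} f_{\alpha,i}(g')\, f_{\beta,j}(g')
          = d_{i,j}.
\end{align*}
```

The middle step substitutes g' = hg, which is valid because the map from g to hg is a bijection on G. The sum therefore runs over the same terms in a different order.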

This review was generated by AI and reviewed by human editors.