QUICK REVIEW

[Paper Review] Localizing and Orienting Street Views Using Overhead Imagery

Nam Vo|arXiv (Cornell University)|Jul 30, 2016

Advanced Image and Video Retrieval Techniques|References: 29|Cited by: 21

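One-Sentence Summary

This paper proposes a deep learning framework that localizes and orients street-view images by matching them against overhead (satellite) imagery. It introduces a new distance-based logistic (DBL) loss and explicit orientation supervision to improve cross-view matching. On a new dataset of one million street-view/overhead image pairs covering eleven U.S. cities, its best architectures are roughly 2.5 times as accurate as the baseline Siamese network.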

ABSTRACT

In this paper we aim to determine the location and orientation of a ground-level query image by matching to a reference database of overhead (e.g. satellite) images. For this task we collect a new dataset with one million pairs of street view and overhead images sampled from eleven U.S. cities. We explore several deep CNN architectures for cross-domain matching -- Classification, Hybrid, Siamese, and Triplet networks. Classification and Hybrid architectures are accurate but slow since they allow only partial feature precomputation. We propose a new loss function which significantly improves the accuracy of Siamese and Triplet embedding networks while maintaining their applicability to large-scale retrieval tasks like image geolocalization. This image matching task is challenging not just because of the dramatic viewpoint difference between ground-level and overhead imagery but because the orientation (i.e. azimuth) of the street views is unknown, making correspondence even more difficult. We examine several mechanisms to match in spite of this -- training for rotation invariance, sampling possible rotations at query time, and explicitly predicting relative rotation of ground and overhead images with our deep networks. It turns out that explicit orientation supervision also improves location prediction accuracy. Our best performing architectures are roughly 2.5 times as accurate as the commonly used Siamese network baseline.
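Siamese and Triplet networks scale to large retrieval tasks because each image is embedded independently. Every overhead reference feature can therefore be computed once and stored, and a query then costs one forward pass plus distance comparisons. This is a minimal sketch, assuming embeddings are plain float lists produced by some hypothetical network. It shows how the rank of the true overhead match is read off. `rank_of_true_match` and `recall_at_k` are illustrative helpers and are not necessarily the paper's exact evaluation protocol.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]  # hypothetical precomputed feature vector


def rank_of_true_match(query: Embedding, database: List[Embedding], true_index: int) -> int:
    """1-based rank of database[true_index] when references are sorted by L2 distance to the query.

    Ties are broken in favor of the true match.
    """
    d_true = math.dist(query, database[true_index])
    # Count references that are strictly closer to the query than the true match.
    closer = sum(1 for i, ref in enumerate(database)
                 if i != true_index and math.dist(query, ref) < d_true)
    return closer + 1


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose true match ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

The reference features in `database` never depend on the query, so they can be precomputed offline. A Classification or Hybrid network would instead need a joint pass for every query/reference pair.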

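Research Motivation and Goals

Address the challenge of localizing and orienting ground-level street-view images using overhead satellite imagery.
Improve cross-domain image matching accuracy despite extreme viewpoint differences and an unknown camera azimuth.
Develop a scalable deep learning framework suitable for large-scale image geolocalization.
Study how rotation invariance and explicit orientation regression affect representation learning.
Release a new large-scale dataset of one million street-view/overhead image pairs to advance the field.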

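Proposed Method

A new distance-based logistic (DBL) loss layer that improves the training of Siamese and Triplet networks for cross-view matching.
Explicit orientation regression (OR), which predicts the relative rotation between the ground and overhead images and thereby improves both orientation prediction and localization accuracy.
Rotation-invariance (RI) training, achieved by randomly rotating the inputs during training.
Multi-orientation feature averaging at inference (avg16), which approximates the effect of testing 16 rotated crops without incurring the full inference cost.
Exhaustive triplet sampling within each minibatch (eDBL), which improves training efficiency and convergence speed.
Training and evaluation of multiple architectures (Classification, Hybrid, Siamese, Triplet) on the newly released large-scale dataset.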
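The DBL loss and its exhaustive-triplet variant (eDBL) can be sketched as follows. This is a minimal sketch under stated assumptions, and the summary does not give the paper's exact formulas. We assume DBL turns a feature distance d into a match logit α(m − d), with a hypothetical margin `margin` and scale `alpha`, and then applies binary cross-entropy. For triplets, we assume the soft-margin form softplus(α(d_pos − d_neg)). We also assume eDBL means enumerating every ground-to-overhead triplet in a batch of n matched pairs, using each pair's own overhead image as the positive and all other overhead images as negatives.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dbl_pair_loss(dist: float, is_match: bool, margin: float = 1.0, alpha: float = 10.0) -> float:
    """Logistic loss on one pairwise distance (assumed DBL-style form).

    The logit of "these two images match" is alpha * (margin - dist): distances
    below the margin push toward "match", distances above it toward "no match".
    """
    z = alpha * (margin - dist)
    # -log(sigmoid(z)) = softplus(-z);  -log(1 - sigmoid(z)) = softplus(z)
    return softplus(-z) if is_match else softplus(z)


def dbl_triplet_loss(d_pos: float, d_neg: float, alpha: float = 10.0) -> float:
    """Soft-margin triplet loss: approaches zero once d_pos is well below d_neg."""
    return softplus(alpha * (d_pos - d_neg))


def exhaustive_triplet_loss(ground: Sequence[Sequence[float]],
                            overhead: Sequence[Sequence[float]],
                            alpha: float = 10.0) -> float:
    """Mean triplet loss over all (anchor ground[i], positive overhead[i], negative overhead[j != i])."""
    n = len(ground)
    if n != len(overhead) or n < 2:
        raise ValueError("need at least two matched pairs of equal count")
    # One distance matrix serves every triplet: d[i][j] = ||ground[i] - overhead[j]||.
    d = [[math.dist(g, o) for o in overhead] for g in ground]
    losses = [dbl_triplet_loss(d[i][i], d[i][j], alpha)
              for i in range(n) for j in range(n) if j != i]
    return sum(losses) / len(losses)
```

A batch of n pairs yields n(n − 1) triplets for the price of n² distances. That is one plausible reason exhaustive sampling converges in fewer iterations, though the sketch covers only the ground-to-overhead direction.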
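RI training and avg16 feature averaging can be illustrated as follows. This is a minimal sketch under several assumptions that the summary does not confirm. We assume that "N° RI" means training inputs are rotated by an angle drawn uniformly from [0, N]. We also assume avg16 rotates the overhead reference image to 16 equally spaced angles and averages the resulting features into one stored vector. `embed_fn` is a hypothetical stand-in for a CNN forward pass on a rotated crop. For comparison, `best_distance_over_rotations` keeps all 16 features and takes the closest one.

```python
import math
import random
from typing import Callable, List

Embedding = List[float]
EmbedFn = Callable[[object, float], Embedding]  # hypothetical: (image, rotation in degrees) -> feature


def sample_training_rotation(range_deg: float, rng: random.Random) -> float:
    """Random rotation (degrees) for RI training; assumption: uniform over [0, range_deg]."""
    return rng.uniform(0.0, range_deg)


def rotated_features(embed_fn: EmbedFn, image: object, n_rotations: int = 16) -> List[Embedding]:
    """Features at n equally spaced rotations (0, 22.5, 45, ... degrees when n = 16)."""
    step = 360.0 / n_rotations
    return [embed_fn(image, k * step) for k in range(n_rotations)]


def avg_rotated_feature(embed_fn: EmbedFn, image: object, n_rotations: int = 16) -> Embedding:
    """avg16-style reference feature: element-wise mean of the rotated features."""
    feats = rotated_features(embed_fn, image, n_rotations)
    return [sum(col) / n_rotations for col in zip(*feats)]


def best_distance_over_rotations(query: Embedding, embed_fn: EmbedFn,
                                 image: object, n_rotations: int = 16) -> float:
    """Alternative: compare against every rotated feature (n times the comparisons)."""
    return min(math.dist(query, f) for f in rotated_features(embed_fn, image, n_rotations))
```

Averaging stores one vector per reference, so retrieval costs the same as with an unrotated database. The trade-off is that strongly orientation-dependent features can cancel out when averaged. A toy feature like (cos θ, sin θ) averages to zero, which is one reason to pair averaging with RI training.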
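Explicit orientation regression (OR) can be attached to the matching objective as a second loss term. This is a minimal sketch, assuming the network outputs a (cos, sin) pair for the relative rotation between the ground and overhead images, trained with squared error and added to the matching loss with a hypothetical `weight`. The summary does not state the paper's actual output parameterization or loss weighting.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def angle_to_target(deg: float) -> Vec2:
    """Encode an angle as (cos, sin), so 359 and 1 degrees map to nearby targets."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))


def target_to_angle(c: float, s: float) -> float:
    """Decode a predicted (cos, sin) pair back to an angle in [0, 360)."""
    return math.degrees(math.atan2(s, c)) % 360.0


def orientation_loss(pred: Vec2, true_deg: float) -> float:
    """Squared error between the predicted (cos, sin) and the true relative rotation."""
    tc, ts = angle_to_target(true_deg)
    return (pred[0] - tc) ** 2 + (pred[1] - ts) ** 2


def multitask_loss(match_loss: float, pred: Vec2, true_deg: float, weight: float = 1.0) -> float:
    """Matching loss plus a weighted orientation-regression term."""
    return match_loss + weight * orientation_loss(pred, true_deg)
```

The (cos, sin) encoding avoids the discontinuity at 0°/360° that a raw angle target would have. For example, predicting 359° for a true 1° rotation incurs a small loss rather than a huge one.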

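Experimental Results

Research Questions

RQ1: Can a new loss function significantly improve Siamese and Triplet networks for cross-view geolocalization?
RQ2: Does adding explicit orientation regression during training improve both orientation prediction and localization accuracy?
RQ3: How does rotation-invariance training compare with inference-time data augmentation for handling an unknown azimuth?
RQ4: What is the best trade-off between rotation invariance and discriminative power in representation learning?
RQ5: Can a large-scale, publicly available dataset of street-view/overhead image pairs accelerate progress in cross-view geolocalization?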

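Key Findings

The proposed DBL loss significantly improves the accuracy of Siamese and Triplet networks, and the best-performing architectures are roughly 2.5 times as accurate as the standard Siamese baseline.
Explicit orientation regression (OR) gives a 30% relative improvement on the 360° rotation-invariant network, although it does not help the 90° RI network.
Multi-orientation feature averaging (avg16) performs comparably to testing 16 rotated crops while substantially reducing inference cost.
Exhaustive triplet sampling (eDBL) converges faster, reaching in 30k iterations roughly the performance that standard training reaches in 150k iterations.
The 360° RI + OR + avg16 Triplet network achieves the best ranking performance, improving localization accuracy and predicting orientation with a mean error of 17°.
The paper releases a new dataset of one million street-view/overhead image pairs from eleven U.S. cities to support future research on cross-view geolocalization.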
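The 17° figure is a mean orientation error. A common way to compute such an error wraps the azimuth difference around 360°, so that predicting 350° for a true 10° counts as 20° rather than 340°. This is a minimal sketch, assuming that convention. The summary does not state the paper's exact evaluation protocol.

```python
from typing import Sequence


def angular_error(pred_deg: float, true_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in [0, 180]."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)


def mean_angular_error(preds: Sequence[float], truths: Sequence[float]) -> float:
    """Average wrap-around error over paired predictions and ground-truth azimuths."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(angular_error(p, t) for p, t in zip(preds, truths)) / len(preds)
```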

This review was generated by AI and reviewed by human editors.