QUICK REVIEW

[Paper Review] Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

Tobias Weyand, André Araujo|arXiv (Cornell University)|Apr 3, 2020

Advanced Image and Video Retrieval Techniques | References: 61 | Cited by: 28

One-Line Summary

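This paper introduces the Google Landmarks Dataset v2 (GLDv2), a large-scale benchmark with over 5 million images and 200,000 distinct landmark instances, designed to stress instance-level recognition and image retrieval under realistic conditions. Its extremely long-tailed class distribution, roughly 99% out-of-domain test queries, and high intra-class variability allow rigorous evaluation of model robustness, and the dataset also supports testing transfer learning on independent datasets.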

ABSTRACT

While image retrieval and instance recognition techniques are progressing rapidly, there is a need for challenging datasets to accurately measure their performance -- while posing novel challenges that are relevant for practical applications. We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground truth annotations for both the retrieval and recognition tasks. The ground truth construction involved over 800 hours of human annotator work. Our new dataset has several challenging properties inspired by real world applications that previous datasets did not consider: An extremely long-tailed class distribution, a large fraction of out-of-domain test photos and large intra-class variability. The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos. We provide baseline results for both recognition and retrieval tasks based on state-of-the-art methods as well as competitive results from a public challenge. We further demonstrate the suitability of the dataset for transfer learning by showing that image embeddings trained on it achieve competitive retrieval performance on independent datasets. The dataset images, ground-truth and metric scoring code are available at https://github.com/cvdfoundation/google-landmark.

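Motivation and Objectives

Address the lack of large-scale, realistic benchmarks for instance-level recognition and image retrieval in real-world settings.
Reproduce practical challenges such as extreme class imbalance, out-of-domain queries, and high intra-class variability.
Provide a scalable, diverse dataset sourced from Wikimedia Commons that supports robustness evaluation and transfer learning.
Establish a new standard benchmark for large-scale, fine-grained recognition and retrieval.
Enable evaluation of the false-positive rate on non-landmark queries, an important challenge that earlier datasets overlooked.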

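Proposed Approach

The dataset is built from Wikimedia Commons. The index and query sets use only images under CC0 or public-domain licenses, to protect privacy and prevent metadata leakage.
Ground-truth annotations were produced with more than 800 hours of human annotator work to ensure high-quality labels.
The training set contains about 4 million images with instance-level labels, and the index set contains 762,000 images for retrieval.
The test set contains 118,000 query images, of which only 1.1% depict in-domain landmarks; the remaining 98.9% are out-of-domain, mimicking real-world visual search conditions.
Image embeddings trained on GLDv2 are evaluated on independent datasets to demonstrate their transfer learning capability.
Metadata such as geotags and URLs is stripped from all images to prevent data leakage; full attribution information is provided only for the training set.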
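The split sizes quoted above can be turned into concrete counts with a few lines of arithmetic. The sketch below is only an illustration: it uses the rounded figures from this summary, and the names `SPLITS`, `IN_DOMAIN_RATE`, `in_domain_queries`, and `split_shares` are hypothetical helpers, not part of any released GLDv2 tooling.

```python
# Rounded split sizes as quoted in this summary (not exact image counts).
SPLITS = {"train": 4_000_000, "index": 762_000, "test": 118_000}
IN_DOMAIN_RATE = 0.011  # share of test queries that depict a landmark


def in_domain_queries(num_queries: int, rate: float) -> int:
    """Approximate number of test queries that show an in-domain landmark."""
    return round(num_queries * rate)


def split_shares(splits: dict) -> dict:
    """Fraction of all images that falls into each split."""
    total = sum(splits.values())
    return {name: size / total for name, size in splits.items()}


n_in = in_domain_queries(SPLITS["test"], IN_DOMAIN_RATE)
n_out = SPLITS["test"] - n_in
shares = split_shares(SPLITS)

print(f"in-domain queries: {n_in}, out-of-domain queries: {n_out}")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of all images")
```

With these rounded inputs the three splits add up to 4.88 million images, slightly under the "over 5 million" headline figure, presumably because the quoted split sizes are themselves rounded.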

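Experimental Results

Research Questions

RQ1: How does model performance on instance-level recognition and retrieval degrade under an extremely long-tailed class distribution?
RQ2: How well do models generalize to the out-of-domain queries that are common in real-world visual search applications?
RQ3: Can image embeddings trained on GLDv2 achieve competitive performance on unrelated, independent retrieval benchmarks?
RQ4: How robust are models to high intra-class variability, including changes in viewpoint, lighting, weather, and image domain (for example photographs, paintings, and historical prints)?
RQ5: Can GLDv2 serve as an effective pre-training dataset for downstream instance recognition tasks in low-data regimes?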
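To make the "long-tailed" notion in RQ1 concrete, one simple way to quantify it is to measure how many classes have only a handful of images and how much of the data the largest classes hold. The sketch below is an assumption about how such a check could look; `long_tail_stats`, its thresholds, and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def long_tail_stats(labels, few_shot_max=5, head_fraction=0.1):
    """Summarize how skewed a class distribution is.

    few_shot_class_share: fraction of classes with at most `few_shot_max` images.
    head_image_share: fraction of images owned by the largest `head_fraction` of classes.
    """
    counts = sorted(Counter(labels).values(), reverse=True)
    num_classes = len(counts)
    head_size = max(1, int(num_classes * head_fraction))
    return {
        "num_classes": num_classes,
        "few_shot_class_share": sum(c <= few_shot_max for c in counts) / num_classes,
        "head_image_share": sum(counts[:head_size]) / sum(counts),
    }


# Toy labels (hypothetical): one dominant class, one medium class, eight rare ones.
toy_labels = ["big"] * 82 + ["mid"] * 10 + [f"rare{i}" for i in range(8)]
stats = long_tail_stats(toy_labels)
print(stats)
```

On real training labels the same function would take the list of landmark ids, one entry per image; both thresholds are arbitrary choices and should be tuned to the question being asked.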

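Key Findings

GLDv2 contains over 5 million images covering 200,000 distinct landmark instances, making it the largest instance-level recognition and retrieval benchmark to date.
The test set contains 118,000 queries, of which only 1.1% (about 1,300) depict in-domain landmarks, so roughly 99% of queries are out-of-domain, as in real-world use.
More than 800 hours of human annotation went into building the ground truth, ensuring high-quality labels for both the recognition and retrieval tasks.
Image embeddings trained on GLDv2 achieve competitive retrieval performance on independent datasets, showing strong transfer learning potential.
The dataset spans several image types, including digital photos, film prints, paintings, and architectural sketches, which makes domain invariance harder to achieve.
Baseline results based on state-of-the-art methods are reported, and a public Kaggle challenge confirms the dataset's usefulness as a benchmark.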
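The finding on embedding-based retrieval can be illustrated with a small scoring routine. This summary does not name the retrieval metric, so the sketch below assumes mean average precision at k (mAP@k) over a cosine-similarity ranking, which is one common choice; all function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_index(query_vec, index_vecs):
    """Index image ids sorted by decreasing similarity to the query."""
    return sorted(index_vecs, key=lambda i: cosine(query_vec, index_vecs[i]), reverse=True)


def average_precision_at_k(ranked_ids, relevant_ids, k):
    """AP@k, normalized by min(number of relevant images, k)."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, idx in enumerate(ranked_ids[:k], start=1):
        if idx in relevant_ids:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(len(relevant_ids), k)


def mean_average_precision_at_k(queries, index_vecs, k):
    """Mean of AP@k over (query_vector, relevant_id_set) pairs."""
    aps = [average_precision_at_k(rank_index(vec, index_vecs), rel, k) for vec, rel in queries]
    return sum(aps) / len(aps)


# Toy 2-D embeddings (hypothetical); real embeddings would come from a trained model.
index_vecs = {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [0.7, 0.7]}
queries = [([1.0, 0.1], {"i1", "i3"}), ([0.0, 1.0], {"i1"})]
score = mean_average_precision_at_k(queries, index_vecs, k=3)
print(f"mAP@3 = {score:.4f}")
```

The pure-Python ranking is quadratic and only suitable for toy inputs; at the scale of a 762,000-image index a vectorized or approximate nearest-neighbor search would be needed.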

This review was generated by AI and checked by human editors.