QUICK REVIEW

[Paper Review] Fine-tuning CNN Image Retrieval with No Human Annotation

Filip Radenović, Giorgos Tolias|arXiv (Cornell University)|Nov 3, 2017

Advanced Image and Video Retrieval Techniques|Cited by 29

One-Sentence Summary

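This paper proposes a fully unsupervised way to fine-tune CNNs for image retrieval, using only the 3D reconstructions produced by a structure-from-motion (SfM) pipeline and thereby removing the need for human-annotated data. Hard positives and hard negatives are selected automatically from the camera geometry and the structure of the 3D models. Together with a trainable generalized-mean (GeM) pooling layer and discriminatively learned descriptor whitening, the method achieves state-of-the-art performance with the VGG network on the Oxford Buildings, Paris, and Holidays benchmarks.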
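The hard-negative half of this selection can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes L2-normalized descriptors from the current network and a hypothetical `cluster_ids` array giving the 3D model each image belongs to, and, as a simplifying assumption, it takes at most one negative per 3D model so the negatives stay diverse. Hard positives, which the paper selects using the camera geometry, are not shown.

```python
import numpy as np

def mine_hard_negatives(query_idx, descriptors, cluster_ids, num_negatives=5):
    """Pick hard negatives for one query image (illustrative sketch).

    descriptors: (N, D) array of L2-normalized descriptors from the current network.
    cluster_ids: (N,) hypothetical array with the 3D-model id of every image.
    Returns up to `num_negatives` indices, most similar first.
    """
    sims = descriptors @ descriptors[query_idx]   # cosine similarity (unit-norm rows)
    query_cluster = cluster_ids[query_idx]
    chosen, used_clusters = [], set()
    for idx in np.argsort(-sims):                 # most similar images first
        c = cluster_ids[idx]
        if c == query_cluster or c in used_clusters:
            continue                              # same 3D model, or model already used
        chosen.append(int(idx))
        used_clusters.add(c)
        if len(chosen) == num_negatives:
            break
    return chosen
```

Because these images come from other 3D models yet rank high under the current descriptor, they are the confusing cases that training should push away from the query.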

ABSTRACT

Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training of CNNs, either from scratch or fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. CNN descriptor whitening discriminatively learned from the same training data outperforms commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
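Generalized-mean pooling computes, for each feature map k with activations X_k, f_k = (mean over X_k of x^p_k)^(1/p_k): p_k = 1 gives average pooling, and as p_k grows the result approaches max pooling. The NumPy sketch below shows only this forward computation and assumes non-negative (post-ReLU) activations. In the paper p is learned by backpropagation, which would need an autodiff framework and is not shown; the name `gem_pool` and the default p = 3 are illustrative choices.

```python
import numpy as np

def gem_pool(feature_maps, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling, forward pass only (illustrative sketch).

    feature_maps: (K, H, W) non-negative activations.
    p: a scalar shared by all K feature maps, or an array of shape (K,).
    Returns a (K,) descriptor.
    """
    k = feature_maps.shape[0]
    x = np.clip(feature_maps, eps, None).reshape(k, -1)   # (K, H*W), strictly positive
    p = np.broadcast_to(np.asarray(p, dtype=float), (k,))  # shared or per-channel exponent
    pooled = np.mean(x ** p[:, None], axis=1)               # mean of x^p over spatial positions
    return pooled ** (1.0 / p)
```

Passing an array of shape (K,) for `p` corresponds to the per-feature-map variant; a scalar corresponds to a single shared parameter.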

Motivation and Goals

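Eliminate the need for expensive human-annotated training data in CNN-based image retrieval.
Improve retrieval performance by automatically mining hard positives and hard negatives from 3D reconstructions.
Develop a trainable pooling layer that generalizes max and average pooling, improving descriptor quality.
Propose a discriminative whitening learned from the same unsupervised data to further improve performance.
Achieve state-of-the-art results on standard benchmarks without any human annotation.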
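One way to learn a whitening from matching and non-matching pairs is the linear discriminant projection idea: whiten with the covariance of matching-pair differences, then rotate onto the principal directions of non-matching-pair differences in that whitened space. The sketch below is a NumPy illustration of that idea under our own assumptions (the function names, the `eps` regularizer, and the pair lists are hypothetical), not the authors' implementation.

```python
import numpy as np

def learn_discriminative_whitening(desc, pos_pairs, neg_pairs, eps=1e-8):
    """desc: (N, D) descriptors; pos_pairs / neg_pairs: lists of (i, j) index pairs."""
    mu = desc.mean(axis=0)

    def pair_cov(pairs):
        d = np.array([desc[i] - desc[j] for i, j in pairs])  # (M, D) difference vectors
        return d.T @ d / len(pairs)

    c_s = pair_cov(pos_pairs)   # covariance of matching-pair differences
    c_d = pair_cov(neg_pairs)   # covariance of non-matching-pair differences
    # Symmetric inverse square root of C_S via eigendecomposition
    vals, vecs = np.linalg.eigh(c_s)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    # Rotate so directions with the most non-matching variance come first
    vals_d, vecs_d = np.linalg.eigh(w @ c_d @ w)
    order = np.argsort(vals_d)[::-1]
    proj = vecs_d[:, order].T @ w   # (D, D) projection matrix
    return mu, proj

def apply_whitening(desc, mu, proj, dim=None):
    """Center, project, optionally truncate to `dim` components, then L2-normalize."""
    y = (desc - mu) @ proj.T
    if dim is not None:
        y = y[:, :dim]
    return y / np.linalg.norm(y, axis=1, keepdims=True)
```

After the projection, differences between matching descriptors have identity covariance, so the leading components are those along which non-matching pairs vary most relative to matching pairs; truncating to `dim` keeps those.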

Proposed Method

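3D reconstructions produced by an SfM pipeline on an unordered image collection are used to identify training pairs automatically.
Hard positives are images of the same object taken from different viewpoints; hard negatives are images of unrelated objects (other 3D models) that are nevertheless close to the query in descriptor space.
A trainable generalized-mean (GeM) pooling layer generalizes max and average pooling; its parameter can be learned per feature map or shared globally.
A discriminative whitening learned from the same unsupervised training data makes the descriptors more compact and discriminative.
A new α-weighted query expansion (αQE) is proposed, which is more robust than standard average query expansion.
The network is trained with a contrastive loss on the automatically collected positive and negative pairs.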
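The contrastive loss pulls matching pairs together and pushes non-matching pairs apart until they are at least a margin τ away: L = ½‖f(i) − f(j)‖² for a matching pair and L = ½ max(0, τ − ‖f(i) − f(j)‖)² otherwise. Below is a minimal NumPy sketch of that loss and of summing it over one training tuple (query, one positive, several hard negatives). The margin 0.7 and the function names are placeholders, not the paper's settings.

```python
import numpy as np

def contrastive_loss(f_i, f_j, is_match, margin=0.7):
    """Per-pair contrastive loss; f_i, f_j broadcast to (..., D), is_match is boolean."""
    d = np.linalg.norm(f_i - f_j, axis=-1)               # Euclidean distance per pair
    pos = 0.5 * d ** 2                                   # matching: penalize any distance
    neg = 0.5 * np.maximum(0.0, margin - d) ** 2         # non-matching: only if closer than margin
    return np.where(is_match, pos, neg)

def tuple_loss(query, positive, negatives, margin=0.7):
    """Sum the loss over one tuple: (query, positive) plus (query, each negative)."""
    loss = contrastive_loss(query, positive, True, margin)
    is_match = np.zeros(len(negatives), dtype=bool)
    loss = loss + contrastive_loss(query[None, :], negatives, is_match, margin).sum()
    return float(loss)
```

Negatives already farther than the margin contribute zero, which is why mining hard negatives matters: easy ones produce no gradient.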

Experimental Results

Research Questions

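RQ1: Can image retrieval performance be improved substantially without any human-annotated training data?
RQ2: Can the geometry of 3D reconstructions be exploited effectively to mine hard training examples that improve the learned features?
RQ3: Does the trainable GeM pooling layer outperform fixed max or average pooling for retrieval?
RQ4: Does discriminative whitening learned from unsupervised data outperform standard PCA whitening and further improve descriptor quality?
RQ5: Does the method generalize well across diverse benchmarks without overfitting to the training data?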

Key Findings

With GeM pooling and a fine-tuned VGG-16 network, the method achieves state-of-the-art mAP on the Oxford5k, Paris6k, and Holidays benchmarks, reaching 87.9% on Oxford5k.
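Combined with α-weighted query expansion, the method reaches 91.9% mAP on both Oxford5k and Paris6k, outperforming previous unsupervised and supervised baselines.
The trainable GeM pooling layer outperforms standard max and average pooling on all datasets, consistently improving mAP by 2-3%.
Discriminative whitening learned from the same unsupervised data improves mAP by up to 2.5% over PCA whitening.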
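The network generalizes well: training on all 3D models, including those of the Oxford and Paris landmarks, changes mAP by only 0.3% on average compared with excluding them, indicating minimal overfitting.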
The method surpasses the state of the art on Oxford5k and Holidays and matches the best systems on Paris, despite using no human or landmark annotation.
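α-weighted query expansion replaces the query descriptor with a weighted sum of itself and its top-n neighbours, each neighbour weighted by its similarity to the query raised to the power α, and then re-normalizes; α = 0 reduces to plain average query expansion. A minimal NumPy sketch, assuming L2-normalized descriptors and a database that does not contain the query image itself; the defaults for `n` and `alpha` are illustrative.

```python
import numpy as np

def alpha_query_expansion(query, database, n=10, alpha=3.0):
    """Return an expanded, L2-normalized query descriptor (illustrative sketch)."""
    sims = database @ query                          # cosine similarities (unit-norm rows)
    top = np.argsort(-sims)[:n]                      # indices of the n nearest neighbours
    weights = np.clip(sims[top], 0.0, None) ** alpha # similarity^alpha, negatives clipped to 0
    expanded = query + (weights[:, None] * database[top]).sum(axis=0)
    return expanded / np.linalg.norm(expanded)
```

The expanded descriptor is then used to re-rank the database, e.g. `np.argsort(-(database @ expanded))`. Larger α shrinks the influence of weakly similar neighbours, which is where the robustness over average query expansion comes from.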


This review was generated by AI and reviewed by human editors.