[Paper Review] Particular object retrieval with integral max-pooling of CNN activations
This paper proposes a compact CNN-based feature representation using integral max-pooling of convolutional activations to enable efficient object localization and re-ranking in particular object retrieval. By leveraging generalized mean pooling and integral images, the method achieves state-of-the-art performance on Oxford5k and Paris6k, outperforming prior CNN-based approaches and competing with traditional local-feature methods.
Recently, image representations built upon Convolutional Neural Networks (CNNs) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report for the first time results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
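The abstract's core idea is to pool the last convolutional layer of a single forward pass over many regions and aggregate the results into one short vector. This is the R-MAC representation named in the findings. A minimal pure-Python sketch follows, under three stated assumptions. First, activations arrive as a `[K][H][W]` nested list. Second, `sample_regions` uses a simplified uniform square grid that is a hypothetical stand-in for the paper's exact region-sampling rule. Third, the per-region PCA-whitening step is omitted.

```python
import math
from typing import List, Tuple

FeatureMaps = List[List[List[float]]]  # [K][H][W], assumed non-negative (post-ReLU)
Region = Tuple[int, int, int, int]     # (y0, x0, y1, x1), half-open bounds


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def axis_starts(dim: int, side: int, count: int) -> List[int]:
    # Evenly spaced window starts along one axis; a single start if the window spans it.
    if dim <= side or count <= 1:
        return [0]
    return sorted({round(i * (dim - side) / (count - 1)) for i in range(count)})


def sample_regions(h: int, w: int, levels: int = 3) -> List[Region]:
    # Hypothetical simplified grid: at level l, square windows of side 2*min(h, w)/(l+1),
    # with up to l+1 evenly spaced starts per axis (duplicate starts merged).
    regions: List[Region] = []
    for l in range(1, levels + 1):
        side = max(1, (2 * min(h, w)) // (l + 1))
        for y0 in axis_starts(h, side, l + 1):
            for x0 in axis_starts(w, side, l + 1):
                regions.append((y0, x0, y0 + side, x0 + side))
    return regions


def region_mac(fmaps: FeatureMaps, region: Region) -> List[float]:
    # Maximum activation of each channel inside the region (one value per channel).
    y0, x0, y1, x1 = region
    return [max(ch[y][x] for y in range(y0, y1) for x in range(x0, x1)) for ch in fmaps]


def rmac(fmaps: FeatureMaps, levels: int = 3) -> List[float]:
    h, w = len(fmaps[0]), len(fmaps[0][0])
    total = [0.0] * len(fmaps)
    for region in sample_regions(h, w, levels):
        v = l2_normalize(region_mac(fmaps, region))
        # The paper also PCA-whitens each regional vector here; omitted in this sketch.
        total = [t + x for t, x in zip(total, v)]
    return l2_normalize(total)
```

For a 4×6 activation map with `levels=3`, `sample_regions` yields 23 windows. Whatever the input size, `rmac` returns one value per channel. That is what keeps the descriptor compact: every regional vector has the same dimensionality, and the vectors are summed rather than concatenated.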
Motivation & Objective
- Address the incompatibility of CNN-based global descriptors with geometry-aware re-ranking, which leaves them behind traditional systems built on precise descriptor matching, geometric re-ranking, and query expansion.
- Enable efficient localization of particular objects using only a single forward pass of the CNN.
- Develop a unified representation derived from convolutional activations that supports both initial filtering and re-ranking.
- Improve retrieval performance on benchmark datasets like Oxford5k and Paris6k using compact CNN features without relying on local feature matching.
Proposed method
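- Build a compact image representation (R-MAC) from one forward pass of an off-the-shelf pre-trained CNN. The method max-pools the last convolutional layer's activations over multiple regions at several scales, then aggregates the regional vectors into a single vector.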
- Extend integral images to (approximate) max-pooling over 2D feature maps. The pooled vector of any rectangular region can then be computed in constant time per channel, enabling fast localization of matching regions.
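- Approximate max-pooling with a generalized mean whose exponent α controls how tight the approximation is. Because the generalized mean reduces to a sum of powered activations, that sum can be read from integral images; larger α brings the result closer to the true maximum.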
- Re-rank the top-ranked images by localizing the query object in each with approximate max-pooling localization (AML) and scoring the localized region, then apply query expansion to the re-ranked list.
- Index only the compact global vector for the initial search. Region-level features needed for re-ranking are computed on demand from the convolutional activations via integral max-pooling instead of being stored per region.
- Combine the compact representation with a re-ranking pipeline that leverages the same CNN activations used in the initial filtering stage.
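The localization step in the list above can be illustrated with a brute-force sketch. Integral images return rectangle sums, not maxima. The sketch therefore raises each channel's activations to a power α and accumulates them once into an integral image. The α-th root of any box's sum then approximates that box's max-pooled value, using four lookups per channel. `ApproxMaxPooler`, `localize`, and the default `alpha=10.0` are hypothetical names and an illustrative value. The exhaustive scan over every box is also a simplification: a practical system would restrict or refine the candidate boxes instead of visiting all of them.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (y0, x0, y1, x1), half-open bounds


def integral_image(grid: Grid) -> Grid:
    # ii[y][x] holds the sum of grid[0:y][0:x]; row 0 and column 0 are zero padding.
    h, w = len(grid), len(grid[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += grid[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: Grid, box: Box) -> float:
    # Sum over any rectangle with four lookups, independent of its area.
    y0, x0, y1, x1 = box
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


class ApproxMaxPooler:
    """Hypothetical helper: approximate max-pooling over arbitrary boxes."""

    def __init__(self, fmaps: List[Grid], alpha: float = 10.0) -> None:
        self.alpha = alpha
        # One integral image per channel, built over activations raised to alpha.
        # Activations are assumed non-negative (post-ReLU); clamp defensively.
        self.tables = [
            integral_image([[max(v, 0.0) ** alpha for v in row] for row in ch])
            for ch in fmaps
        ]

    def pool(self, box: Box) -> List[float]:
        # (sum of x^alpha)^(1/alpha) per channel; approaches the max as alpha grows.
        return [max(box_sum(t, box), 0.0) ** (1.0 / self.alpha) for t in self.tables]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def localize(query: Sequence[float], fmaps: List[Grid], alpha: float = 10.0) -> Tuple[float, Box]:
    # Brute-force scan of every box; each candidate costs O(K) thanks to the integral images.
    pooler = ApproxMaxPooler(fmaps, alpha)
    h, w = len(fmaps[0]), len(fmaps[0][0])
    best: Tuple[float, Box] = (-1.0, (0, 0, h, w))
    for y0 in range(h):
        for y1 in range(y0 + 1, h + 1):
            for x0 in range(w):
                for x1 in range(x0 + 1, w + 1):
                    score = cosine(query, pooler.pool((y0, x0, y1, x1)))
                    if score > best[0]:
                        best = (score, (y0, x0, y1, x1))
    return best
```

Consider a map where channel 0 fires only inside a 2×2 block, with the query `[1.0, 0.0]`. Here `localize` returns a box inside that block with a cosine score of about 1. A very large `alpha` tightens the max approximation but risks overflow or underflow in `v ** alpha`, so the exponent is a trade-off.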
Experimental results
Research questions
- RQ1: Can a single CNN feature representation support both initial filtering and geometry-aware re-ranking in particular object retrieval?
- RQ2: Can integral max-pooling over convolutional activations enable efficient and accurate object localization without multiple network inferences?
- RQ3: Does approximating max-pooling with a generalized mean make integral images usable for fast localization in CNN feature maps?
- RQ4: Can a CNN-based system with compact features and re-ranking outperform traditional local-feature-based methods on standard benchmarks like Oxford5k and Paris6k?
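RQ3 and the sum-pooling ablation rest on two standard identities. The notation here is chosen for this review rather than taken from the paper: $x_i(p) \ge 0$ is the activation of channel $i$ at position $p$, and $R = [r_0, r_1) \times [c_0, c_1)$ is a rectangle of positions. The generalized-mean form and its bounds are

$$
\tilde f_{R,i} = \Big( \sum_{p \in R} x_i(p)^{\alpha} \Big)^{1/\alpha},
\qquad
\max_{p \in R} x_i(p) \;\le\; \tilde f_{R,i} \;\le\; |R|^{1/\alpha} \max_{p \in R} x_i(p)
$$

The bounds follow because the sum contains the largest term and has at most $|R|$ terms. Hence $\tilde f_{R,i}$ tends to the max as $\alpha \to \infty$, while $\alpha = 1$ gives plain sum pooling. With the integral image $I_i(r, c) = \sum_{r' < r,\, c' < c} x_i(r', c')^{\alpha}$, the inner sum needs four lookups per channel, whatever the size of $R$:

$$
\sum_{p \in R} x_i(p)^{\alpha} = I_i(r_1, c_1) - I_i(r_0, c_1) - I_i(r_1, c_0) + I_i(r_0, c_0)
$$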
Key findings
- The proposed pipeline, R-MAC combined with integral max-pooling-based re-ranking, achieves 77.0% mAP on Oxford5k and 86.5% mAP on Paris6k, outperforming all prior CNN-based methods on both benchmarks.
- The method achieves the highest performance on Paris6k among published CNN-based approaches, surpassing even some local-feature-based systems.
- The AML-based re-ranking method improves mAP by up to 3.9 percentage points on Paris6k when applied to the R-MAC representation.
- Replacing max-pooling with sum-pooling (α=1) in the integral pooling framework leads to lower performance (76.9% mAP on Paris106k), confirming the superiority of max-pooling in this context.
- The system outperforms the cross-matching approach of Razavian et al. (2014b) by 3.0% mAP on Oxford5k, while being significantly more memory- and computation-efficient.
- The method is more efficient than previous CNN-based approaches that require multiple forward passes or store individual region features, due to its single-inference design.
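The findings contrast a single-inference design with pipelines that store or recompute per-region features. Such a search runs in two stages: a cheap global ranking, then re-ranking of a short list. Its control flow can be sketched as below. `two_stage_search`, the `localized_score` callback, and the average query expansion are assumptions for illustration. The sketch does not reproduce the paper's exact re-ranking or query-expansion variant, and database vectors are assumed to be L2-normalized.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def two_stage_search(
    query: Vector,
    db: Dict[str, Vector],
    top_n: int,
    localized_score: Callable[[str], float],
    qe_k: int = 0,
) -> List[str]:
    # Stage 1: rank every image by the dot product of L2-normalized global vectors.
    initial = sorted(db, key=lambda name: dot(query, db[name]), reverse=True)
    # Stage 2: re-rank only the short list with a localization-based score
    # (a hypothetical callback standing in for the region-level similarity).
    reranked = sorted(initial[:top_n], key=localized_score, reverse=True)
    ranking = reranked + initial[top_n:]
    if qe_k > 0:
        # Generic average query expansion: sum the query with the top-k vectors,
        # renormalize, and search the whole database again.
        expanded = l2_normalize(
            [sum(dims) for dims in zip(query, *(db[name] for name in ranking[:qe_k]))]
        )
        ranking = sorted(db, key=lambda name: dot(expanded, db[name]), reverse=True)
    return ranking
```

Only the `top_n` short list pays for the localization score, so the expensive step scales with `top_n` rather than with the database size.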
This review was created by AI and reviewed by human editors.