[Paper Review] Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation
This paper systematically investigates deep feature extraction for high-resolution remote sensing (HRRS) image retrieval, examining factors such as network architecture, feature pooling, scale, and dimensionality reduction. By fine-tuning GoogLeNet and applying multi-patch pooling with PCA compression, the method achieves state-of-the-art performance, with an ANMRR of 0.285 on UCM (lower is better), and outperforms prior CNN-based approaches while using a significantly smaller feature dimensionality.
Remote sensing (RS) image retrieval is of great significance for geological information mining. Over the past two decades, a large amount of research on this task has been carried out, focusing mainly on three core issues: feature extraction, similarity metrics, and relevance feedback. Due to the complexity and diversity of ground objects in high-resolution remote sensing (HRRS) images, there is still room for improvement in current retrieval approaches. In this paper, we analyze the three core issues of RS image retrieval and provide a comprehensive review of existing methods. Furthermore, with the goal of advancing the state of the art in HRRS image retrieval, we focus on the feature extraction issue and delve into how to use powerful deep representations to address this task. We conduct a systematic investigation of the factors that may affect the performance of deep features. By optimizing each factor, we obtain remarkable retrieval results on publicly available HRRS datasets. Finally, we explain the experimental phenomena in detail and draw conclusions from our analysis. Our work can serve as a guide for research on content-based RS image retrieval.
Motivation & Objective
- To address the limitations of hand-crafted features in high-resolution remote sensing (HRRS) image retrieval due to variability in scale, orientation, and illumination.
- To systematically investigate and optimize key factors affecting deep feature performance in HRRS image retrieval, including network architecture, feature pooling, scale, and dimensionality reduction.
- To achieve state-of-the-art retrieval performance on public HRRS datasets using optimized deep features derived from pre-trained CNNs.
- To provide a comprehensive analysis of influencing factors and guide future research in content-based RS image retrieval.
Proposed method
- Fine-tuned GoogLeNet on HRRS datasets to adapt pre-trained features to remote sensing data.
- Extracted deep features from intermediate layers (e.g., avg_pool, inception(5b)) of the fine-tuned network for improved representation.
- Applied multi-patch pooling by cropping 20 sub-patches (224×224) from each image and aggregating features via max, mean, or hybrid pooling.
- Reduced high-dimensional features (e.g., 204800D from IFK) to 32D using Principal Component Analysis (PCA) for efficient and compact representation.
- Evaluated multiple similarity metrics (Euclidean, Cosine, Manhattan, χ²) on the compressed features to determine optimal distance measure.
- Combined multi-scale input and multi-patch pooling strategies to enhance feature robustness and discriminability.
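The pooling and compression steps listed above can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation. It assumes the per-patch feature vectors have already been extracted by a CNN, covers only mean and max pooling (the hybrid variant is not specified in this review, so it is omitted), and computes PCA with a simple power iteration purely to stay dependency-free. The names `pool_patches`, `fit_pca`, and `project` are hypothetical.

```python
import math
import random


def pool_patches(patch_features, mode="mean"):
    """Aggregate equal-length per-patch feature vectors into one descriptor."""
    columns = list(zip(*patch_features))
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unsupported pooling mode: {mode}")


def _matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def fit_pca(descriptors, k, iters=500, seed=0):
    """Return (mean, components) holding the top-k principal directions.

    Assumption: power iteration with deflation on the sample covariance
    matrix; a numerical library would normally be used instead.
    """
    n, d = len(descriptors), len(descriptors[0])
    mean = [sum(col) / n for col in zip(*descriptors)]
    centered = [[x - m for x, m in zip(row, mean)] for row in descriptors]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random unit start vector for the power iteration.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflate: remove the direction just found before the next search.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components


def project(descriptor, mean, components):
    """Center a descriptor and project it onto the PCA components."""
    centered = [x - m for x, m in zip(descriptor, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
```

In a setup like the one described, each image would contribute 20 patch vectors to `pool_patches`, and `fit_pca` would be called with `k=32` on the pooled descriptors of the database. The same mean and components would then be reused to project each query. A library routine such as scikit-learn's `PCA` would be the more typical choice for real feature dimensions.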
Experimental results
Research questions
- RQ1: How do different CNN architectures and fine-tuning strategies impact deep feature performance in HRRS image retrieval?
- RQ2: What is the optimal strategy for feature aggregation: single-scale vs. multi-scale, or single-patch vs. multi-patch pooling?
- RQ3: How does dimensionality reduction via PCA affect retrieval accuracy and computational efficiency?
- RQ4: Which similarity metric (Euclidean, Cosine, Manhattan, χ²) yields the best retrieval performance with deep features?
- RQ5: How do different feature extraction strategies compare to existing state-of-the-art methods on public HRRS benchmarks?
Key findings
- Multi-patch pooling with mean pooling achieved the highest MAP on both RS19 (76.80) and UCM (64.56), outperforming single-patch and multi-scale methods.
- The combination of fine-tuned GoogLeNet, multi-patch pooling, and PCA compression to 32 dimensions yielded the best overall performance, with an ANMRR of 0.285 on UCM.
- Multi-scale concatenation degraded performance on UCM (MAP dropped to 38.49), indicating that scale diversity can reduce discriminative power in object-oriented datasets.
- The proposed method achieved superior accuracy compared to prior CNN-based methods, including a 1.4% improvement over the recent method [75] in ANMRR.
- PCA compression reduced feature dimension from 1000 to 32 while maintaining high performance, significantly lowering computational cost.
- The χ² distance metric could not be applied to PCA-compressed features because it requires non-negative inputs, whereas PCA projections contain negative values; this limits its use in low-dimensional settings.
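The four similarity measures compared in this review can be written in a few lines, which also makes the χ² restriction concrete. The sketch below is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, and the χ² form used here (with a factor of 0.5) is one common definition, since the exact form used in the paper is not given in this review.

```python
import math


def euclidean(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def chi_square(a, b):
    """Chi-square distance (assumed form: 0.5 * sum((x - y)^2 / (x + y))).

    The denominator x + y is only meaningful for non-negative features
    such as histograms, so inputs with negative entries are rejected.
    """
    if any(x < 0 for x in a) or any(y < 0 for y in b):
        raise ValueError("chi-square distance expects non-negative features")
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def rank_by_distance(query, database, metric):
    """Return database indices sorted from most to least similar."""
    return sorted(range(len(database)), key=lambda i: metric(query, database[i]))
```

Because PCA projections are mean-centered, they take both signs, so a call such as `chi_square(query_pca, item_pca)` would raise here, while the other three metrics remain usable on the compressed vectors.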
This review was created by AI and reviewed by human editors.