QUICK REVIEW

[Paper Review] Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval

Chang Huang, Shenghuo Zhu|arXiv (Cornell University)|Dec 25, 2012

Face recognition and analysis|22 references|61 citations

TL;DR

This paper proposes a two-step large-scale metric learning method that first selects sparse, effective feature groups to build a block-diagonal metric, then jointly learns a low-rank Mahalanobis metric in the selected subspace. It achieves state-of-the-art performance on face verification (92.58% accuracy on LFW) and efficient face retrieval using 150D vectors, outperforming LMNN and LDA while scaling efficiently to high-dimensional data.

ABSTRACT

Learning Mahalanobis distance metrics in a high-dimensional feature space is very difficult, especially when structural sparsity and low rank are enforced to improve computational efficiency in the testing phase. This paper addresses both aspects by an ensemble metric learning approach that consists of sparse block diagonal metric ensembling and joint metric learning as two consecutive steps. The former step pursues a highly sparse block diagonal metric by selecting effective feature groups, while the latter one further exploits correlations between selected feature groups to obtain an accurate and low rank metric. Our algorithm considers all pairwise or triplet constraints generated from training samples with explicit class labels, and possesses good scalability with respect to increasing feature dimensionality and growing data volumes. Its applications to face verification and retrieval outperform existing state-of-the-art methods in accuracy while retaining high efficiency.

Motivation & Objective

To address the challenge of learning accurate, low-rank Mahalanobis distance metrics in high-dimensional, overcomplete feature spaces.
To improve scalability and efficiency in metric learning for large-scale datasets with explicit class labels.
To enable effective face verification and retrieval by learning compact, discriminative representations through supervised metric learning.
To overcome limitations of existing methods like LMNN and LDA in high-dimensional settings and memory-constrained environments.

Proposed method

The method uses a two-step process: first, sparse block diagonal metric ensembling to select effective feature groups and learn weak metrics for each group.
Second, joint metric learning in the selected feature subspace to learn a low-rank, accurate Mahalanobis metric using all pairwise or triplet constraints.
It employs a convex smooth loss function based on an exponential logit surrogate to enable efficient batch optimization.
The algorithm is designed for scalability, handling large-scale data with high feature dimensionality and large training volumes.
It applies trace norm regularization to enforce low-rank structure on the final metric, reducing dimensionality for efficient retrieval.
The method is implemented using batch learning with efficient gradient computation, avoiding the memory and convergence issues of active-set methods.
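The ingredients listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `margin` parameter, and the choice of the logistic function log(1 + exp(z)) as the smooth convex surrogate are assumptions made for illustration, and the paper's exact loss may differ. The sketch shows a block-diagonal distance as a sum of per-group distances, a low-rank Mahalanobis distance computed through a projection L with M = LᵀL, a smooth pairwise loss that is convex in M, and the trace norm used as a low-rank regularizer.

```python
import numpy as np


def block_diagonal_sq(groups, x, y):
    """Squared distance under a block-diagonal metric.

    `groups` is a list of (feature_indices, L_g) pairs (hypothetical layout).
    Features that belong to no selected group contribute nothing.
    """
    total = 0.0
    for idx, L_g in groups:
        diff = L_g @ (x[idx] - y[idx])
        total += float(diff @ diff)
    return total


def mahalanobis_sq(L, x, y):
    """Squared distance under M = L.T @ L, computed in the projected space."""
    diff = L @ (x - y)
    return float(diff @ diff)


def pairwise_logistic_loss(M, pairs, labels, margin=1.0):
    """Mean smooth loss over pairs; labels are +1 (same class) or -1 (different).

    The distance is linear in M, so the loss is convex in M.
    The logistic surrogate and `margin` are assumptions of this sketch.
    """
    total = 0.0
    for (x, y), s in zip(pairs, labels):
        d = x - y
        dist = d @ M @ d
        # Same-class pairs are penalized for large distances,
        # different-class pairs for small ones.
        total += np.logaddexp(0.0, s * (dist - margin))
    return float(total / len(pairs))


def trace_norm(M):
    """Nuclear (trace) norm: the sum of singular values of M."""
    return float(np.linalg.norm(M, ord="nuc"))
```

Because the distance only ever uses L(x - y), a rank-r metric lets every sample be stored as an r-dimensional vector, which is what makes the low-rank constraint useful at test time.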

Experimental results

Research questions

RQ1: Can a two-step metric learning approach effectively combine sparse feature selection with joint metric learning to improve accuracy and efficiency in high-dimensional spaces?
RQ2: How does the proposed method scale with increasing feature dimensionality and data volume compared to existing methods like LMNN?
RQ3: To what extent can joint metric learning outperform LDA and LMNN in face verification and retrieval tasks?
RQ4: Can the method achieve state-of-the-art performance on unrestricted LFW without external data or 3D models?
RQ5: Does the use of a smooth convex loss function enable faster convergence and better scalability than active-set methods?

Key findings

The proposed method achieved 92.58% mean classification accuracy on the unrestricted LFW benchmark, surpassing the previous record of 91.30%.
Joint metric learning reduced training time significantly compared to LMNN, converging in 45–130 iterations versus 1,000+ for LMNN, even with higher-dimensional features.
The method scaled efficiently to 1,000-dimensional features and 30 target neighbors, while LMNN failed due to memory limits in several cases.
On a 4 million-face database, retrieval using 150D vectors took only 2 seconds on a single server, demonstrating high efficiency.
Joint metric learning outperformed LDA in retrieval accuracy, especially when the projection dimension exceeded 200, where LDA saturated.
The method achieved a mean average precision (mAPQ) of 0.70 in face retrieval on a large-scale dataset, significantly improving over baseline methods.
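The retrieval finding can be made concrete with a small sketch of brute-force search in a projected space. This is an illustration under stated assumptions, not the system described in the paper: the `project` and `retrieve` names are hypothetical, and the projection matrix and gallery would come from a learned metric and a real face database rather than from whatever arrays the caller supplies. Once every face is stored as a short vector (150 dimensions in the paper), ranking a gallery reduces to plain Euclidean distances.

```python
import numpy as np


def project(L, X):
    """Map rows of X (n x D) into the low-dimensional space (n x r) via L (r x D)."""
    return X @ L.T


def retrieve(query_vec, gallery_vecs, k=5):
    """Return indices of the k gallery vectors closest to the query, nearest first."""
    d2 = np.sum((gallery_vecs - query_vec) ** 2, axis=1)
    # argpartition finds the k smallest distances without sorting the whole gallery.
    top = np.argpartition(d2, k - 1)[:k]
    return top[np.argsort(d2[top])]
```

The gallery is projected once offline, so each query costs one projection plus a linear scan over short vectors.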

This review was created by AI and reviewed by human editors.