QUICK REVIEW

[Paper Review] MultiGrain: a unified image embedding for classes and instances

Maxim Berman, Hervé Jégou | arXiv (Cornell University) | Feb 14, 2019

Domain Adaptation and Few-Shot Learning | 52 references | 34 citations

TL;DR

MultiGrain learns a single image embedding trained with both classification and instance retrieval objectives, enabling strong performance for image classification and instance/copy retrieval, with test-time resolution and pooling adaptations.

ABSTRACT

MultiGrain is a network architecture producing compact vector representations that are suited both for image classification and particular object retrieval. It builds on a standard classification trunk. The top of the network produces an embedding containing coarse and fine-grained information, so that images can be recognized based on the object class, particular object, or if they are distorted copies. Our joint training is simple: we minimize a cross-entropy loss for classification and a ranking loss that determines if two images are identical up to data augmentation, with no need for additional labels. A key component of MultiGrain is a pooling layer that takes advantage of high-resolution images with a network trained at a lower resolution. When fed to a linear classifier, the learned embeddings provide state-of-the-art classification accuracy. For instance, we obtain 79.4% top-1 accuracy with a ResNet-50 learned on Imagenet, which is a +1.8% absolute improvement over the AutoAugment method. When compared with the cosine similarity, the same embeddings perform on par with the state-of-the-art for image retrieval at moderate resolutions.
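The joint objective described in the abstract can be sketched as a weighted sum of a classification loss and a pairwise ranking loss. The NumPy sketch below is illustrative only. It assumes a margin loss of the form max(0, alpha + y(D - beta)) with illustrative constants alpha=0.2 and beta=1.2, and a weight `lam` between the two terms. It scores every pair in the batch rather than using any particular negative-sampling scheme, and all function names are hypothetical.

```python
import numpy as np


def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def margin_loss(emb, instance_ids, alpha=0.2, beta=1.2):
    # Assumed margin loss: pairs from the same source image (y=+1) are pulled
    # below distance beta - alpha; other pairs (y=-1) are pushed beyond beta + alpha.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)
    y = np.where(instance_ids[:, None] == instance_ids[None, :], 1.0, -1.0)
    losses = np.maximum(0.0, alpha + y * (dist - beta))
    off_diag = ~np.eye(len(emb), dtype=bool)  # ignore each embedding paired with itself
    return losses[off_diag].mean()


def multigrain_loss(logits, labels, emb, instance_ids, lam=0.5):
    # Hypothetical joint objective: lam weights classification against retrieval.
    return lam * cross_entropy(logits, labels) + (1.0 - lam) * margin_loss(emb, instance_ids)
```

Augmented copies of the same source image share an `instance_id`. As a result, the ranking term needs no annotation beyond what the augmentation pipeline already knows, which matches the abstract's "no need for additional labels".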

Motivation & Objective

Develop a single image embedding that supports class-level classification and instance-level retrieval.
Show that joint classification and instance retrieval training improves classification accuracy.
Introduce a pooling mechanism that leverages high-resolution inputs to boost both classification and retrieval.
Demonstrate effective training strategies, including repeated augmentations and a flexible test-time resolution/pooling setup.

Proposed method

Start from a standard classification trunk (ResNet-50).
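Add a generalized-mean (GeM) pooling layer that produces a fixed-size embedding, with a tunable exponent p (p=1 is average pooling; large p approaches max pooling).
Jointly train with a classification cross-entropy loss and a margin-based ranking loss for instance retrieval, balanced by a weight λ.
Use a batch sampling strategy with repeated augmentations (RA): each batch holds several differently augmented copies of the same images, which act as positive pairs for the retrieval loss.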
Apply PCA whitening post-training to support retrieval, while preserving classification performance.
Allow test-time adaptation by varying input resolution and the GeM exponent p* to balance classification and retrieval.
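GeM pooling and repeated-augmentation batches are both short to express. The NumPy sketch below assumes a (C, H, W) non-negative activation map from the trunk. `gem_pool` and `repeated_augmentation_batch` are hypothetical helpers. The repetition count `m` and the exponents used in the demo are illustrative values, not settings taken from the review.

```python
import numpy as np


def gem_pool(features, p=3.0, eps=1e-6):
    # features: (C, H, W) non-negative activation map (e.g. after the last ReLU).
    # Generalized mean over the H*W positions: p=1 gives average pooling,
    # and the result approaches max pooling as p grows.
    x = np.clip(features, eps, None)
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)


def repeated_augmentation_batch(dataset_size, batch_size, m, rng):
    # Hypothetical sampler: draw batch_size // m distinct images and repeat
    # each index m times. Each copy would then receive its own random
    # augmentation, and copies sharing an index act as retrieval positives.
    unique = rng.choice(dataset_size, size=batch_size // m, replace=False)
    return np.repeat(unique, m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_map = rng.random((2048, 7, 7))    # feature map at a lower training resolution
    test_map = rng.random((2048, 16, 16))   # larger map from a higher test-time resolution
    # Same embedding size either way; the test-time exponent (p*) here is illustrative.
    print(gem_pool(train_map, p=3.0).shape, gem_pool(test_map, p=4.0).shape)
    print(repeated_augmentation_batch(1000, 12, 3, rng))
```

GeM averages over spatial positions, so a larger input (and hence a larger H and W) still yields a C-dimensional vector. This property allows the test-time resolution and the exponent p* to change without retraining the trunk.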

Experimental results

Research questions

RQ1: Can a single embedding learned with both classification and instance retrieval losses achieve competitive performance on both tasks?
RQ2: How do training choices (batching, pooling exponent, resolution) affect the trade-off between classification accuracy and retrieval quality?
RQ3: Does repeated augmentation in batches improve the retrieval signal without harming classification performance?
RQ4: How can test-time input resolution and pooling exponent be tuned to maintain strong performance across tasks?

Key findings

ResNet-50 with MultiGrain reaches 78.6% top-1 accuracy on ImageNet at resolution 500 with p=3 and λ=0.5, outperforming the baseline and approaching the state of the art for this setup.
Jointly trained embedding improves classification accuracy compared to a single-task baseline (e.g., 76.2% baseline to 76.9–78.6% under various settings).
Batches built with repeated augmentations (RA) yield a measurable gain in classification accuracy (+0.6% at p=1).
GeM pooling with p=3 provides better localization and boosts both retrieval and classification when combined with high-resolution adaptation.
Adjusting the pooling exponent p* at test time makes it possible to exploit larger input resolutions (e.g., 500, 800) with gains in both tasks, although the gains may diminish at very large scales.
PCA whitening helps generalization to retrieval datasets while preserving the ability to use embeddings for classification.
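PCA whitening is a fixed linear map learned after training. The NumPy sketch below fits that map on a set of embeddings. It then folds the inverse transform into a linear classifier, which is one standard linear-algebra argument for why whitening need not cost classification accuracy. The sketch assumes no dimensionality reduction and applies whitening before any L2 normalization. It is not claimed to be the paper's exact procedure, and `fold_classifier` is a hypothetical helper.

```python
import numpy as np


def fit_pca_whitening(embeddings, eps=1e-8):
    # embeddings: (N, D) descriptors from some held-out image set (assumption).
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    cov = centered.T @ centered / len(embeddings)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Row j of P projects onto principal axis j and rescales it to unit variance.
    P = (eigvecs / np.sqrt(eigvals + eps)).T
    return mean, P


def whiten(embeddings, mean, P):
    # Whitened descriptors; retrieval would typically L2-normalize these
    # and compare them with cosine similarity.
    return (embeddings - mean) @ P.T


def fold_classifier(W, b, mean, P):
    # Hypothetical helper: a linear classifier (W, b) trained on raw embeddings
    # gives identical logits on whitened ones with W' = W P^{-1}, b' = b + W mean,
    # because the full-rank whitening map is invertible.
    return W @ np.linalg.inv(P), b + W @ mean
```

If whitening also drops dimensions, P is no longer square, and the folding step no longer holds exactly.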

This review was created by AI and reviewed by human editors.