[Paper Review] Delta-encoder: an effective sample synthesis method for few-shot object recognition
Δ-encoder learns non-linear deformations between same-class samples to synthesize plausible new samples for unseen classes, enabling effective few-shot and one-shot object recognition without external data. It achieves state-of-the-art one-shot results and competitive few-shot results on standard benchmarks.
Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category just by seeing a few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns both to extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state-of-the-art in one-shot object recognition and compares favorably in the few-shot case. Upon acceptance code will be made available.
Motivation & Objective
- Motivate and address the challenge of recognizing new categories from very few examples in computer vision.
- Propose a mechanism to synthesize new samples for unseen classes by transferring learned intra-class deformations (deltas) from seen classes.
- Train a Delta-encoder that encodes deformations between same-class pairs and decodes them onto a seed example from a novel class to generate training samples.
- Evaluate the approach on standard few-shot benchmarks and compare with state-of-the-art methods across multiple datasets.
Proposed method
- Use an auto-encoder variant where the encoder outputs a compact delta representation Z between a pair (X, Y) from the same class.
- Train to reconstruct X from Y and Z, forcing dependence on Y to enable meaningful sample synthesis.
- During sampling, collect Z from many same-class pairs of seen classes, then generate new samples for a novel class by decoding each Z together with a single seed Y^u, i.e. D(Z, Y^u).
- Train a linear classifier on 1024 synthesized samples per unseen class; extend to k-shot by repeating synthesis for each seed.
- Utilize adaptive L1 reconstruction loss with feature-space weighting and a 16-dim Z; backbone features are precomputed (VGG16/ResNet18) with a small MLP encoder/decoder.
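The steps listed above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, the dimensions (`FEAT_DIM`, `Z_DIM`, `HIDDEN`) are shrunk for readability, all function names are hypothetical, and `weighted_l1` shows only one plausible form of the adaptive L1 loss (each residual weighted by its share of the total residual), which the review does not spell out.

```python
import random

FEAT_DIM = 8   # toy size; real features come from a VGG16/ResNet18 backbone
Z_DIM = 4      # toy size; the review reports a 16-dim Z
HIDDEN = 16    # hypothetical hidden width of the small MLPs

def make_linear(n_in, n_out, rng):
    """Random weight matrix standing in for a trained layer (untrained here)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def mlp(layers, vec):
    """Apply linear layers with a ReLU between them (none after the last)."""
    for i, weights in enumerate(layers):
        vec = [sum(w * v for w, v in zip(row, vec)) for row in weights]
        if i < len(layers) - 1:
            vec = [max(0.0, v) for v in vec]
    return vec

def encode(enc, x, y):
    """E(X, Y) -> Z: compress a same-class pair into a low-dimensional delta."""
    return mlp(enc, x + y)

def decode(dec, z, y):
    """D(Z, Y) -> X_hat: rebuild a sample from a delta and an anchor Y."""
    return mlp(dec, z + y)

def weighted_l1(x, x_hat):
    """ASSUMED loss form: L1 residuals weighted by their normalized magnitude."""
    resid = [abs(a - b) for a, b in zip(x, x_hat)]
    total = sum(resid)
    if total == 0.0:
        return 0.0
    return sum((r / total) * r for r in resid)

def synthesize(enc, dec, seen_pairs, seed, n_samples, rng):
    """Transfer deltas from random seen-class pairs onto one novel-class seed."""
    samples = []
    for _ in range(n_samples):
        x_s, y_s = rng.choice(seen_pairs)
        z = encode(enc, x_s, y_s)
        samples.append(decode(dec, z, seed))
    return samples

rng = random.Random(0)
enc = [make_linear(2 * FEAT_DIM, HIDDEN, rng), make_linear(HIDDEN, Z_DIM, rng)]
dec = [make_linear(Z_DIM + FEAT_DIM, HIDDEN, rng), make_linear(HIDDEN, FEAT_DIM, rng)]

def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)]

seen_pairs = [(rand_vec(), rand_vec()) for _ in range(20)]  # stand-in same-class pairs
seed = rand_vec()                                           # single novel-class example
samples = synthesize(enc, dec, seen_pairs, seed, 1024, rng)

# Training would minimize weighted_l1(X, D(E(X, Y), Y)) over seen-class pairs.
x, y = seen_pairs[0]
loss = weighted_l1(x, decode(dec, encode(enc, x, y), y))
```

In the k-shot case, `synthesize` would be called once per seed and the results pooled before fitting the linear classifier. Because the decoder always receives the anchor Y alongside Z, the bottleneck Z only needs to carry what differs between X and Y, which is what makes the deltas transferable to a new seed.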
Experimental results
Research questions
- RQ1: Can a learned delta representation transfer deformations from seen classes to synthesize realistic samples for unseen classes using only a few examples?
- RQ2: How does the Delta-encoder perform in one-shot and few-shot settings across standard benchmarks?
- RQ3: Does the synthesized data provide non-trivial information beyond simple augmentation of the seed exemplars?
Key findings
- The Δ-encoder achieves strong one-shot performance, outperforming several baselines on multiple datasets.
- In both the 1-shot and 5-shot settings, Δ-encoder shows competitive or superior accuracy compared to state-of-the-art methods on miniImageNet, CIFAR-100, Caltech-256, and CUB.
- Ablation studies show that including Y as input to the encoder and learning non-linear deltas significantly improves performance over linear offsets or attribute-based methods.
- Accuracy improves as the number of synthesized samples per unseen class increases and levels off at around 1,024, which suggests the synthesized data adds non-trivial information rather than merely replicating the seed.
- Using a pre-trained backbone (ImageNet features) further boosts results, with Δ-encoder achieving notable gains over baselines on several datasets.
This review was created by AI and reviewed by human editors.