QUICK REVIEW

[Paper Review] Delta-encoder: an effective sample synthesis method for few-shot object recognition

Eli Schwartz, Leonid Karlinsky|arXiv (Cornell University)|Jun 12, 2018

Domain Adaptation and Few-Shot Learning | 6 references | 199 citations

TL;DR

Δ-encoder learns non-linear deformations between same-class samples to synthesize plausible new samples for unseen classes, enabling effective few-shot and one-shot object recognition without external data. It achieves state-of-the-art one-shot results and competitive few-shot results on standard benchmarks.

ABSTRACT

Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category just by seeing few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns to both extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state-of-the-art in one-shot object-recognition and compares favorably in the few-shot case. Upon acceptance code will be made available.

Motivation & Objective

Address the long-standing computer vision challenge of recognizing new categories from very few examples.
Propose a mechanism to synthesize new samples for unseen classes by transferring learned intra-class deformations (deltas) from seen classes.
Train a Delta-encoder that encodes deformations between same-class pairs and decodes them onto a seed example from a novel class to generate training samples.
Evaluate the approach on standard few-shot benchmarks and compare with state-of-the-art methods across multiple datasets.

Proposed method

Use an auto-encoder variant where the encoder outputs a compact delta representation Z between a pair (X, Y) from the same class.
Train the decoder D to reconstruct X from Y and Z; because Z is compact, the decoder has to rely on Y, which is what makes synthesis from a new seed meaningful.
During sampling, collect Z from many same-class pairs of seen classes, then generate new samples for a novel class by decoding D(Z, Y^u), where Y^u is a single seed example of that class.
Train a linear classifier on 1024 synthesized samples per unseen class; extend to k-shot by repeating synthesis for each seed.
Use an adaptive L1 reconstruction loss with feature-space weighting and a 16-dimensional Z; the encoder and decoder are small MLPs operating on precomputed backbone features (VGG16 or ResNet18).
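
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the networks are untrained one-hidden-layer MLPs with random weights, the feature and hidden sizes (FEAT_DIM, HIDDEN) are placeholders, random vectors stand in for backbone features, and the form of the weighted L1 loss (weights equal to the normalized squared residual per feature dimension) is one plausible reading of "adaptive L1 with feature-space weighting". Only the 16-dimensional Z and the 1024 samples per seed come from the review.

```python
import numpy as np

rng = np.random.default_rng(0)
# Z_DIM = 16 is from the review; FEAT_DIM and HIDDEN are assumed placeholders.
FEAT_DIM, Z_DIM, HIDDEN = 512, 16, 256


def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a one-hidden-layer MLP (untrained, illustration only)."""
    return {
        "W1": rng.normal(0, 0.05, (in_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.05, (hidden, out_dim)), "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]


enc = init_mlp(2 * FEAT_DIM, HIDDEN, Z_DIM)         # encoder sees the pair [X, Y]
dec = init_mlp(Z_DIM + FEAT_DIM, HIDDEN, FEAT_DIM)  # decoder sees [Z, Y]


def encode(x, y):
    """Map a same-class pair (X, Y) to a compact delta code Z."""
    return mlp(enc, np.concatenate([x, y], axis=-1))


def decode(z, y):
    """Reconstruct (or synthesize) a feature vector from a delta Z and an anchor Y."""
    return mlp(dec, np.concatenate([z, y], axis=-1))


def weighted_l1(x, x_hat, eps=1e-12):
    """L1 loss with per-dimension weights = normalized squared residual (assumed form)."""
    r = np.abs(x - x_hat)
    w = r**2 / (np.sum(r**2, axis=-1, keepdims=True) + eps)
    return np.mean(np.sum(w * r, axis=-1))


def synthesize(seed, pairs_x, pairs_y):
    """Apply deltas taken from seen-class pairs to one novel-class seed feature."""
    z = encode(pairs_x, pairs_y)
    seeds = np.broadcast_to(seed, (z.shape[0], seed.shape[-1]))
    return decode(z, seeds)


# Random vectors stand in for precomputed backbone features.
pairs_x = rng.normal(size=(1024, FEAT_DIM))
pairs_y = rng.normal(size=(1024, FEAT_DIM))
seed = rng.normal(size=FEAT_DIM)

samples = synthesize(seed, pairs_x, pairs_y)  # one synthesized sample per delta
loss = weighted_l1(pairs_x, decode(encode(pairs_x, pairs_y), pairs_y))
```

In a real setup the two MLPs would be trained with an autodiff framework to minimize the reconstruction loss on seen-class pairs; for k-shot, synthesize would be called once per seed and the outputs pooled.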

Experimental results

Research questions

RQ1: Can a learned delta representation transfer deformations from seen classes to synthesize realistic samples for unseen classes using only a few examples?
RQ2: How does the Delta-encoder perform in one-shot and few-shot settings across standard benchmarks?
RQ3: Does the synthesized data provide non-trivial information beyond simple augmentation of the seed exemplars?

Key findings

The Δ-encoder achieves strong one-shot performance, outperforming several baselines on multiple datasets.
In the 1-shot and 5-shot settings, the Δ-encoder shows competitive or superior accuracy compared to state-of-the-art methods across miniImageNet, CIFAR-100, Caltech-256, and CUB.
Ablation studies show that including Y as input to the encoder and learning non-linear deltas significantly improves performance over linear offsets or attribute-based methods.
Accuracy improves as the number of synthesized samples per unseen class grows and levels off at around 1,024, suggesting that the synthesized samples carry information beyond trivial augmentation of the seed.
Using a pre-trained backbone (ImageNet features) further boosts results, with Δ-encoder achieving notable gains over baselines on several datasets.
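
The sample-count finding concerns the last stage of the pipeline, where a linear classifier is fit on the synthesized feature vectors. A minimal sketch of that stage follows, assuming a softmax classifier trained with plain gradient descent. The data here are toy Gaussian blobs standing in for synthesized features, and the class count, feature size, learning rate, and step count are illustrative choices rather than values from the paper; only the 1024 samples per class comes from the review.

```python
import numpy as np

rng = np.random.default_rng(0)
# 1024 samples per class is from the review; the other sizes are assumptions.
N_CLASSES, DIM, N_PER_CLASS = 5, 32, 1024

# Toy stand-in for synthesized features: one Gaussian blob per class.
means = rng.normal(size=(N_CLASSES, DIM))


def sample(n_per_class):
    x = np.concatenate([m + rng.normal(size=(n_per_class, DIM)) for m in means])
    y = np.repeat(np.arange(N_CLASSES), n_per_class)
    return x, y


def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def train_linear(x, y, steps=200, lr=0.1):
    """Fit a softmax (multinomial logistic) classifier by gradient descent."""
    w = np.zeros((x.shape[1], N_CLASSES))
    b = np.zeros(N_CLASSES)
    onehot = np.eye(N_CLASSES)[y]
    for _ in range(steps):
        grad = softmax(x @ w + b) - onehot  # gradient of cross-entropy w.r.t. logits
        w -= lr * x.T @ grad / len(x)
        b -= lr * grad.mean(axis=0)
    return w, b


x_train, y_train = sample(N_PER_CLASS)  # plays the role of synthesized samples
x_test, y_test = sample(100)            # plays the role of real query samples
w, b = train_linear(x_train, y_train)
accuracy = np.mean(np.argmax(x_test @ w + b, axis=1) == y_test)
```

Rerunning this with smaller values of N_PER_CLASS is one way to mimic the sample-count ablation, though on toy blobs the curve says nothing about the paper's actual numbers.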

This review was created by AI and reviewed by human editors.