[Paper Review] Leveraging the Feature Distribution in Transfer-based Few-Shot Learning
The paper introduces a two-stage transfer-based few-shot learning method: (1) preprocess backbone features with a power transform so that their distributions become closer to Gaussian, and (2) in the transductive setting, apply a MAP/Sinkhorn-based optimal-transport algorithm to refine class centers. The method achieves state-of-the-art results across multiple datasets and backbones.
Few-shot classification is a challenging problem due to the uncertainty caused by using few labelled samples. In the past few years, many methods have been proposed to solve few-shot classification, among which transfer-based methods have proved to achieve the best performance. Following this vein, in this paper we propose a novel transfer-based method that builds on two successive steps: 1) preprocessing the feature vectors so that they become closer to Gaussian-like distributions, and 2) leveraging this preprocessing using an optimal-transport inspired algorithm (in the case of transductive settings). Using standardized vision benchmarks, we prove the ability of the proposed methodology to achieve state-of-the-art accuracy with various datasets, backbone architectures and few-shot settings.
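Step 1, making features closer to Gaussian-like distributions, can be illustrated with a short sketch. It assumes the transform raises each non-negative (e.g. post-ReLU) feature to a power `beta` after adding a small offset `eps`, then rescales each vector to unit L2 norm. The function name `power_transform` and the defaults `beta=0.5`, `eps=1e-6` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def power_transform(features, beta=0.5, eps=1e-6):
    """Power transform followed by L2 normalization (hypothetical sketch).

    features: (n_samples, d) array of non-negative backbone features.
    beta < 1 compresses large activations, reducing the right skew of each
    feature dimension; eps is a small offset added before the power (assumed value).
    """
    features = np.asarray(features, dtype=float)
    if np.any(features < 0):
        raise ValueError("this sketch assumes non-negative features")
    transformed = np.power(features + eps, beta)
    # Project each feature vector onto the unit hypersphere.
    return transformed / np.linalg.norm(transformed, axis=1, keepdims=True)
```

For example, `power_transform(np.random.default_rng(0).exponential(size=(500, 64)))` turns strongly right-skewed synthetic features into unit-norm vectors whose per-dimension skew is much smaller.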
Motivation & Objective
- Motivate the use of feature distribution preprocessing to align backbone features with Gaussian-like assumptions in few-shot tasks.
- Propose a MAP-like iterative algorithm leveraging Sinkhorn transport for transductive class-center estimation.
- Demonstrate state-of-the-art accuracy across diverse datasets and backbones with a light hyperparameter footprint.
Proposed method
- Apply a power transform (PT) to backbone features to reduce their skew, followed by a unit-norm (L2) projection.
- Assume Gaussian-like class distributions and perform MAP estimation of class centers via iterative Sinkhorn-based optimal transport.
- Use a soft allocation matrix M* to map unlabelled query features to class centers with entropy regularization, updating centers with an inertia parameter.
- In inductive settings, classify using a nearest-class-mean approach after preprocessing; in transductive settings, apply the MAP/Sinkhorn procedure over the pooled unlabelled set.
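- Tune a small set of hyperparameters across datasets: beta (the power-transform exponent, which controls how strongly skew is reduced), lambda (the Sinkhorn regularization strength), and alpha (the inertia of center updates). Results are reported after a fixed number of iterations (n_steps).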
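The inductive and transductive classifiers described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes that features are already preprocessed, that costs are squared Euclidean distances, that Sinkhorn starts from the kernel exp(-lambda * cost), and that queries are class-balanced (each class receives n_queries / n_classes units of mass). All function names and the defaults `lam=10.0`, `alpha=0.2`, `n_steps=30` are hypothetical.

```python
import numpy as np


def sinkhorn(cost, row_marginals, col_marginals, lam, n_iters=1000, tol=1e-9):
    """Entropy-regularized optimal transport by alternating row/column scaling."""
    # Subtracting each row's minimum only rescales rows, which Sinkhorn absorbs,
    # and guarantees every row keeps at least one entry equal to 1.
    plan = np.exp(-lam * (cost - cost.min(axis=1, keepdims=True)))
    for _ in range(n_iters):
        plan *= (row_marginals / plan.sum(axis=1))[:, None]
        plan *= (col_marginals / plan.sum(axis=0))[None, :]
        if np.allclose(plan.sum(axis=1), row_marginals, atol=tol):
            break
    return plan


def class_means(features, labels, n_classes):
    """Per-class mean of labelled features (nearest-class-mean prototypes)."""
    one_hot = np.eye(n_classes)[labels]
    return one_hot.T @ features / one_hot.sum(axis=0)[:, None]


def sq_dists(x, centers):
    """Squared Euclidean distance between every row of x and every center."""
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)


def ncm_predict(support, support_labels, queries, n_classes):
    """Inductive setting: assign each query to the nearest class mean."""
    centers = class_means(support, support_labels, n_classes)
    return sq_dists(queries, centers).argmin(axis=1)


def map_sinkhorn(support, support_labels, queries, n_classes,
                 lam=10.0, alpha=0.2, n_steps=30):
    """Transductive setting: refine class centers with Sinkhorn soft allocations."""
    one_hot = np.eye(n_classes)[support_labels]
    centers = class_means(support, support_labels, n_classes)
    rows = np.ones(len(queries))                          # unit mass per query
    cols = np.full(n_classes, len(queries) / n_classes)   # balanced-class assumption
    samples = np.vstack([support, queries])
    for _ in range(n_steps):
        alloc = sinkhorn(sq_dists(queries, centers), rows, cols, lam)
        # Labelled samples keep hard one-hot weights; queries use soft weights.
        weights = np.vstack([one_hot, alloc])
        estimate = weights.T @ samples / weights.sum(axis=0)[:, None]
        # Inertia: move each center only a fraction alpha toward its new estimate.
        centers = centers + alpha * (estimate - centers)
    alloc = sinkhorn(sq_dists(queries, centers), rows, cols, lam)
    return alloc, centers
```

Predicted query labels are `alloc.argmax(axis=1)`. The column marginals encode the balanced-class assumption, so this sketch would need different marginals for imbalanced query sets.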
Experimental results
Research questions
- RQ1: Can feature preprocessing to Gaussian-like distributions improve transfer-based few-shot learning?
- RQ2: Does a MAP-with-Sinkhorn transport-based allocation improve labeling of unlabelled query samples in transductive few-shot settings?
- RQ3: How do backbone choice and hyperparameters affect performance across standard benchmarks?
- RQ4: What is the empirical impact of the number of unlabelled samples on accuracy in the transductive setting?
- RQ5: Is the proposed method robust to class imbalance and cross-domain scenarios?
Key findings
- The proposed PT+MAP method achieves state-of-the-art accuracy on multiple benchmarks (miniImageNet, tieredImageNet, CUB, CIFAR-FS) across several backbones.
- Power transform (PT) significantly reshapes feature distributions toward Gaussian-like shapes, aiding subsequent classification.
- MAP with Sinkhorn-based allocation improves center estimation and label assignment in transductive settings, outperforming baselines and showing notable gains over inductive variants.
- The method performs strongly across varying s (shots) and q (unlabelled samples), and remains competitive in cross-domain evaluation.
- Hyperparameters show limited sensitivity: the values that perform best on validation classes also perform best on novel classes, indicating robustness across datasets.
This review was created by AI and reviewed by human editors.