QUICK REVIEW

[Paper Review] Dataset Augmentation in Feature Space

Terrance DeVries, Graham W. Taylor | arXiv (Cornell University) | Feb 17, 2017

Domain Adaptation and Few-Shot Learning | 67 citations

TL;DR

The paper introduces a domain-agnostic data augmentation method that operates in a learned feature space created by a sequence autoencoder. Extrapolating between neighboring context vectors improves performance across multiple domains, including speech, motion, and images.

ABSTRACT

Dataset augmentation, the practice of applying a wide array of domain-specific transformations to synthetically expand a training set, is a standard tool in supervised learning. While effective in tasks such as visual recognition, the set of transformations must be carefully designed, implemented, and tested for every new domain, limiting its re-use and generality. In this paper, we adopt a simpler, domain-agnostic approach to dataset augmentation. We start with existing data points and apply simple transformations such as adding noise, interpolating, or extrapolating between them. Our main insight is to perform the transformation not in input space, but in a learned feature space. A re-kindling of interest in unsupervised representation learning makes this technique timely and more effective. It is a simple proposal, but to-date one that has not been tested empirically. Working in the space of context vectors generated by sequence-to-sequence models, we demonstrate a technique that is effective for both static and sequential data.

Motivation & Objective

Propose a domain-agnostic augmentation approach that avoids hand-crafted, domain-specific transformations.
Leverage unsupervised representation learning to create a feature space where simple transformations yield realistic synthetic data.
Evaluate extrapolation, interpolation, and noise-based augmentation across diverse datasets.
Demonstrate that feature-space augmentation can approach or surpass state-of-the-art results on several tasks.

Proposed method

Train a sequence autoencoder (two-layer stacked LSTMs) to learn a context-vector feature space from unlabeled data.
Augment data by transforming context vectors (noise, interpolation, extrapolation) before decoding or feeding to a classifier.
Condition the decoder on the context vector at each time step for better reconstructions.
For each sample, find K nearest in-class neighbours in feature space and generate synthetic samples via interpolation or extrapolation.
Decode the augmented context vectors back into sequences when training sequence classifiers, or feed them directly as features to static classifiers.
Evaluate augmentation in both time-series and image domains, including MNIST, CIFAR-10, AUSLAN, Arabic digits, UCF Kinect, and UJI Pen Characters.
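
The three feature-space transformations listed above can be illustrated with a minimal NumPy sketch. This is not the authors' code. It assumes the context vectors have already been extracted by a trained encoder and are stored as rows of an array. The function names, the default values of k, lam and gamma, and the use of Euclidean distance are assumptions made for illustration. The update rules follow the usual reading of the method: interpolation moves a sample toward an in-class neighbour, and extrapolation moves it away from that neighbour.

```python
import numpy as np


def nearest_in_class(features, labels, k):
    """Return (indices, valid) for each sample's k nearest same-label neighbours."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Rule out samples from other classes and the sample itself.
    dists[labels[:, None] != labels[None, :]] = np.inf
    np.fill_diagonal(dists, np.inf)
    order = np.argsort(dists, axis=1)[:, :k]
    # A slot is invalid when the class has fewer than k other members.
    valid = np.isfinite(np.take_along_axis(dists, order, axis=1))
    return order, valid


def augment(features, labels, k=2, lam=0.5, mode="extrapolate"):
    """Create synthetic context vectors from in-class nearest neighbours."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    neighbours, valid = nearest_in_class(features, labels, k)
    c_j = np.repeat(features, k, axis=0)      # each sample, repeated k times
    c_k = features[neighbours.reshape(-1)]    # the matching neighbours
    if mode == "extrapolate":
        new = (c_j - c_k) * lam + c_j         # step away from the neighbour
    elif mode == "interpolate":
        new = (c_k - c_j) * lam + c_j         # step toward the neighbour
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    keep = valid.reshape(-1)
    return new[keep], np.repeat(labels, k)[keep]


def add_noise(features, gamma=0.5, rng=None):
    """Add Gaussian noise scaled by gamma and the per-dimension std (assumed form)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    sigma = features.std(axis=0)
    return features + gamma * sigma * rng.standard_normal(features.shape)
```

In a pipeline of this shape, the synthetic vectors returned by `augment` would either be appended to the training features of a static classifier or passed through the decoder to obtain synthetic sequences. Restricting neighbours to the same class keeps the label of each synthetic sample unambiguous.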

Experimental results

Research questions

RQ1: Does augmentation in feature space improve supervised learning performance across diverse domains?
RQ2: Among noise, interpolation, and extrapolation in feature space, which transformations most effectively improve generalization?
RQ3: Can extrapolation in feature space provide benefits beyond traditional input-space augmentation, and is it complementary to domain-specific techniques?

Key findings

Extrapolating between context vectors significantly improves performance across several datasets (e.g., Arabic Digits: baseline 1.36% error to 0.74% with nearest-neighbour extrapolation).
Random noise can modestly improve performance on some tasks, but interpolation often harms results unless carefully targeted.
Interpolation between neighbours tends to produce smoother transitions, while extrapolation increases variability and often boosts accuracy on complex decision boundaries.
On MNIST, feature-space extrapolation reduced test error from a 1.093% baseline to 0.95% and outperformed input-space affine transformations in some settings.
On CIFAR-10, feature-space extrapolation reduced test error from a 30.65% baseline to 29.24%; the review also reports that the gains are complementary when combined with input-space augmentation.
Across AUSLAN and UCF Kinect, extrapolation in feature space yielded notable improvements over baselines, sometimes approaching or surpassing domain-specific results.
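
To put the error rates quoted above on a common scale, the short sketch below converts each baseline and augmented pair into a relative error reduction. The numbers are taken from the findings listed here. The helper name `relative_reduction` is an assumption made for illustration and does not come from the paper.

```python
def relative_reduction(baseline, augmented):
    """Fraction of the baseline error removed by augmentation."""
    return (baseline - augmented) / baseline


# (baseline error %, error % with feature-space extrapolation), as quoted above.
reported = {
    "Arabic Digits": (1.36, 0.74),
    "MNIST": (1.093, 0.95),
    "CIFAR-10": (30.65, 29.24),
}

for name, (base, aug) in reported.items():
    print(f"{name}: {relative_reduction(base, aug):.1%} relative error reduction")
```

On these figures the relative reduction is roughly 46% for Arabic Digits, 13% for MNIST, and 5% for CIFAR-10, so the absolute percentage-point gaps understate how much the benefit varies between datasets.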

This review was created by AI and reviewed by human editors.