QUICK REVIEW

[Paper Review] AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data

Liheng Zhang, Guo-Jun Qi | arXiv (Cornell University) | Jan 14, 2019
Domain Adaptation and Few-Shot Learning | 31 references | 50 citations
TL;DR

The paper introduces Auto-Encoding Transformations (AET), an unsupervised representation learning paradigm that predicts image transformations from encoded features, achieving state-of-the-art results close to supervised methods on CIFAR-10, ImageNet, and Places.

ABSTRACT

The success of deep neural networks often relies on a large amount of labeled examples, which can be difficult to obtain in many real scenarios. To address this challenge, unsupervised methods are strongly preferred for training neural networks without using any labeled data. In this paper, we present a novel paradigm of unsupervised representation learning by Auto-Encoding Transformation (AET) in contrast to the conventional Auto-Encoding Data (AED) approach. Given a randomly sampled transformation, AET seeks to predict it merely from the encoded features as accurately as possible at the output end. The idea is the following: as long as the unsupervised features successfully encode the essential information about the visual structures of original and transformed images, the transformation can be well predicted. We will show that this AET paradigm allows us to instantiate a large variety of transformations, from parameterized, to non-parameterized and GAN-induced ones. Our experiments show that AET greatly improves over existing unsupervised approaches, setting new state-of-the-art performances being greatly closer to the upper bounds by their fully supervised counterparts on CIFAR-10, ImageNet and Places datasets.

Motivation & Objective

  • Motivate unsupervised representation learning when labeled data are scarce.
  • Propose AET to learn features by predicting input transformations rather than reconstructing data.
  • Demonstrate that AET supports a wide variety of transformations and yields strong empirical results.

Proposed method

  • Formulate AET: learn encoder E and transformation decoder D to predict a sampled transformation t from E(x) and E(t(x)).
  • Minimize the loss ℓ(t, t_hat) between the true transformation and its estimate, with t_hat = D(E(x), E(t(x))).
  • Instantiate AET with parameterized transformations (e.g., affine, projective) and GAN-induced or non-parameterized variants.
  • Use two branches sharing weights to encode original and transformed images and concatenate features for decoding the transformation.
  • Train end-to-end with SGD on mini-batches, using back-propagation to update E and D.
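
As a rough illustration of the formulation above, the following Python sketch traces the AET data flow for a 2D affine transformation. Everything in it is an assumption made for illustration: the names sample_affine, toy_encoder, toy_decoder, and aet_forward are hypothetical, the parameter ranges are arbitrary, a squared error over the matrix entries stands in for the loss between t and t_hat, and a closed-form decoder on three points replaces the learned networks and SGD training that the review describes.

```python
import random


def sample_affine(rng):
    """Sample a 2x3 affine matrix [[a, b, tx], [c, d, ty]].

    The ranges are illustrative assumptions, not the paper's settings.
    """
    return [
        [1.0 + rng.uniform(-0.3, 0.3), rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2)],
        [rng.uniform(-0.2, 0.2), 1.0 + rng.uniform(-0.3, 0.3), rng.uniform(-0.2, 0.2)],
    ]


def apply_affine(t, points):
    """Apply the 2x3 affine matrix t to a list of (x, y) points."""
    return [
        (t[0][0] * x + t[0][1] * y + t[0][2],
         t[1][0] * x + t[1][1] * y + t[1][2])
        for x, y in points
    ]


def transformation_loss(t, t_hat):
    """Half the sum of squared differences between matrix entries (assumed loss)."""
    return 0.5 * sum(
        (u - v) ** 2
        for row, row_hat in zip(t, t_hat)
        for u, v in zip(row, row_hat)
    )


def toy_encoder(points):
    """Stand-in for E: flatten three points into a 6-value feature list."""
    return [coord for point in points for coord in point]


def toy_decoder(features):
    """Stand-in for D: recover the affine matrix from concatenated features.

    Expects 12 values: three original points followed by their three images.
    """
    p = [(features[i], features[i + 1]) for i in range(0, 6, 2)]
    q = [(features[i], features[i + 1]) for i in range(6, 12, 2)]
    u1 = (p[1][0] - p[0][0], p[1][1] - p[0][1])
    u2 = (p[2][0] - p[0][0], p[2][1] - p[0][1])
    v1 = (q[1][0] - q[0][0], q[1][1] - q[0][1])
    v2 = (q[2][0] - q[0][0], q[2][1] - q[0][1])
    det = u1[0] * u2[1] - u2[0] * u1[1]  # nonzero when the points are not collinear
    a = (v1[0] * u2[1] - v2[0] * u1[1]) / det
    b = (v2[0] * u1[0] - v1[0] * u2[0]) / det
    c = (v1[1] * u2[1] - v2[1] * u1[1]) / det
    d = (v2[1] * u1[0] - v1[1] * u2[0]) / det
    tx = q[0][0] - a * p[0][0] - b * p[0][1]
    ty = q[0][1] - c * p[0][0] - d * p[0][1]
    return [[a, b, tx], [c, d, ty]]


def aet_forward(encoder, decoder, x, t):
    """Encode x and t(x) with the same encoder, concatenate, then decode t_hat."""
    z = encoder(x)
    z_t = encoder(apply_affine(t, x))
    return decoder(z + z_t)  # list concatenation stands in for feature concatenation


rng = random.Random(0)
x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
t = sample_affine(rng)
t_hat = aet_forward(toy_encoder, toy_decoder, x, t)
loss = transformation_loss(t, t_hat)
print(loss)
```

Because the toy encoder keeps every point coordinate, the decoder recovers t up to floating-point error and the loss is essentially zero. An encoder that discarded that structure would leave the transformation unpredictable, which is the intuition stated in the abstract.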

Experimental results

Research questions

  • RQ1: Can transforming images and then decoding the transformation from learned features yield better unsupervised representations than data reconstruction?
  • RQ2: What classes of transformations (parameterized, GAN-induced, non-parameterized) best promote learning informative features?
  • RQ3: How does AET perform compared to state-of-the-art unsupervised methods on CIFAR-10, ImageNet, and Places?
  • RQ4: Does the predicted transformation loss correlate with supervised classification performance?

Key findings

  • AET-project (AET with projective transformations) reaches a 7.82% error rate on CIFAR-10 with a convolutional classifier, close to the 7.2% of the fully supervised model.
  • AET methods outperform RotNet and other unsupervised baselines on CIFAR-10 across FC and conv classifiers and with KNN evaluation.
  • On ImageNet, AET-project beats several unsupervised methods and narrows the gap to the upper-bound supervised performance (e.g., gap reductions reported for Conv4 and Conv5 settings).
  • The transformation-prediction loss tracks supervised classification performance: models with a lower AET loss also classify more accurately, which supports the AET objective as a proxy for representation quality.
  • AET demonstrates strong transfer to Places with competitive results when pretrained on ImageNet and evaluated with linear/logistic classifiers.
  • The experiments indicate that a wide variety of transformations can be incorporated, with parameterized transformations offering straightforward, fair comparisons.
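
The KNN evaluation mentioned in the findings can be sketched as follows: features from a frozen encoder are classified by a majority vote among their nearest labeled neighbors, so no classifier weights are trained. This is a minimal sketch under stated assumptions. The function names, the squared Euclidean distance, the choice k=3, and the tiny 2D feature vectors are all hypothetical stand-ins, not the paper's evaluation settings.

```python
from collections import Counter


def knn_predict(train_feats, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training features."""
    # Squared Euclidean distance is an assumed choice of metric.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(feat, query)), label)
        for feat, label in zip(train_feats, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def knn_error_rate(train_feats, train_labels, test_feats, test_labels, k=3):
    """Fraction of test features whose predicted label is wrong."""
    wrong = sum(
        knn_predict(train_feats, train_labels, feat, k) != label
        for feat, label in zip(test_feats, test_labels)
    )
    return wrong / len(test_labels)


# Hypothetical frozen-encoder features: two well-separated clusters.
train_feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
train_labels = ["a", "a", "a", "b", "b", "b"]
test_feats = [[0.05, 0.05], [0.95, 0.95]]
test_labels = ["a", "b"]
print(knn_error_rate(train_feats, train_labels, test_feats, test_labels))
```

Since the vote depends only on distances in feature space, a lower error rate under this protocol reflects how well the unsupervised features separate the classes on their own.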


This review was created by AI and reviewed by human editors.