QUICK REVIEW

[Paper Review] Feature Generating Networks for Zero-Shot Learning

Yongqin Xian, Tobias Lorenz | arXiv (Cornell University) | Dec 4, 2017
Domain Adaptation and Few-Shot Learning | 17 citations
TL;DR

This paper proposes f-CLSWGAN, a conditional generative adversarial network that synthesizes deep CNN features for unseen classes using semantic class descriptors, training with a Wasserstein GAN loss and a classification loss to generate discriminative features. The method achieves state-of-the-art performance across five datasets in both zero-shot and generalized zero-shot learning settings by directly generating high-quality features rather than images.

ABSTRACT

Suffering from the extreme training data imbalance between seen and unseen classes, most of existing state-of-the-art approaches fail to achieve satisfactory results for the challenging generalized zero-shot learning task. To circumvent the need for labeled examples of unseen classes, we propose a novel generative adversarial network (GAN) that synthesizes CNN features conditioned on class-level semantic information, offering a shortcut directly from a semantic descriptor of a class to a class-conditional feature distribution. Our proposed approach, pairing a Wasserstein GAN with a classification loss, is able to generate sufficiently discriminative CNN features to train softmax classifiers or any multimodal embedding method. Our experimental results demonstrate a significant boost in accuracy over the state of the art on five challenging datasets -- CUB, FLO, SUN, AWA and ImageNet -- in both the zero-shot learning and generalized zero-shot learning settings.

Motivation & Objective

  • To address the extreme data imbalance in zero-shot learning where no training examples exist for unseen classes.
  • To overcome the limitations of image-based data generation, which often produces low-quality or non-discriminative images unsuitable for training classifiers.
  • To develop a feature generation framework that enables effective training of softmax classifiers in generalized zero-shot learning by generating class-conditional CNN features.
  • To establish generalized zero-shot learning as a robust proxy task for evaluating the quality and generalization capability of generative models.

Proposed method

  • Proposes f-CLSWGAN, a conditional GAN that generates CNN features conditioned on class-level semantic embeddings such as attributes, sentences, or word2vec vectors.
  • Uses a Wasserstein GAN loss with gradient penalty to stabilize training and enforce the 1-Lipschitz constraint on the discriminator.
  • Introduces a novel classification loss that regularizes the generator to produce features that are easily separable by a softmax classifier.
  • Trains the generator to map from a latent noise vector and a semantic descriptor to a class-conditional feature distribution, bypassing image generation.
  • Operates on features extracted by a deep CNN backbone (e.g., ResNet or GoogLeNet), so the framework is not tied to a single architecture.
  • Applies the generated features to train standard classifiers like softmax, demonstrating that feature-level generation outperforms image-level generation.
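
The objective described in the bullets above can be sketched in the standard WGAN-GP form with an added classification term. The notation is an assumption of this sketch and does not appear in the review: x is a real CNN feature, c(y) the semantic embedding of class y, z the noise vector, x̃ a generated feature, x̂ a random interpolation between real and generated features, λ the gradient penalty coefficient, and β the weight of the classification loss. P(y | x̃) is assumed to be the softmax classifier's probability of the correct class.

```latex
\begin{aligned}
L_{\mathrm{WGAN}} &= \mathbb{E}\big[D(x, c(y))\big] - \mathbb{E}\big[D(\tilde{x}, c(y))\big]
  - \lambda\, \mathbb{E}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}, c(y)) \rVert_2 - 1\big)^2\Big], \\
L_{\mathrm{CLS}} &= -\,\mathbb{E}_{\tilde{x}}\big[\log P(y \mid \tilde{x})\big], \\
&\min_{G}\,\max_{D}\; L_{\mathrm{WGAN}} + \beta\, L_{\mathrm{CLS}},
\qquad \tilde{x} = G(z, c(y)),\quad \hat{x} = \alpha x + (1-\alpha)\,\tilde{x},\ \alpha \sim U(0,1).
\end{aligned}
```

The first two terms of the WGAN loss approximate the Wasserstein distance between real and generated features, the third penalizes critic gradients whose norm departs from 1, and the classification term pushes the generator toward features that a softmax classifier assigns to the conditioning class.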

Experimental results

Research questions

  • RQ1: Can generating CNN features instead of images lead to better performance in zero-shot learning tasks?
  • RQ2: Does combining a Wasserstein GAN with a classification loss improve feature quality and generalization for unseen classes?
  • RQ3: Can a generative model trained on feature space achieve state-of-the-art results in generalized zero-shot learning across diverse datasets?
  • RQ4: Is generalized zero-shot learning a reliable proxy for evaluating the expressive power of generative models?

Key findings

  • f-CLSWGAN achieves a harmonic mean accuracy of 54.0% on CUB and 65.6% on FLO in the generalized zero-shot learning setting, significantly outperforming both baseline and image-based generation methods.
  • On the CUB dataset, f-CLSWGAN improves the harmonic mean from 45.1% (no generation) to 54.0% using generated features, while image generation via StackGAN drops performance to 31.9%.
  • On the FLO dataset, the method improves harmonic mean from 21.9% (no generation) to 65.6% with feature generation, demonstrating consistent gains across datasets.
  • Image generation with StackGAN leads to performance degradation on CUB due to lack of discriminative detail, while feature generation maintains high-quality, class-consistent representations.
  • The proposed method enables the use of simple softmax classifiers in generalized zero-shot learning, a setting previously inaccessible to such models due to domain shift and lack of unseen class examples.
  • The results support the use of generalized zero-shot learning as a reliable, quantitative benchmark for evaluating generative models, complementing manual image inspection.
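
The harmonic mean reported in the findings above is the usual summary metric for generalized zero-shot learning: it combines accuracy on seen classes and accuracy on unseen classes, and it stays low unless both are high. A minimal sketch follows, assuming accuracies are given as fractions in [0, 1]. The function name and the example values are hypothetical and are not results from the paper.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean of seen- and unseen-class accuracy (both in [0, 1])."""
    total = seen_acc + unseen_acc
    if total == 0.0:
        return 0.0  # both accuracies are zero; avoid division by zero
    return 2.0 * seen_acc * unseen_acc / total


# Hypothetical accuracies chosen only to illustrate the metric, not paper results.
biased = harmonic_mean(seen_acc=0.8, unseen_acc=0.2)    # strong on seen classes only
balanced = harmonic_mean(seen_acc=0.5, unseen_acc=0.5)  # same arithmetic mean
print(f"biased: {biased:.2f}, balanced: {balanced:.2f}")
```

Both hypothetical models have the same arithmetic mean accuracy, but the one biased toward seen classes scores lower on the harmonic mean, which is why the metric rewards gains on unseen classes.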

This review was created by AI and reviewed by human editors.