QUICK REVIEW

[Paper Review] Feature Generating Networks for Zero-Shot Learning

Yongqin Xian, Tobias Lorenz | arXiv (Cornell University) | Dec 4, 2017

Domain Adaptation and Few-Shot Learning | 17 citations

TL;DR

This paper proposes f-CLSWGAN, a conditional generative adversarial network that synthesizes deep CNN features for unseen classes using semantic class descriptors, training with a Wasserstein GAN loss and a classification loss to generate discriminative features. The method achieves state-of-the-art performance across five datasets in both zero-shot and generalized zero-shot learning settings by directly generating high-quality features rather than images.

ABSTRACT

Suffering from the extreme training data imbalance between seen and unseen classes, most of existing state-of-the-art approaches fail to achieve satisfactory results for the challenging generalized zero-shot learning task. To circumvent the need for labeled examples of unseen classes, we propose a novel generative adversarial network (GAN) that synthesizes CNN features conditioned on class-level semantic information, offering a shortcut directly from a semantic descriptor of a class to a class-conditional feature distribution. Our proposed approach, pairing a Wasserstein GAN with a classification loss, is able to generate sufficiently discriminative CNN features to train softmax classifiers or any multimodal embedding method. Our experimental results demonstrate a significant boost in accuracy over the state of the art on five challenging datasets -- CUB, FLO, SUN, AWA and ImageNet -- in both the zero-shot learning and generalized zero-shot learning settings.

Motivation & Objective

To address the extreme data imbalance in zero-shot learning where no training examples exist for unseen classes.
To overcome the limitations of image-based data generation, which often produces low-quality or non-discriminative images unsuitable for training classifiers.
To develop a feature generation framework that enables effective training of softmax classifiers in generalized zero-shot learning by generating class-conditional CNN features.
To establish generalized zero-shot learning as a robust proxy task for evaluating the quality and generalization capability of generative models.

Proposed method

Proposes f-CLSWGAN, a conditional GAN that generates CNN features conditioned on class-level semantic embeddings such as attributes, sentences, or word2vec vectors.
Uses a Wasserstein GAN loss with gradient penalty to stabilize training and enforce the 1-Lipschitz constraint on the discriminator.
Introduces a novel classification loss that regularizes the generator to produce features that are easily separable by a softmax classifier.
Trains the generator to map from a latent noise vector and a semantic descriptor to a class-conditional feature distribution, bypassing image generation.
Extracts the real features with a deep CNN backbone (e.g., ResNet or GoogLeNet), so the framework is not tied to a single architecture.
Applies the generated features to train standard classifiers like softmax, demonstrating that feature-level generation outperforms image-level generation.
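
The data flow described above can be sketched in a few lines of plain Python. This is a minimal illustration under loud assumptions, not the authors' implementation: `generate_feature` is a hypothetical stand-in for a trained generator (a fixed scaling of the class descriptor plus Gaussian noise, with no adversarial training), the attribute vectors in `CLASS_ATTRIBUTES` are invented, and the features are 2-D rather than real CNN features. It only shows the last step of the pipeline: synthesize labeled features for classes that have no real examples, then fit an ordinary softmax classifier on them.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical 2-D class descriptors for two "unseen" classes (invented values).
CLASS_ATTRIBUTES = {0: [1.0, 0.0], 1: [0.0, 1.0]}


def generate_feature(attribute, noise_scale=0.1):
    """Stand-in for a trained generator G(z, c).

    Assumption: a real generator is a learned network; here the "feature"
    is just the descriptor scaled by 4 plus Gaussian noise z.
    """
    return [4.0 * a + random.gauss(0.0, noise_scale) for a in attribute]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_logits(weights, biases, x):
    """Linear scores w_k . x + b_k for every class k."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train_softmax(features, labels, num_classes, epochs=50, lr=0.1):
    """Fit a linear softmax classifier with plain SGD on cross-entropy."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_logits(weights, biases, x))
            for k in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == y].
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
                biases[k] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    scores = class_logits(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


def synthesize(samples_per_class):
    """Draw labeled synthetic features for every class descriptor."""
    features, labels = [], []
    for label, attribute in CLASS_ATTRIBUTES.items():
        for _ in range(samples_per_class):
            features.append(generate_feature(attribute))
            labels.append(label)
    return features, labels


# Train only on synthetic features, as if no real examples existed.
train_x, train_y = synthesize(20)
weights, biases = train_softmax(train_x, train_y, num_classes=2)

# Evaluate on fresh draws from the same stand-in generator.
test_x, test_y = synthesize(10)
correct = sum(predict(weights, biases, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"accuracy on fresh synthetic features: {accuracy:.2f}")
```

In the setting the review describes, the test features would be real CNN features of unseen-class images. Evaluating on fresh synthetic draws here only checks that the classifier trains on generated data.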

Experimental results

Research questions

RQ1: Can generating CNN features instead of images lead to better performance in zero-shot learning tasks?
RQ2: Does combining a Wasserstein GAN with a classification loss improve feature quality and generalization for unseen classes?
RQ3: Can a generative model trained on feature space achieve state-of-the-art results in generalized zero-shot learning across diverse datasets?
RQ4: Is generalized zero-shot learning a reliable proxy for evaluating the expressive power of generative models?

Key findings

f-CLSWGAN achieves a harmonic mean accuracy of 54.0% on CUB and 65.6% on FLO in the generalized zero-shot learning setting, significantly outperforming both baseline and image-based generation methods.
On the CUB dataset, f-CLSWGAN improves the harmonic mean from 45.1% (no generation) to 54.0% using generated features, while image generation via StackGAN drops performance to 31.9%.
On the FLO dataset, the method improves harmonic mean from 21.9% (no generation) to 65.6% with feature generation, demonstrating consistent gains across datasets.
Image generation with StackGAN leads to performance degradation on CUB due to lack of discriminative detail, while feature generation maintains high-quality, class-consistent representations.
The proposed method enables the use of simple softmax classifiers in generalized zero-shot learning, a setting previously inaccessible to such models due to domain shift and lack of unseen class examples.
The results support the use of generalized zero-shot learning as a reliable, quantitative benchmark for evaluating generative models, complementing manual image inspection.
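
The harmonic mean figures quoted above combine seen-class and unseen-class accuracy into one number. A small sketch, assuming the usual generalized zero-shot definition H = 2·s·u / (s + u), where s and u are the seen-class and unseen-class accuracies. The function name and the input values below are hypothetical and are not results from the paper:

```python
def harmonic_mean(acc_seen, acc_unseen):
    """Harmonic mean of seen and unseen accuracy; 0 if both are 0."""
    total = acc_seen + acc_unseen
    if total == 0:
        return 0.0
    return 2.0 * acc_seen * acc_unseen / total


# Hypothetical accuracies for illustration only.
balanced = harmonic_mean(0.5, 0.5)  # both class groups handled equally well
skewed = harmonic_mean(0.9, 0.1)    # strong on seen classes, weak on unseen

print(f"balanced: {balanced:.2f}, skewed: {skewed:.2f}")
```

Both hypothetical cases have the same arithmetic mean (0.5), but the harmonic mean is far lower in the skewed case, so a model that favors seen classes is penalized.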

This review was created by AI and reviewed by human editors.