[Paper Explained] Leveraging the Invariant Side of Generative Zero-Shot Learning
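LisGAN uses a conditional Wasserstein GAN, guided by semantic descriptions and invariant soul samples, to directly generate visual features for unseen classes. It then performs zero-shot recognition with a cascade classifier, achieving state-of-the-art results on multiple benchmarks.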
Conventional zero-shot learning (ZSL) methods generally learn an embedding, e.g., a visual-semantic mapping, to handle unseen visual samples in an indirect manner. In this paper, we take advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate unseen features from random noise conditioned on the semantic descriptions. Specifically, we train a conditional Wasserstein GAN in which the generator synthesizes fake unseen features from noise and the discriminator distinguishes fake from real via a minimax game. Considering that one semantic description can correspond to various synthesized visual samples, and that the semantic description is, figuratively, the soul of the generated features, we introduce soul samples as the invariant side of generative zero-shot learning. A soul sample is the meta-representation of one class. It visualizes the most semantically meaningful aspects of each sample in the same category. We require that each generated sample (the varying side of generative ZSL) be close to at least one soul sample (the invariant side) that shares its class label. At the zero-shot recognition stage, we propose to use two classifiers, deployed in cascade, to achieve a coarse-to-fine result. Experiments on five popular benchmarks verify that our proposed approach can outperform state-of-the-art methods with significant improvements.
Research Motivation and Objectives
- Recognize unseen classes from their semantic descriptions alone, without any real samples of those classes.
- Develop a generative framework that ensures both diversity and reliability of synthesized unseen features.
- Introduce soul samples as invariant representations to regularize generated features.
- Address multi-view domain shifts with multiple soul samples per class.
- Enhance recognition with a coarse-to-fine cascade classifier on generated features.
Proposed Method
- Train a conditional Wasserstein GAN to synthesize unseen features from random noise conditioned on semantic descriptions.
- Introduce soul samples as invariant class representations, with multiple soul samples per class to capture multi-view characteristics.
- Define two regularizers (L_R1 and L_R2) to ensure generated samples and soul samples align with real class representations.
- Use a two-branch GAN objective: Wasserstein loss plus supervised classification loss on both real and generated features.
- Convert zero-shot recognition into supervised learning on synthesized features, with a cascade classifier using entropy-based confidence to refine predictions.
- Enforce the Lipschitz constraint with its weight β fixed at 10, and tune λ and the regularization weights to balance diversity and alignment.
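The soul-sample regularization described above can be illustrated with a small sketch. It assumes squared Euclidean distance and takes the soul samples as given (how they are derived from real features is not shown here). The names `squared_distance` and `soul_regularizer` and the toy class label are hypothetical, chosen only for this illustration, and the code is not the authors' implementation.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soul_regularizer(
    generated: List[Vector],
    labels: List[str],
    soul_samples: Dict[str, List[Vector]],
) -> float:
    """Mean distance from each generated feature to its nearest soul sample.

    Each generated feature is compared only with the soul samples of its own
    class, so the penalty is small when the feature is close to at least one
    of them.
    """
    total = 0.0
    for feature, label in zip(generated, labels):
        total += min(squared_distance(feature, s) for s in soul_samples[label])
    return total / len(generated)


# Toy example: one class with two soul samples (two "views" of the class).
souls = {"zebra": [[0.0, 0.0], [4.0, 0.0]]}
fake_features = [[1.0, 0.0], [4.0, 1.0]]
penalty = soul_regularizer(fake_features, ["zebra", "zebra"], souls)
```

Taking the minimum over several soul samples per class, rather than the distance to a single class mean, is what lets a generated feature match any one view of the class.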
Experimental Results
Research Questions
- RQ1: Can conditional GANs generate diverse and discriminative unseen features that align with semantic descriptions?
- RQ2: Do soul samples effectively regularize generation to prevent soulless features and mitigate multi-view domain shift?
- RQ3: Does a cascade classifier leveraging high-confidence unseen samples improve generalized zero-shot performance?
- RQ4: How sensitive is LisGAN to its hyperparameters, and how stable is its training?
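To make the idea of entropy-based confidence in the cascade classifier concrete, here is a minimal sketch. It assumes the first classifier outputs a probability vector per test sample and that samples whose prediction entropy falls below a threshold are treated as confident and passed on with pseudo-labels. The threshold value, the function names, and the toy probabilities are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_confident(
    prob_rows: List[Sequence[float]], max_entropy: float
) -> List[Tuple[int, int]]:
    """Return (sample index, pseudo-label) for low-entropy predictions."""
    confident = []
    for index, probs in enumerate(prob_rows):
        if entropy(probs) <= max_entropy:
            pseudo_label = max(range(len(probs)), key=probs.__getitem__)
            confident.append((index, pseudo_label))
    return confident


# Toy first-stage outputs for three test samples over three classes.
first_stage = [
    [0.98, 0.01, 0.01],  # peaked distribution: low entropy
    [0.40, 0.30, 0.30],  # nearly uniform: high entropy
    [0.05, 0.90, 0.05],  # peaked distribution: low entropy
]
confident_samples = select_confident(first_stage, max_entropy=0.5)
```

In a coarse-to-fine setup, the samples selected here could serve as additional references for a second classifier, while the uncertain ones are left for that second stage to decide.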
Key Findings
- LisGAN achieves best results on four of five zero-shot learning evaluations and state-of-the-art on the remaining dataset.
- On aPaY, LisGAN improves over the previous state-of-the-art by 2.6%.
- On AwA, CUB, and FLO, LisGAN improves by 2.4%, 1.5%, and 2.4% respectively in zero-shot accuracy.
- In generalized zero-shot learning, LisGAN shows harmonic-mean improvements up to 2.8% across datasets, with an average improvement around 2.2%.
- Ablation studies show that soul-sample regularization, multiple soul samples per class, and the cascade classifier jointly contribute to performance gains.
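The harmonic mean used in generalized zero-shot learning is conventionally computed from the accuracy on seen classes and the accuracy on unseen classes. The sketch below shows that standard formula; the function name is illustrative and the example numbers are invented to show the behavior, not results from the paper.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean H = 2 * s * u / (s + u) of seen and unseen accuracy."""
    if seen_acc + unseen_acc == 0.0:
        return 0.0  # both accuracies are zero, so H is defined as zero here
    return 2.0 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


# A model that is strong on seen classes but weak on unseen ones is
# penalized: the arithmetic mean would be 0.6, the harmonic mean is lower.
h_unbalanced = harmonic_mean(0.8, 0.4)
h_balanced = harmonic_mean(0.6, 0.6)
```

Because the harmonic mean is dominated by the smaller of the two values, a gain in it indicates better balance between seen and unseen classes rather than improvement on only one side.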
This summary was generated by AI and reviewed by human editors.