QUICK REVIEW

[Paper Review] ZeroDiff++: Substantial Unseen Visual-semantic Correlation in Zero-shot Learning

Zihan Ye, Shreyank N Gowda | arXiv (Cornell University) | Feb 12, 2026

Domain Adaptation and Few-Shot Learning | Cited by 0

ONE-SENTENCE SUMMARY

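ZeroDiff++ introduces diffusion-based training and test-time adaptation to strengthen visual-semantic correlations in zero-shot learning, addressing spurious correlations and data scarcity. It adds diffusion augmentation, dynamic instance-level semantics, multi-view discriminators with mutual learning, and diffusion-based test-time adaptation and generation for unseen classes (DiffTTA and DiffGen).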

ABSTRACT

Zero-shot Learning (ZSL) enables classifiers to recognize classes unseen during training, commonly via two-stage generative methods: (1) learn visual-semantic correlations from seen classes; (2) synthesize unseen-class features from semantics to train classifiers. In this paper, we identify spurious visual-semantic correlations in existing generative ZSL, worsened by scarce seen-class samples, and introduce two metrics to quantify spuriousness for seen and unseen classes. Furthermore, we point out a more critical bottleneck: existing unadaptive, fully noised generators produce features disconnected from real test samples, which also leads to spurious correlations. To enhance the visual-semantic correlations on both seen and unseen classes, we propose ZeroDiff++, a diffusion-based generative framework. In training, ZeroDiff++ uses (i) diffusion augmentation to produce diverse noised samples, (ii) supervised contrastive (SC) representations for instance-level semantics, and (iii) multi-view discriminators with Wasserstein mutual learning to assess generated features. At generation time, we introduce (iv) Diffusion-based Test-time Adaptation (DiffTTA) to adapt the generator using pseudo-label reconstruction, and (v) Diffusion-based Test-time Generation (DiffGen) to trace the diffusion denoising path and produce partially synthesized features that connect real and generated data, further mitigating data scarcity. Extensive experiments on three ZSL benchmarks demonstrate that ZeroDiff++ not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Code will be made available.
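Step (i) of the abstract, diffusion augmentation, can be illustrated with the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps ~ N(0, I). The sketch below is a minimal illustration only. It assumes a DDPM-style linear beta schedule (T = 1000, beta from 1e-4 to 0.02), and the function names `linear_alpha_bars` and `diffusion_augment` are hypothetical. None of this is taken from the paper. Because each call draws a fresh timestep and a fresh noise sample, a small set of seen-class features can yield many distinct noised training samples.

```python
import math
import random
from typing import List, Optional, Tuple


def linear_alpha_bars(num_steps: int = 1000,
                      beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear schedule."""
    alpha_bars: List[float] = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffusion_augment(x0: List[float],
                      alpha_bars: List[float],
                      rng: random.Random,
                      t: Optional[int] = None) -> Tuple[List[float], int, List[float]]:
    """Return (x_t, t, eps) with x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps, eps ~ N(0, I).

    When t is not given, a timestep is drawn uniformly, so every call yields a
    new noised sample from the same clean feature (hypothetical helper).
    """
    if t is None:
        t = rng.randrange(len(alpha_bars))
    a_t = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_t) * x + math.sqrt(1.0 - a_t) * e for x, e in zip(x0, eps)]
    return x_t, t, eps


# Usage: four noised views of one toy stand-in for a seen-class visual feature.
rng = random.Random(0)
alpha_bars = linear_alpha_bars()
feature = [0.5, -1.2, 2.0]
augmented = [diffusion_augment(feature, alpha_bars, rng)[0] for _ in range(4)]
```

Returning the noise alongside x_t keeps the sample invertible, which is what a noise-prediction objective would need.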

MOTIVATION AND GOALS

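Identify and quantify spurious visual-semantic correlations in existing generative ZSL methods under data scarcity
Strengthen seen-class correlations through diffusion augmentation, dynamic instance-level semantics, and multi-view discriminators with mutual learning
Improve unseen-class correlations with Diffusion-based Test-time Adaptation (DiffTTA) and Diffusion-based Test-time Generation (DiffGen), bridging real and generated features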

PROPOSED METHOD

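Introduce diffusion augmentation to generate an effectively unlimited supply of noised features from limited data
Use supervised contrastive (SC) representations to provide instance-level semantics
Use three discriminators (adversarial, diffusion-based, and representation-based) and guide generation through Wasserstein mutual learning
Apply Diffusion-based Test-time Adaptation (DiffTTA) to adapt the generator via pseudo-label reconstruction
Apply Diffusion-based Test-time Generation (DiffGen) to produce partially synthesized features by tracing the diffusion denoising path
Provide a new protocol for evaluating generative ZSL under different data conditions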
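The review does not say how the supervised contrastive (SC) representations are trained. A common objective for this kind of representation is the supervised contrastive loss of Khosla et al. (2020), which pulls same-class embeddings together and pushes other embeddings apart. The sketch below computes only that loss over a batch of embeddings, assuming L2-normalized vectors and a hypothetical temperature of 0.1. The encoder itself, and how ZeroDiff++ feeds these representations to its generator, are not shown and are not claimed here.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss, averaged over anchors that have a positive."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Cosine similarities scaled by the temperature.
        logits = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        others = [logits[j] for j in range(n) if j != i]
        m = max(others)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in others))
        # Mean negative log-probability of picking each positive among all other samples.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

A batch whose same-class embeddings point in the same direction gets a loss near zero. Swapping those directions across classes makes the loss large, and this gap is the signal an SC-trained encoder learns from.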

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

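RQ1: When seen-class data are scarce, do existing generative ZSL methods exhibit substantial spurious visual-semantic correlations?
RQ2: Do diffusion augmentation and multi-view discriminators significantly strengthen seen-class correlations and reduce overfitting?
RQ3: Do diffusion-based test-time adaptation and generation improve the alignment between real features and unseen-class features, mitigating data scarcity?
RQ4: Is ZeroDiff++ robust across multiple ZSL benchmarks and data conditions?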

KEY FINDINGS

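ZeroDiff++ achieves new state-of-the-art performance on three ZSL benchmarks across different training-data sizes
Diffusion augmentation mitigates discriminator overfitting by enlarging the effective training data
SC-based instance-level representations provide richer semantics than static class-level labels, improving generation quality
Mutual learning among the adversarial, diffusion, and representation discriminators strengthens the guidance given to generated features
DiffTTA and DiffGen improve unseen-class feature alignment by adapting to pseudo-labels and by connecting generated features with real test samples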
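The review describes DiffGen only as tracing the diffusion denoising path to produce partially synthesized features that connect real and generated data. One plausible reading, which is an assumption rather than the paper's stated algorithm, is an SDEdit-style procedure. A real test feature is noised to an intermediate step `t_start`, then walked back to step 0 with deterministic DDIM updates driven by a semantic-conditioned noise predictor. The `predict_noise` interface, `t_start`, the linear schedule and `make_oracle_predictor` are all hypothetical. The oracle stands in for a trained generator and steers toward a fixed target, so it only checks the update arithmetic. With a learned predictor, the output would typically depend on both the real starting point and the semantics.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: (x_t, t, semantic) -> predicted noise eps_hat.
NoisePredictor = Callable[[Vector, int, Sequence[float]], Vector]


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative alpha_bar_t for an assumed DDPM-style linear beta schedule."""
    out, running = [], 1.0
    for i in range(num_steps):
        running *= 1.0 - (beta_start + (beta_end - beta_start) * i / (num_steps - 1))
        out.append(running)
    return out


def partial_synthesis(real_feature: Vector, semantic: Sequence[float],
                      predict_noise: NoisePredictor, alpha_bars: List[float],
                      t_start: int, rng: random.Random) -> Vector:
    """Noise a real feature to step t_start, then walk back to step 0 with DDIM (eta = 0)."""
    a = alpha_bars[t_start]
    x = [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in real_feature]
    x0_hat = list(x)
    for t in range(t_start, -1, -1):
        a_t = alpha_bars[t]
        eps = predict_noise(x, t, semantic)
        # Clean-feature estimate implied by the current noise prediction.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        if t > 0:
            a_prev = alpha_bars[t - 1]
            # Deterministic DDIM step to t - 1, reusing the predicted noise direction.
            x = [math.sqrt(a_prev) * ci + math.sqrt(1.0 - a_prev) * ei
                 for ci, ei in zip(x0_hat, eps)]
    return x0_hat


def make_oracle_predictor(target: Vector, alpha_bars: List[float]) -> NoisePredictor:
    """Stand-in for a trained generator: predicts the noise that points exactly at `target`."""
    def predict_noise(x_t: Vector, t: int, semantic: Sequence[float]) -> Vector:
        a_t = alpha_bars[t]
        return [(xi - math.sqrt(a_t) * gi) / math.sqrt(1.0 - a_t) for xi, gi in zip(x_t, target)]
    return predict_noise


# Usage with toy vectors; a real setup would use a semantic-conditioned network.
alpha_bars = linear_alpha_bars()
real = [0.3, -0.7, 1.1]
target = [1.0, 0.0, -1.0]
oracle = make_oracle_predictor(target, alpha_bars)
bridged = partial_synthesis(real, [0.0], oracle, alpha_bars, t_start=300,
                            rng=random.Random(0))
```

Starting from a partially noised real feature, rather than from pure noise, is what would let such a procedure produce features lying between real test samples and fully generated ones.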


This review was generated by AI and reviewed by human editors.