QUICK REVIEW

[Paper Review] Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

Yizhe Zhu, Jianwen Xie|arXiv (Cornell University)|Mar 1, 2019

Domain Adaptation and Few-Shot Learning | Cited by 48

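One-Sentence Summary

The paper introduces a semantic-guided multi-attention localization model (SGMA) for zero-shot learning that discovers discriminative object parts without annotations and jointly learns global and local features; trained with an embedding softmax loss and a class-center triplet loss, it achieves state-of-the-art results.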

ABSTRACT

Zero-shot learning extends the conventional object classification to the unseen class recognition by introducing semantic representations of classes. Existing approaches predominantly focus on learning the proper mapping function for visual-semantic embedding, while neglecting the effect of learning discriminative visual features. In this paper, we study the significance of the discriminative region localization. We propose a semantic-guided multi-attention localization model, which automatically discovers the most discriminative parts of objects for zero-shot learning without any human annotations. Our model jointly learns cooperative global and local features from the whole object as well as the detected parts to categorize objects based on semantic descriptions. Moreover, with the joint supervision of embedding softmax loss and class-center triplet loss, the model is encouraged to learn features with high inter-class dispersion and intra-class compactness. Through comprehensive experiments on three widely used zero-shot learning benchmarks, we show the efficacy of the multi-attention localization and our proposed approach improves the state-of-the-art results by a considerable margin.

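Research Motivation and Objectives

Motivate discriminative feature learning for zero-shot recognition that goes beyond global image features.
Automatically discover multiple discriminative object parts without human annotations.
Jointly learn global and local visual features under semantic guidance to improve the visual-semantic embedding.
Strengthen feature discriminability with an embedding softmax loss and a class-center triplet loss.
Demonstrate effectiveness on standard zero-shot learning benchmarks and analyze the impact of part localization.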

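Proposed Method

Propose a semantic-guided multi-attention localization model that produces multiple attention maps from the channels of CNN features.
Use compactness and diversity multi-attention losses to encourage part attention maps that are both focused and distinct from one another.
Introduce a differentiable region-cropping subnetwork that crops discriminative parts around the attention peaks.
Learn global and local features with separate CNN backbones for the whole image and for the cropped parts, then combine them by late fusion for embedding.
Train with an embedding softmax loss to maximize inter-class separation and a class-center triplet loss to reduce intra-class variance.
At inference, combine the scores of the embedding branch and the class-center branch to classify unseen classes.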
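
As a rough illustration of the two training losses and the score fusion listed above, here is a minimal pure-Python sketch for a single sample. The function names, the dot-product compatibility score, the squared-Euclidean hinge form, the default margin, and the unweighted score sum are all assumptions made for illustration; the paper's exact formulation may differ.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def embedding_softmax_loss(embedded, class_semantics, label):
    """Cross-entropy over compatibility scores for one sample.

    embedded: visual feature already projected into the semantic space
              (the projection itself is outside this sketch)
    class_semantics: one semantic vector per seen class
    label: index of the ground-truth class
    """
    scores = [dot(embedded, sem) for sem in class_semantics]
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[label]


def class_center_triplet_loss(feature, centers, label, margin=1.0):
    """Hinge loss pulling a feature toward its own class center.

    The own center must be closer than every other center by `margin`
    (hypothetical default value), measured in squared distance.
    """
    pos = sq_dist(feature, centers[label])
    return sum(
        max(0.0, margin + pos - sq_dist(feature, center))
        for k, center in enumerate(centers)
        if k != label
    )


def predict(embedding_scores, center_scores):
    """Return the class index with the highest combined score.

    An unweighted sum of the two branches is assumed here.
    """
    combined = [e + c for e, c in zip(embedding_scores, center_scores)]
    return max(range(len(combined)), key=combined.__getitem__)
```

The softmax term only rewards ranking the correct class above the others, while the triplet term additionally penalizes features that drift away from their own class center, which is one way to read the "inter-class dispersion and intra-class compactness" goal stated in the abstract.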

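Experimental Results

Research Questions

RQ1: Can weakly supervised attention localization identify multiple discriminative object parts for zero-shot learning without part annotations?
RQ2: Does jointly learning global and local features under the guidance of semantic representations improve zero-shot recognition performance?
RQ3: How do the compactness and diversity constraints on attention maps affect localization quality and downstream zero-shot accuracy?
RQ4: What effect does combining the embedding softmax loss with the class-center triplet loss have on feature discriminability?
RQ5: How does SGMA perform against state-of-the-art zero-shot methods on the standard benchmarks (CUB, FLO, AwA)?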

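Key Findings

SGMA achieves state-of-the-art results on the standard splits of three zero-shot learning benchmarks (CUB, FLO, AwA), with notable gains on the fine-grained datasets.
Without annotations, the model learns two discriminative part regions (head and tail); its part localization improves on random cropping and approaches that of part detectors trained with annotations.
Jointly training with the compactness and diversity losses markedly improves the precision and diversity of the attention maps, which helps zero-shot performance.
Combining the embedding softmax loss with the class-center triplet loss yields greater inter-class separation and tighter intra-class clusters, improving discriminability.
In the generalized zero-shot setting, SGMA achieves a higher harmonic mean (H), improving on prior methods by 6.7% on CUB.
Using global and local features with end-to-end training yields significant improvements over the baseline and over competing end-to-end methods such as LDF.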
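
For readers unfamiliar with the harmonic mean (H) used in the generalized zero-shot setting, the sketch below shows the definition commonly used in generalized zero-shot evaluation, H = 2·S·U / (S + U), where S and U are the accuracies on seen and unseen classes. The function name and the sample accuracies are made up for illustration and are not results from the paper.

```python
def harmonic_mean(acc_seen, acc_unseen):
    """Harmonic mean of seen-class and unseen-class accuracy."""
    total = acc_seen + acc_unseen
    if total == 0:
        return 0.0  # both accuracies are zero, so H is defined as zero
    return 2 * acc_seen * acc_unseen / total


# Illustrative values only: a model biased toward seen classes scores a low H
# even though its seen-class accuracy is high.
biased = harmonic_mean(0.8, 0.2)
balanced = harmonic_mean(0.5, 0.5)
```

Because H collapses toward the smaller of the two accuracies, it rewards models that stay balanced between seen and unseen classes rather than ones that simply favor the seen classes.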

This review was generated by AI and checked by a human editor.