QUICK REVIEW

[Paper Review] Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Xiaolong Wang, Yufei Ye|arXiv (Cornell University)|Mar 21, 2018

Domain Adaptation and Few-Shot Learning | References: 42 | Cited by: 32

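One-Sentence Summary

The paper proposes a zero-shot recognition framework that uses graph convolutional networks (GCNs) to combine semantic embeddings with a knowledge graph and predict visual classifiers for unseen categories. By propagating information through the knowledge graph with a GCN, the method gains 20.9% in top-5 accuracy over the prior state of the art in the ImageNet 2-hops setting and remains robust to noisy graphs.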

ABSTRACT

We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, just using the word embedding of the category and its relationship to other categories, for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this paper, we build upon the recently introduced Graph Convolutional Network (GCN) and propose an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers. Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (each representing a visual category). After a series of graph convolutions, we predict the visual classifier for each category. During training, the visual classifiers for a few categories are given to learn the GCN parameters. At test time, these filters are used to predict the visual classifiers of unseen categories. We show that our approach is robust to noise in the KG. More importantly, our approach provides significant improvement in performance compared to the current state-of-the-art results (from 2 ~ 3% on some metrics to a whopping 20% on a few).

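Motivation and Objectives

Recognize novel visual categories for which no training examples are available.
Overcome the limitations of methods based purely on semantic embeddings by adding the explicit relational knowledge held in a knowledge graph.
Improve zero-shot generalization by transferring knowledge from seen to unseen categories through the structured relationships between categories.
Show robustness to noisy knowledge graphs while maintaining strong performance on zero-shot recognition benchmarks.
Achieve state-of-the-art performance in both the standard and the generalized zero-shot learning settings.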

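Proposed Method

Build a knowledge graph in which each node represents a visual category and edges encode semantic or categorical relationships.
Use pretrained word embeddings (e.g., GloVe) as the input feature of each node in the knowledge graph.
Apply a deep, 6-layer graph convolutional network (GCN) that propagates and aggregates information between nodes across its layers.
Train the GCN by optimizing its parameters against the visual classifiers of a subset of seen categories.
At inference time, use the trained GCN to predict visual classifiers for unseen categories from only their semantic embeddings and graph connectivity.
Support both the standard zero-shot setting (only unseen categories at test time) and the generalized zero-shot setting (both seen and unseen categories at test time).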
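The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the row normalization with self-loops, the leaky-ReLU slope of 0.2, the mean-squared-error loss, the toy chain graph, the layer widths, and every function name are choices made here for illustration. Random vectors stand in for word embeddings and for the target classifiers.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so every row sums to 1 (assumed scheme)."""
    n = len(adj)
    loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in loops]

def leaky_relu(m, slope=0.2):
    """Element-wise leaky ReLU; the slope is an assumption of this sketch."""
    return [[v if v > 0 else slope * v for v in row] for row in m]

def gcn_forward(a_hat, x, weights):
    """Each layer computes A_hat @ H @ W; every layer but the last is followed by leaky ReLU."""
    h = x
    for i, w in enumerate(weights):
        h = matmul(matmul(a_hat, h), w)
        if i < len(weights) - 1:
            h = leaky_relu(h)
    return h

def seen_class_loss(pred, target, seen_idx):
    """Mean squared error over the rows of seen classes only."""
    sq = [(p - t) ** 2 for i in seen_idx for p, t in zip(pred[i], target[i])]
    return sum(sq) / len(sq)

rng = random.Random(0)

def rand_matrix(rows, cols, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

num_classes, embed_dim, clf_dim = 5, 8, 4

# Toy undirected knowledge graph: a chain 0-1-2-3-4.
adj = [[0.0] * num_classes for _ in range(num_classes)]
for i in range(num_classes - 1):
    adj[i][i + 1] = adj[i + 1][i] = 1.0
a_hat = normalize_adjacency(adj)

x = rand_matrix(num_classes, embed_dim)          # stand-in for word embeddings
layer_dims = [embed_dim, 16, 16, 8, 8, 8, clf_dim]  # six layers, toy widths
weights = [rand_matrix(m, n, 0.1) for m, n in zip(layer_dims[:-1], layer_dims[1:])]

pred = gcn_forward(a_hat, x, weights)            # one predicted classifier per class
target = rand_matrix(num_classes, clf_dim)       # stand-in for known classifiers
seen_idx = [0, 1, 2]                             # classes 3 and 4 play the unseen role
loss = seen_class_loss(pred, target, seen_idx)
```

The forward pass produces a classifier row for every node, including the unseen ones, while the loss only touches the seen rows. That asymmetry is what lets supervision on seen categories shape the predictions for unseen ones through the shared weights and the graph.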

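Experimental Results

Research Questions

RQ1: Can a knowledge graph strengthen zero-shot recognition by supplying a structured relational inductive bias beyond what semantic embeddings provide?
RQ2: How does zero-shot recognition performance change as the knowledge graph grows in size and complexity in the presence of noise?
RQ3: How much does GCN-based message passing improve generalization compared with methods that map word embeddings directly to visual features?
RQ4: How does the method perform in the generalized zero-shot learning setting, where both seen and unseen categories are present at test time?
RQ5: Is the method robust to changes in the source of the word embeddings, and does it outperform models that rely on word embeddings alone?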

Key Findings

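In the ImageNet 2-hops zero-shot setting, the proposed method reaches 62.4% top-10 accuracy, an absolute gain of 18.7 percentage points over the previous state of the art (43.7%).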
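In the 2-hops setting, the method exceeds the state-of-the-art EXEM by 20.9% in top-5 accuracy.
The method remains strong across different word embeddings (GloVe, FastText, word2vec), indicating robustness to the choice of embedding source.
In the generalized zero-shot setting, the method comes close to doubling the performance of baselines such as ConSE and DeViSE on all metrics and datasets.
Switching the backbone from Inception-v1 to ResNet-50 yields consistent performance gains, supporting the method's scalability.
Visualizations show that the model predicts unseen categories (e.g., 'okapi') with high confidence, whereas the baselines remain biased toward seen categories.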
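To make the top-k metrics and the two evaluation settings concrete, here is a small sketch of how a hit-at-k score changes when seen categories are added to the candidate pool. The class names other than 'okapi', the scores, and the function names are hypothetical values chosen for illustration; they are not results from the paper.

```python
def hit_at_k(scores, true_label, k, candidates):
    """Return True if true_label is among the k highest-scoring candidate classes."""
    ranked = sorted(candidates, key=lambda c: scores[c], reverse=True)
    return true_label in ranked[:k]

def accuracy_at_k(all_scores, labels, k, candidates):
    """Fraction of test images whose true label appears in the top k."""
    hits = sum(hit_at_k(s, y, k, candidates) for s, y in zip(all_scores, labels))
    return hits / len(labels)

seen = ["zebra", "horse"]      # categories with training images
unseen = ["okapi", "tapir"]    # categories with no training images

# Hypothetical classifier scores for two test images of unseen classes.
all_scores = [
    {"zebra": 0.9, "horse": 0.3, "okapi": 0.8, "tapir": 0.1},
    {"zebra": 0.2, "horse": 0.6, "okapi": 0.3, "tapir": 0.7},
]
labels = ["okapi", "tapir"]

# Standard setting: only unseen classes compete at test time.
standard_top1 = accuracy_at_k(all_scores, labels, 1, unseen)
# Generalized setting: seen and unseen classes compete together.
generalized_top1 = accuracy_at_k(all_scores, labels, 1, seen + unseen)
generalized_top2 = accuracy_at_k(all_scores, labels, 2, seen + unseen)

print(standard_top1, generalized_top1, generalized_top2)  # 1.0 0.5 1.0
```

In this toy data the first image is lost at top-1 in the generalized setting because a seen class ('zebra') outscores the correct unseen class, which mirrors the bias toward seen categories described above for the baselines.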


This review was generated by AI and checked by a human editor.