QUICK REVIEW

[Paper Review] Multi-Label Image Recognition with Graph Convolutional Networks

Zhao-Min Chen, Xiu-Shen Wei|arXiv (Cornell University)|Apr 7, 2019

Text and Document Classification Technologies | References: 35 | Cited by: 62

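One-Sentence Summary

The paper introduces ML-GCN, a model based on graph convolutional networks that maps label word embeddings to inter-dependent object classifiers and applies these classifiers to image features for end-to-end multi-label recognition. It also introduces a re-weighted label correlation matrix to improve information propagation and generalization.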

ABSTRACT

The task of multi-label image recognition is to predict a set of object labels that are present in an image. As objects normally co-occur in an image, it is desirable to model the label dependencies to improve the recognition performance. To capture and explore such important dependencies, we propose a multi-label classification model based on Graph Convolutional Network (GCN). The model builds a directed graph over the object labels, where each node (label) is represented by word embeddings of a label, and GCN is learned to map this label graph into a set of inter-dependent object classifiers. These classifiers are applied to the image descriptors extracted by another sub-net, enabling the whole network to be end-to-end trainable. Furthermore, we propose a novel re-weighted scheme to create an effective label correlation matrix to guide information propagation among the nodes in GCN. Experiments on two multi-label image recognition datasets show that our approach obviously outperforms other existing state-of-the-art methods. In addition, visualization analyses reveal that the classifiers learned by our model maintain meaningful semantic topology.

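Motivation and Objectives

Model and exploit label dependencies to improve multi-label image recognition.
Learn inter-dependent label classifiers from word embeddings with a graph convolutional network.
Introduce a data-driven, re-weighted correlation matrix to guide information propagation and mitigate over-smoothing.
Demonstrate end-to-end trainability and superior performance on standard benchmarks.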

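Proposed Method

Represent each label by its word embedding and build a directed graph over the labels.
Use stacked GCN layers to map the label embeddings to a set of inter-dependent classifiers W.
Apply the classifiers W to the image feature x extracted by a CNN to obtain the predicted scores ŷ = Wx.
Build a data-driven label correlation matrix from label co-occurrence: estimate the conditional probability that one label appears given another, then binarize these probabilities with a threshold τ to obtain A.
Apply a re-weighting scheme to obtain A', which balances each node's own weight against the influence of its neighbors and reduces over-smoothing.
Train the whole network end-to-end with a standard multi-label classification loss.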
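The correlation-matrix steps above can be sketched in Python with NumPy. This is a minimal sketch that follows the steps as summarized here, not the authors' released code: the function name `build_correlation_matrix`, the neighbor weight `p`, and the default values `tau=0.4` and `p=0.2` are assumptions for illustration. Entry P[i, j] is taken as the conditional probability of label j given label i, estimated from training-set co-occurrence counts.

```python
import numpy as np


def build_correlation_matrix(labels: np.ndarray, tau: float = 0.4, p: float = 0.2) -> np.ndarray:
    """Build a re-weighted label correlation matrix A' from training labels.

    labels: (num_images, C) array of 0/1 ground-truth labels.
    tau:    binarization threshold on conditional probabilities (assumed value).
    p:      total weight shared by a node's neighbors (hypothetical name, assumed value).
    """
    labels = labels.astype(np.float64)
    # M[i, j]: number of images in which labels i and j co-occur.
    cooccur = labels.T @ labels
    # N[i]: number of images containing label i (the diagonal of M).
    counts = np.diag(cooccur)
    # P[i, j] = P(label j | label i) = M[i, j] / N[i]; guard labels that never occur.
    cond_prob = cooccur / np.maximum(counts, 1.0)[:, None]
    # Binarize with tau so that rare, noisy co-occurrences are dropped.
    binary = (cond_prob >= tau).astype(np.float64)
    np.fill_diagonal(binary, 0.0)  # the node's own weight is set separately below
    # Re-weight: neighbors share weight p equally, the node itself keeps 1 - p.
    # Assumption: a label with no neighbor above tau keeps only its self-weight 1 - p.
    neighbor_count = binary.sum(axis=1, keepdims=True)
    reweighted = p * binary / np.maximum(neighbor_count, 1e-12)
    np.fill_diagonal(reweighted, 1.0 - p)
    return reweighted


# Toy training set: 4 images, 3 labels.
labels = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0]])
A = build_correlation_matrix(labels)
# A == [[0.8, 0.2, 0.0], [0.2, 0.8, 0.0], [0.2, 0.0, 0.8]] (up to floating point)
```

Label 2 appears only together with label 0, so row 2 links to label 0, while label 0 appears without label 2 often enough that row 0 does not link back. This asymmetry is what makes the label graph directed.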
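The classifier-generation and training steps above can be sketched as a forward pass in NumPy. This is a minimal, forward-only sketch under stated assumptions, not the paper's implementation: the helper names, the LeakyReLU activation with slope 0.2, keeping the last GCN layer linear, and the toy dimensions are all assumptions. Random vectors stand in for the word embeddings and the CNN image features, and the "standard multi-label classification loss" is taken here to be sigmoid binary cross-entropy.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    """Element-wise LeakyReLU (the slope is an assumed value)."""
    return np.where(x > 0, x, slope * x)


def gcn_classifiers(embeddings: np.ndarray, adj: np.ndarray, weights: list) -> np.ndarray:
    """Map label embeddings (C, d) to classifiers W (C, D) with stacked GCN layers.

    Each layer computes H_next = act(adj @ H @ W_l); the last layer is kept linear
    (an assumption) so its output can be used directly as classifier weights.
    """
    h = embeddings
    for layer, w in enumerate(weights):
        h = adj @ h @ w
        if layer < len(weights) - 1:
            h = leaky_relu(h)
    return h


def predict_scores(classifiers: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply y_hat = W x to a batch: features (B, D), classifiers (C, D) -> scores (B, C)."""
    return features @ classifiers.T


def multilabel_bce(scores: np.ndarray, targets: np.ndarray) -> float:
    """Mean sigmoid binary cross-entropy over all images and labels.

    Uses -[y log s(z) + (1 - y) log(1 - s(z))] = softplus(z) - y * z,
    with softplus(z) = logaddexp(0, z) for numerical stability.
    """
    return float(np.mean(np.logaddexp(0.0, scores) - targets * scores))


# Toy sizes; a real model uses far larger embedding and feature dimensions.
rng = np.random.default_rng(0)
num_labels, emb_dim, hidden_dim, feat_dim = 3, 4, 5, 6
embeddings = rng.standard_normal((num_labels, emb_dim))             # stand-in word embeddings
adj = np.array([[0.8, 0.2, 0.0], [0.2, 0.8, 0.0], [0.2, 0.0, 0.8]])  # an example A'
weights = [0.1 * rng.standard_normal((emb_dim, hidden_dim)),
           0.1 * rng.standard_normal((hidden_dim, feat_dim))]
classifiers = gcn_classifiers(embeddings, adj, weights)             # W, shape (3, 6)
features = rng.standard_normal((2, feat_dim))                       # stand-in CNN features x
scores = predict_scores(classifiers, features)                      # shape (2, 3)
targets = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
loss = multilabel_bce(scores, targets)
```

In a trainable version, `weights` would be learned jointly with the CNN by backpropagating this loss through both branches, which is what makes the pipeline end-to-end; NumPy is used here only to make the shapes and the flow from label graph to scores explicit.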

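Experimental Results

Research Questions

RQ1: How can label dependencies be modeled explicitly to improve multi-label recognition?
RQ2: Can label embeddings be transformed into inter-dependent classifiers that exploit co-occurrence patterns?
RQ3: Does the re-weighted correlation matrix reduce overfitting and over-smoothing in GCN-based label modeling?
RQ4: Do the learned classifiers reflect a meaningful semantic topology among labels?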

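Key Findings

ML-GCN with the re-weighted correlation matrix consistently outperforms state-of-the-art methods on MS-COCO and VOC 2007.
Using a binary correlation matrix leads to over-smoothing and performs worse than the re-weighted scheme.
The re-weighting scheme improves key metrics such as mAP, CF1, and OF1 on both datasets.
Word embeddings contribute to the gains but are not solely responsible for them; the GCN-based mapping and the correlation modeling drive the main improvements.
Classifier visualizations show a meaningful semantic topology: related labels form clusters in the classifier space.
Image representations learned by ML-GCN improve image-retrieval quality over a vanilla ResNet baseline.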
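The over-smoothing finding can be illustrated with a toy NumPy experiment. This is an illustrative sketch under explicit assumptions, not a reproduction of the paper's ablation: the fully connected 4-label graph, the self-loop-plus-row-normalization used for the binary baseline, `p = 0.2`, and the helper names are all assumptions. Features are propagated twice without weights or activations, mimicking only the aggregation part of a two-layer GCN, and the spread of label features is measured as their mean pairwise distance.

```python
import numpy as np


def mean_pairwise_distance(h: np.ndarray) -> float:
    """Average Euclidean distance between all pairs of distinct node features."""
    diffs = h[:, None, :] - h[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    n = h.shape[0]
    return float(dists.sum() / (n * (n - 1)))


def propagate(adj: np.ndarray, h: np.ndarray, steps: int) -> np.ndarray:
    """Repeated neighborhood aggregation H <- adj @ H (no weights, no activation)."""
    for _ in range(steps):
        h = adj @ h
    return h


num_labels, p = 4, 0.2
# Deliberately extreme toy graph: every label is linked to every other label.
binary = np.ones((num_labels, num_labels)) - np.eye(num_labels)

# Assumed binary baseline: add self-loops, then row-normalize (uniform averaging).
with_loops = binary + np.eye(num_labels)
binary_norm = with_loops / with_loops.sum(axis=1, keepdims=True)

# Re-weighted: each node keeps 1 - p of its own feature, neighbors share p.
reweighted = p * binary / binary.sum(axis=1, keepdims=True)
np.fill_diagonal(reweighted, 1.0 - p)

h0 = np.eye(num_labels)  # one distinct one-hot feature per label
spread = {
    "binary": mean_pairwise_distance(propagate(binary_norm, h0, steps=2)),
    "re-weighted": mean_pairwise_distance(propagate(reweighted, h0, steps=2)),
}
# spread["binary"] is 0.0: every label ends up with the same feature vector.
```

With uniform averaging, two steps collapse all four labels onto one identical vector, while the re-weighted matrix keeps them clearly apart (about half of the initial spread of sqrt(2)) because each label retains most of its own feature.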
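The metrics named in the findings can be computed as sketched below. The sketch uses definitions common in the multi-label recognition literature: per-class (C) and overall (O) precision, recall, and F1 on predictions thresholded at 0.5, plus mean average precision. The function names, the 0.5 threshold on sigmoid probabilities, and the non-interpolated AP variant are assumptions; official benchmark code (for example the VOC 2007 protocol) may compute AP differently.

```python
import numpy as np


def average_precision(scores: np.ndarray, targets: np.ndarray) -> float:
    """Non-interpolated AP for one label: mean precision at each true positive's rank."""
    order = np.argsort(-scores, kind="stable")
    hits = targets[order] > 0
    if not hits.any():
        return 0.0
    ranks = np.arange(1, len(hits) + 1)
    precision_at_hits = np.cumsum(hits)[hits] / ranks[hits]
    return float(precision_at_hits.mean())


def multilabel_metrics(probs: np.ndarray, targets: np.ndarray, threshold: float = 0.5) -> dict:
    """probs, targets: (num_images, C). Returns mAP, CP, CR, CF1, OP, OR, OF1."""
    eps = 1e-12
    preds = (probs > threshold).astype(np.float64)
    targets = targets.astype(np.float64)
    correct = (preds * targets).sum(axis=0)  # correctly predicted positives per label
    predicted = preds.sum(axis=0)            # predicted positives per label
    ground = targets.sum(axis=0)             # ground-truth positives per label

    # Per-class metrics: average the per-label ratios (every label weighs the same).
    cp = float(np.mean(correct / np.maximum(predicted, eps)))
    cr = float(np.mean(correct / np.maximum(ground, eps)))
    # Overall metrics: pool the counts across labels before dividing.
    op = float(correct.sum() / max(predicted.sum(), eps))
    orc = float(correct.sum() / max(ground.sum(), eps))
    return {
        "mAP": float(np.mean([average_precision(probs[:, c], targets[:, c])
                              for c in range(probs.shape[1])])),
        "CP": cp, "CR": cr, "CF1": 2 * cp * cr / max(cp + cr, eps),
        "OP": op, "OR": orc, "OF1": 2 * op * orc / max(op + orc, eps),
    }


probs = np.array([[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]])
targets = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
metrics = multilabel_metrics(probs, targets)
# metrics["CF1"] == metrics["OF1"] == 0.75 and metrics["mAP"] == 11/12 (up to rounding)
```

Per-class scores weight every label equally, so rare labels count as much as frequent ones, whereas overall scores pool counts and are dominated by frequent labels; reporting both CF1 and OF1 shows whether gains come from common or rare labels.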

This review was generated by AI and reviewed by human editors.