QUICK REVIEW

[Paper Review] Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification

Feng Zhu, Hongsheng Li|arXiv (Cornell University)|Feb 20, 2017

Domain Adaptation and Few-Shot Learning | References: 35 | Cited by: 55

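One-sentence summary

This paper proposes a Spatial Regularization Network (SRN) that learns per-label attention maps from image-level supervision and uses them to capture semantic and spatial relations between labels, improving multi-label image classification on several datasets.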

ABSTRACT

Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved by exploiting semantic relations between labels in recent years. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervisions. Given a multi-label image, our proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them via learnable convolutions. By aggregating the regularized classification results with original results by a ResNet-101 network, the classification performance can be consistently improved. The whole deep neural network is trained end-to-end with only image-level annotations, thus requires no additional efforts on image annotations. Extensive evaluations on 3 public datasets with different types of labels show that our approach significantly outperforms state-of-the-arts and has strong generalization capability. Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.

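Research motivation and goals

Improve multi-label image classification by modeling the spatial relations between labels without requiring additional annotations.
Develop an end-to-end CNN framework that learns label attention maps under image-level supervision.
Combine semantic and spatial label relations to regularize the final classification results.
Demonstrate generalization across datasets with different label types (objects, concepts, attributes).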

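Proposed method

Use a ResNet-101-based main classifier to produce per-label predictions.
Introduce a Spatial Regularization Network (SRN) with two stages: attention map learning and spatial regularization.
Learn label attention maps with f_att(X; θ_att), producing A ∈ R^{14×14×C} (C is the number of labels) under image-level supervision.
Compute the weighted attention U = σ(S) ∘ A, where S holds per-label confidence maps and σ is the sigmoid function, so that U encodes both the visibility and the location of each label.
Capture label relations with f_sr(U; θ_sr), built from compact, decoupled 1×1 and 14×14 convolutions to limit the number of parameters.
Aggregate the final confidences as ŷ = α ŷ_cls + (1−α) ŷ_sr and train end-to-end with a cross-entropy loss.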
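
The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Z` (unnormalized attention scores), `W_1x1`, and `W_full` are hypothetical, the attention maps are assumed to be normalized with a spatial softmax, f_sr is reduced to a single 1×1 channel-mixing step followed by one ungrouped 14×14 step, ŷ values are treated as logits, and α = 0.5 is an assumed default.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_softmax(z):
    """Normalize each label's map so its entries sum to 1 (assumed normalization)."""
    h, w, c = z.shape
    flat = z.reshape(h * w, c)
    flat = flat - flat.max(axis=0, keepdims=True)  # subtract max for numerical stability
    e = np.exp(flat)
    return (e / e.sum(axis=0, keepdims=True)).reshape(h, w, c)

def srn_forward(Z, S, y_cls, W_1x1, W_full, alpha=0.5):
    """Simplified SRN forward pass.

    Z, S   : (14, 14, C) attention scores and confidence maps
    y_cls  : (C,) logits from the main classifier
    W_1x1  : (C, K) weights of a 1x1 convolution (hypothetical sizes)
    W_full : (14, 14, K, C) weights of a 14x14 convolution (hypothetical sizes)
    """
    A = spatial_softmax(Z)            # label attention maps A
    U = sigmoid(S) * A                # weighted attention U = sigma(S) o A
    H = np.maximum(U @ W_1x1, 0.0)    # 1x1 conv: mixes label channels at each position
    # A 14x14 kernel on a 14x14 input without padding covers the whole map,
    # so it reduces to one weighted sum over all positions.
    y_sr = np.einsum("hwk,hwkc->c", H, W_full)
    y_hat = alpha * y_cls + (1.0 - alpha) * y_sr
    return y_hat, A, U

def bce_with_logits(logits, targets):
    """Mean sigmoid cross-entropy, written in a numerically stable form."""
    return float(np.mean(np.maximum(logits, 0) - logits * targets
                         + np.log1p(np.exp(-np.abs(logits)))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, K = 5, 8
    Z, S = rng.normal(size=(14, 14, C)), rng.normal(size=(14, 14, C))
    y_hat, A, U = srn_forward(Z, S, rng.normal(size=C),
                              rng.normal(size=(C, K)), rng.normal(size=(14, 14, K, C)))
    print(y_hat.shape, bce_with_logits(y_hat, np.array([1., 0., 0., 1., 0.])))
```

Because σ(S) lies in (0, 1), each entry of U is a damped copy of the corresponding entry of A, so labels with low confidence contribute little to the relation-learning step.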

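Experimental results

Research questions

RQ1: Is image-level supervision sufficient to learn meaningful label attention maps in multi-label images for use in spatial regularization?
RQ2: Do the semantic and spatial relations between labels learned by the SRN improve overall multi-label classification performance?
RQ3: How well does the SRN generalize across datasets with different label types (objects, concepts, attributes)?
RQ4: How does using the weighted attention maps (U) instead of the unweighted attention maps (A) affect performance?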

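Key findings

The SRN consistently outperforms strong baselines and state-of-the-art methods on the NUS-WIDE, MS-COCO, and WIDER-Attribute datasets.
Weighted attention maps (U) work better than unweighted attention maps (A) for learning the spatial regularization.
Training end-to-end with the SRN adds about 6 million parameters and yields clear mAP and F1 gains on every dataset.
The method captures both the localization signal of individual labels and the co-occurrence and relative-position patterns between labels.
Visualizations show that neurons in the SRN respond to label locations and to specific spatial configurations of multiple labels.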
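
For readers unfamiliar with the metrics named above, the sketch below shows one common way to compute mAP and an overall F1 score for multi-label predictions. It is an illustration only: the function names are hypothetical, the 0.5 threshold is an assumption, and the paper's exact evaluation protocol is not reproduced here.

```python
import numpy as np

def average_precision(scores, targets):
    """Average precision for one label: mean precision at the rank of each positive."""
    order = np.argsort(-scores, kind="stable")   # rank images by descending score
    hits = targets[order].astype(float)
    if hits.sum() == 0:
        return float("nan")                      # label never appears: AP undefined
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def mean_average_precision(scores, targets):
    """Mean of per-label AP over all labels that have at least one positive."""
    aps = [average_precision(scores[:, c], targets[:, c])
           for c in range(scores.shape[1])]
    return float(np.nanmean(aps))

def overall_f1(scores, targets, threshold=0.5):
    """F1 computed over all (image, label) pairs at an assumed threshold."""
    pred = scores >= threshold
    true = targets == 1
    tp = np.sum(pred & true)
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(true.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

if __name__ == "__main__":
    # Three images, two labels; rows are images, columns are labels.
    scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]])
    targets = np.array([[1, 0], [0, 1], [1, 1]])
    print(mean_average_precision(scores, targets), overall_f1(scores, targets))
```

mAP is threshold-free and reflects ranking quality per label, while F1 depends on how scores are turned into hard label decisions, which is why the two are usually reported together.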

This review was generated by AI and checked by a human editor.