QUICK REVIEW

[Paper Review] Localization Guided Learning for Pedestrian Attribute Recognition

Pengze Liu, Xihui Liu|arXiv (Cornell University)|Aug 28, 2018

Video Surveillance and Tracking Methods | Cited by 44

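One-Sentence Summary

LG-Net uses attribute-specific localization to guide local feature extraction. By fusing CAM-guided local features with global features, it achieves state-of-the-art results on RAP and PA-100K on all five reported metrics.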

ABSTRACT

Pedestrian attribute recognition has attracted much attention due to its wide applications in scene understanding and person analysis from surveillance videos. Existing methods try to use additional pose, part or viewpoint information to complement the global feature representation for attribute classification. However, these methods face difficulties in localizing the areas corresponding to different attributes. To address this problem, we propose a novel Localization Guided Network which assigns attribute-specific weights to local features based on the affinity between pre-extracted proposals and attribute locations. The advantage of our model is that our local features are learned automatically for each attribute and emphasized by the interaction with global features. We demonstrate the effectiveness of our Localization Guided Network on two pedestrian attribute benchmarks (PA-100K and RAP). Our result surpasses the previous state-of-the-art in all five metrics on both datasets.

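Motivation and Goals

Make pedestrian attribute recognition robust to low resolution and viewpoint changes.
Propose a Localization Guided Network that learns attribute-specific local features.
Use class activation maps to guide accurate attribute localization.
Fuse global and local features through an affinity-based weighting mechanism.
Demonstrate state-of-the-art performance on large pedestrian attribute datasets.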

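Proposed Method

LG-Net has two branches: a fixed global branch and a local branch.
The global branch produces a class activation map (CAM) and a class activation box for each attribute.
The local branch extracts ROI-pooled features from EdgeBoxes proposals.
The localization guidance module weights each local feature by its CAM-ROI affinity, computed as the IoU between the class activation box and the proposal.
Attribute predictions fuse the global features and the localization-guided local features by element-wise summation.
Two-stage training: the global branch is initialized from an ImageNet-pretrained model, and the localization components are kept fixed while LG-Net is trained.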
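The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, EdgeBoxes and ROI pooling are replaced by precomputed proposal boxes and proposal features, and the 0.2 activation threshold, the `stride` scaling from feature-map to image coordinates, normalizing the IoUs into weights, and the logistic classifier are all assumed choices.

```python
import numpy as np


def class_activation_map(features, fc_weights, attr):
    """CAM for one attribute: classifier-weighted sum of the last conv maps.

    features:   (K, H, W) conv feature maps from the global branch
    fc_weights: (num_attrs, K) weights of the classifier after global pooling
    """
    cam = np.tensordot(fc_weights[attr], features, axes=([0], [0]))  # (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # normalized to [0, 1]


def activation_box(cam, thresh=0.2, stride=1.0):
    """Class activation box (x1, y1, x2, y2) around cells >= thresh * max.

    thresh and stride (feature-map cell size in pixels) are assumed values.
    """
    ys, xs = np.nonzero(cam >= thresh * cam.max())
    box = np.array([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1], dtype=float)
    return box * stride


def iou(box, proposals):
    """IoU between one box (4,) and N proposals (N, 4), all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], proposals[:, 0])
    y1 = np.maximum(box[1], proposals[:, 1])
    x2 = np.minimum(box[2], proposals[:, 2])
    y2 = np.minimum(box[3], proposals[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_props = (proposals[:, 2] - proposals[:, 0]) * (proposals[:, 3] - proposals[:, 1])
    return inter / np.maximum(area_box + area_props - inter, 1e-12)


def guided_local_feature(box, proposals, roi_features):
    """Weight ROI-pooled proposal features (N, D) by their IoU with the CAM box.

    Normalizing the IoUs to sum to 1 is an assumption of this sketch; if no
    proposal overlaps the box, it falls back to uniform weights.
    """
    w = iou(box, proposals)
    total = w.sum()
    w = w / total if total > 0 else np.full(len(w), 1.0 / len(w))
    return w @ roi_features  # (D,)


def predict_attribute(global_feat, local_feat, clf_w, clf_b):
    """Element-wise sum fusion, then a (hypothetical) logistic classifier."""
    fused = global_feat + local_feat
    return 1.0 / (1.0 + np.exp(-(fused @ clf_w + clf_b)))
```

In use, `activation_box(class_activation_map(...))` would be computed once per attribute, while the same proposal features are shared by all attributes; only the IoU weights change. That per-attribute reweighting is what makes the local features attribute-specific.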

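Experimental Results

Research Questions

RQ1: Does attribute-specific localization make local feature extraction for pedestrian attributes more robust?
RQ2: Do CAM-guided localization and ROI-weighted local features outperform earlier part-, pose-, and attention-based methods?
RQ3: Is the proposed fusion of global and local features effective for multi-label attribute prediction under surveillance conditions?
RQ4: How does the localization component affect overall performance and localization accuracy?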

Key Findings

Results on RAP and PA-100K (all values in %; "-" = not reported):

Method	RAP_mA	RAP_Accu	RAP_Prec	RAP_Recall	RAP_F1	PA100K_mA	PA100K_Accu	PA100K_Prec	PA100K_Recall	PA100K_F1
ELF+SVM	69.94	29.29	32.84	71.18	44.95	-	-	-	-	-
CNN+SVM	72.28	31.72	35.75	71.78	-	-	-	-	-	-
ACN	69.66	62.61	80.12	72.26	75.98	-	-	-	-	-
DeepMar	73.79	62.02	74.92	76.21	75.56	72.70	70.39	82.24	80.42	81.32
HP-Net	76.12	65.39	77.33	78.79	78.05	74.21	72.19	82.97	82.09	82.53
JRL	77.81	-	78.11	78.98	78.58	-	-	-	-	-
VeSPA	77.70	67.35	79.51	79.67	79.59	76.32	73.00	84.99	81.49	83.20
Inception-v2	75.43	65.94	79.78	77.05	78.39	-	-	84.12	80.30	82.17
LG-Net	78.68	68.00	80.36	79.82	80.09	76.96	75.55	86.99	83.17	85.04

LG-Net surpasses the previous state of the art on all five evaluation metrics on both RAP and PA-100K.
On RAP, LG-Net reaches mA 78.68, Accu 68.00, Prec 80.36, Recall 79.82, and F1 80.09.
On PA-100K, LG-Net reaches mA 76.96, Accu 75.55, Prec 86.99, Recall 83.17, and F1 85.04.
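Localization guidance brings a clear gain: in the ablation study, removing it lowers accuracy by about 4.4%.
CAM-generated localization boxes, IoU-based affinity, and ROI-based local features together account for the improvement over the baseline and prior methods.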
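The five metrics in the results follow the usual pedestrian-attribute protocol as we understand it: mA is label-based (for each attribute, the average of the recall on positives and the recall on negatives, then averaged over attributes), while Accu, Prec, Recall, and F1 are example-based, with F1 the harmonic mean of Prec and Recall. For example, LG-Net's RAP Prec 80.36 and Recall 79.82 give F1 of about 80.09. The sketch below computes these metrics from binary label matrices. It is a minimal illustration rather than the benchmarks' official evaluation code, and scoring an empty denominator as 0 is an assumption.

```python
import numpy as np


def mean_accuracy(y_true, y_pred):
    """Label-based mA: mean over attributes of (TPR + TNR) / 2.

    y_true, y_pred: (num_samples, num_attrs) arrays of 0/1 labels.
    """
    pos = y_true == 1
    neg = ~pos
    hit = y_pred == 1
    tpr = (pos & hit).sum(axis=0) / np.maximum(pos.sum(axis=0), 1)
    tnr = (neg & ~hit).sum(axis=0) / np.maximum(neg.sum(axis=0), 1)
    return float(np.mean((tpr + tnr) / 2))


def example_based(y_true, y_pred):
    """Example-based Accu, Prec, Recall, and F1 (harmonic mean of Prec and Recall)."""
    t, p = y_true == 1, y_pred == 1
    inter = (t & p).sum(axis=1)
    accu = np.mean(inter / np.maximum((t | p).sum(axis=1), 1))
    prec = np.mean(inter / np.maximum(p.sum(axis=1), 1))
    rec = np.mean(inter / np.maximum(t.sum(axis=1), 1))
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return float(accu), float(prec), float(rec), float(f1)


y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])
print(mean_accuracy(y_true, y_pred))  # 2/3
print(example_based(y_true, y_pred))  # (0.5, 0.75, 0.75, 0.75)

# F1 check against LG-Net's reported RAP Prec and Recall
print(round(2 * 80.36 * 79.82 / (80.36 + 79.82), 2))  # 80.09
```

Because F1 is computed from the averaged Prec and Recall rather than averaged per sample, it can be checked directly against any row of the results table.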

This review was generated by AI and checked by human editors.