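[Paper Explained] A Richly Annotated Dataset for Pedestrian Attribute Recognition
This paper introduces the RAP dataset, which contains 41,585 pedestrian samples annotated with 72 attributes plus viewpoint, occlusion, and body-part labels. It uses multi-label baselines and evaluation metrics to analyze how environmental factors affect attribute recognition.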
In this paper, we aim to improve the dataset foundation for pedestrian attribute recognition in real surveillance scenarios. Recognition of human attributes, such as gender and clothing type, has great potential in real applications. However, the development of suitable benchmark datasets for attribute recognition has lagged behind. Existing human attribute datasets are collected from various sources or assembled from pedestrian re-identification datasets. Such heterogeneous collection poses a major challenge for developing high-quality, fine-grained attribute recognition algorithms. Furthermore, human attribute recognition is generally strongly affected by environmental or contextual factors, such as viewpoints, occlusions, and body parts, which existing attribute datasets barely consider. To tackle these problems, we build a Richly Annotated Pedestrian (RAP) dataset from real multi-camera surveillance scenarios with long-term collection, where data samples are annotated not only with fine-grained human attributes but also with environmental and contextual factors. RAP has 41,585 pedestrian samples in total, each annotated with 72 attributes as well as viewpoint, occlusion, and body-part information. To our knowledge, RAP is the largest pedestrian attribute dataset, and it is expected to greatly promote the study of large-scale attribute recognition systems. Furthermore, we empirically analyze the effects of different environmental and contextual factors on pedestrian attribute recognition. Experimental results demonstrate that viewpoint, occlusion, and body-part information can substantially assist attribute recognition in real applications.
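Motivation and Goals
- Build a large-scale, richly annotated pedestrian dataset from real surveillance scenarios.
- Annotate each sample with 72 fine-grained attributes and with environmental/contextual factors (viewpoint, occlusion, body parts).
- Evaluate baseline and multi-label models to understand how context affects attribute recognition.
- Introduce multi-label evaluation metrics that better capture dependencies among attributes in real scenarios.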
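Proposed Approach
- Collect real surveillance video from 26 camera scenes over a three-month period.
- Annotate 41,585 pedestrian samples with 72 attributes and contextual factors (viewpoint, occlusion, body parts).
- Evaluate baselines using SVMs on ELF and CNN features (FC6/FC7), as well as two multi-label CNN models (ACN and DeepMAR).
- Use two feature types (ELF, and CNN features extracted from CaffeNet) and compare single-attribute learning with joint multi-attribute learning.
- Propose and apply instance-based multi-label evaluation metrics (accuracy, precision, recall, F1) alongside the conventional label-based mean accuracy (mA).
- Study the role of body parts by analyzing how head-shoulder, upper-body, and lower-body regions affect attribute recognition.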
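The two evaluation views named above can be sketched in plain Python. This is a minimal sketch assuming each sample has binary 0/1 label and prediction vectors of equal length; the function names `mean_accuracy` and `instance_metrics` are hypothetical. mA averages, per attribute, the recall on positive and on negative samples, then averages over attributes. The instance-based metrics compare each sample's predicted attribute set with its true set, following the formulation commonly used for RAP-style evaluation (F1 as the harmonic mean of the averaged precision and recall). How empty denominators are handled is an assumption made here, not something the summary specifies.

```python
from typing import Dict, List, Sequence


def mean_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Label-based mA: mean over attributes of (TPR + TNR) / 2."""
    num_attrs = len(y_true[0])
    per_attr: List[float] = []
    for j in range(num_attrs):
        tp = fn = tn = fp = 0
        for t, p in zip(y_true, y_pred):
            if t[j] == 1:
                tp += int(p[j] == 1)
                fn += int(p[j] == 0)
            else:
                tn += int(p[j] == 0)
                fp += int(p[j] == 1)
        # Assumed convention: a rate with an empty denominator counts as 0.
        tpr = tp / (tp + fn) if tp + fn else 0.0
        tnr = tn / (tn + fp) if tn + fp else 0.0
        per_attr.append((tpr + tnr) / 2)
    return sum(per_attr) / num_attrs


def instance_metrics(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Instance-based accuracy, precision, recall and F1 over per-sample attribute sets."""
    n = len(y_true)
    acc = prec = rec = 0.0
    for t, p in zip(y_true, y_pred):
        true_set = {j for j, v in enumerate(t) if v == 1}
        pred_set = {j for j, v in enumerate(p) if v == 1}
        hit = len(true_set & pred_set)
        union = len(true_set | pred_set)
        # Assumed convention: an empty denominator contributes 0 for that sample.
        acc += hit / union if union else 0.0
        prec += hit / len(pred_set) if pred_set else 0.0
        rec += hit / len(true_set) if true_set else 0.0
    acc, prec, rec = acc / n, prec / n, rec / n
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


if __name__ == "__main__":
    y_true = [[1, 0, 1], [0, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 1]]
    print(round(mean_accuracy(y_true, y_pred), 4))  # 0.6667
    print(instance_metrics(y_true, y_pred))
```

In the toy data, the third attribute is always wrong, which pulls mA down to 2/3, while the instance-based scores still credit each sample's partially correct attribute set. This difference is why the two views are reported together.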
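Experimental Results
Research Questions
- RQ1: How do viewpoint, occlusion, and body-part visibility affect pedestrian attribute recognition performance?
- RQ2: Do multi-label learning methods (ACN, DeepMAR) outperform single-attribute classifiers on RAP?
- RQ3: Do part-based representations improve attribute recognition under real surveillance conditions?
- RQ4: Which evaluation metrics best capture correlations among multiple attributes in this setting?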
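Key Findings
- To the authors' knowledge, RAP was the largest pedestrian attribute dataset at the time of publication, with 41,585 samples annotated with 72 attributes and contextual information.
- Viewpoint, occlusion, and body-part information significantly affect attribute recognition performance.
- CNN features (FC6/FC7) generally outperform ELF features, and FC6 generalizes well on this task.
- Instance-based (multi-label) evaluation reveals meaningful dependencies among attributes, and joint multi-attribute learning yields clear gains over single-attribute SVMs.
- Part-based analysis shows that attributes tied to a specific body region benefit from head-shoulder, upper-body, or lower-body features, and that integrating part features improves recognition.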
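One way to probe the finding that viewpoint and occlusion affect recognition is to stratify test results by an annotated factor and score each group separately. The sketch below assumes a hypothetical per-sample record (`Sample`, with `viewpoint` and `occluded` fields) and uses simple per-label accuracy as a lightweight stand-in for mA; neither the record layout nor the viewpoint names are taken from the RAP release.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass
class Sample:
    # Hypothetical record layout, not RAP's actual annotation file format.
    viewpoint: str      # assumed label set, e.g. "front", "back", "left", "right"
    occluded: bool      # whether any part of the pedestrian is occluded
    labels: List[int]   # ground-truth 0/1 value per attribute
    preds: List[int]    # predicted 0/1 value per attribute


def label_accuracy(samples: List[Sample]) -> float:
    """Fraction of (sample, attribute) pairs that are predicted correctly."""
    total = correct = 0
    for s in samples:
        for t, p in zip(s.labels, s.preds):
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0


def accuracy_by_factor(samples: List[Sample], key: Callable[[Sample], Hashable]) -> Dict[Hashable, float]:
    """Group samples by a contextual factor (chosen by `key`) and score each group."""
    groups: Dict[Hashable, List[Sample]] = defaultdict(list)
    for s in samples:
        groups[key(s)].append(s)
    return {g: label_accuracy(members) for g, members in groups.items()}


if __name__ == "__main__":
    samples = [
        Sample("front", False, [1, 0, 1], [1, 0, 1]),
        Sample("back", False, [1, 0, 1], [0, 0, 1]),
        Sample("back", True, [0, 1, 0], [0, 0, 0]),
    ]
    print(accuracy_by_factor(samples, lambda s: s.viewpoint))
    print(accuracy_by_factor(samples, lambda s: s.occluded))
```

Comparing per-group scores (front vs. back, occluded vs. unoccluded) gives a rough version of the per-factor analysis behind RQ1; a faithful reproduction would swap in mA and the paper's own data splits.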
This summary was generated by AI and reviewed by human editors.