QUICK REVIEW

[Paper Review] HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis

Xihui Liu, Haiyu Zhao|arXiv (Cornell University)|Sep 28, 2017

Video Surveillance and Tracking Methods|References: 29|Cited by: 90

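One-Sentence Summary

HydraPlus-Net (HP-net) uses multi-directional attention to learn multi-level, multi-scale features that improve both pedestrian attribute recognition and person re-identification. The paper also introduces the PA-100K dataset.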

ABSTRACT

Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component for security-centric computer vision systems. Despite that the convolutional neural networks are remarkable in learning discriminative features from images, the learning of comprehensive features of pedestrians for fine-grained tasks remains an open problem. In this study, we propose a new attention-based deep neural network, named as HydraPlus-Net (HP-net), that multi-directionally feeds the multi-level attention maps to different feature layers. The attentive deep features learned from the proposed HP-net bring unique advantages: (1) the model is capable of capturing multiple attentions from low-level to semantic-level, and (2) it explores the multi-scale selectiveness of attentive features to enrich the final feature representations for a pedestrian image. We demonstrate the effectiveness and generality of the proposed HP-net for pedestrian analysis on two tasks, i.e. pedestrian attribute recognition and person re-identification. Intensive experimental results have been provided to prove that the HP-net outperforms the state-of-the-art methods on various datasets.

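Research Motivation and Goals

Advance robust feature learning for pedestrian analysis beyond global representations.
Develop a multi-directional attention (MDA) mechanism that fuses multi-level features.
Use attention-based, scale-aware representations for fine-grained attribute recognition and re-identification.
Demonstrate that HP-net generalizes across both pedestrian attribute recognition and person re-ID datasets.
Introduce PA-100K, a new large-scale pedestrian attribute dataset covering diverse scenes.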

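Proposed Method

Proposes the HydraPlus Network (HP-net), which consists of a Main Net (M-net) and an Attentive Feature Net (AF-net).
Embeds three multi-directional attention (MDA) modules. Each module generates attention maps from one network block and applies them to the features of multiple blocks.
Generates each set of attention maps with a 1×1 convolution followed by BN and ReLU, using a fixed L = 8 channels, and masks the feature maps element-wise with them.
Concatenates the multi-level attentive features, then applies global average pooling (GAP) and a fully connected (FC) layer to produce the final logits or feature vector.
Trains HP-net in stages: first the M-net, then fine-tuning of the AF-net branches, and finally the GAP/FC layers.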
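The attention-and-masking pipeline described above can be sketched in NumPy. This is a minimal illustration of one reading of the summary, not the authors' implementation. Its assumptions are as follows. All three blocks are taken to share one spatial size, although the real inception blocks do not, and map resizing is not described here. BN is replaced by a simplified per-channel normalization with no batch statistics and no learned scale or shift. Each MDA module's maps are applied to all three blocks. The weights, the sizes, and the helper names (`attention_maps`, `mask_features`, `hp_head`) are all hypothetical.

```python
import numpy as np

L = 8  # attention channels per MDA module (the L = 8 stated in the summary)


def conv1x1(x, w, b):
    """1x1 convolution as a per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in), b: (C_out,) -> (C_out, H, W)"""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def simple_norm(x, eps=1e-5):
    """Simplified stand-in for BN: normalize each channel over its spatial
    positions, with no batch statistics and no learned scale/shift."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def attention_maps(feat, w, b):
    """L attention maps from one block: 1x1 conv -> norm -> ReLU. Returns (L, H, W)."""
    return np.maximum(simple_norm(conv1x1(feat, w, b)), 0.0)


def mask_features(alpha, feat):
    """Element-wise masking: every map in alpha (L, H, W) multiplies every
    channel of feat (C, H, W). Returns (L * C, H, W), indexed as l * C + c."""
    masked = alpha[:, None, :, :] * feat[None, :, :, :]  # (L, C, H, W)
    return masked.reshape(-1, *feat.shape[1:])


def gap(x):
    """Global average pooling: (C, H, W) -> (C,)."""
    return x.mean(axis=(1, 2))


def hp_head(block_feats, att_params, w_fc, b_fc):
    """Hypothetical head: each module's maps mask every block, the results are
    pooled with GAP, concatenated, and passed through one FC layer."""
    pooled = []
    for (w, b), source in zip(att_params, block_feats):
        alpha = attention_maps(source, w, b)  # maps generated from block i
        for target in block_feats:            # applied to every block's features
            pooled.append(gap(mask_features(alpha, target)))
    z = np.concatenate(pooled)                # (num_modules * num_blocks * L * C,)
    return w_fc @ z + b_fc                    # logits or feature vector


# Usage with random stand-in features for three blocks.
rng = np.random.default_rng(0)
C, H, W = 16, 12, 6
feats = [rng.standard_normal((C, H, W)) for _ in range(3)]
params = [(0.1 * rng.standard_normal((L, C)), np.zeros(L)) for _ in range(3)]
num_outputs = 10  # hypothetical output size
w_fc = 0.01 * rng.standard_normal((num_outputs, 3 * 3 * L * C))
logits = hp_head(feats, params, w_fc, np.zeros(num_outputs))
print(logits.shape)  # (10,)
```

Masking broadcasts each of the L maps across all C channels, so a single module turns one C-channel block into L × C attentive channels. Applying the maps to several blocks therefore makes the concatenated feature grow quickly before GAP reduces it.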

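Experimental Results

Research Questions

RQ1: How do multi-level, multi-scale attention maps improve discriminative pedestrian feature learning?
RQ2: Does applying attention maps across different feature blocks (multi-directionally) yield better representations than conventional single-block attention?
RQ3: Can HP-net improve performance on both pedestrian attribute recognition and re-ID?
RQ4: How do the diversity and consistency of multi-level attention affect recognition accuracy?
RQ5: Does HP-net generalize across multiple pedestrian analysis datasets and real-world surveillance data?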

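Key Findings

HP-net achieves state-of-the-art performance on the pedestrian attribute recognition datasets RAP and PETA, as well as on the proposed PA-100K.
On attribute recognition, HP-net clearly improves on existing methods, especially for fine-grained attributes such as glasses and handbags.
For person re-identification, HP-net reaches Top-1 accuracy of 91.8/56.6/76.9 on CUHK03/VIPeR/Market-1501 respectively. It outperforms several baselines and gains 3.6/5.0/3.8 percentage points over M-net.
Multi-level attention from different inception blocks captures both low-level texture and high-level semantic patterns, and multi-directional masking strengthens feature fusion across layers.
PA-100K contains 100,000 pedestrian images from 598 scenes, which makes it a large-scale, diverse benchmark for attribute recognition.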
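The summary does not list M-net's own Top-1 scores, but they follow from the reported figures. Subtracting each gain from HP-net's Top-1 gives the implied M-net Top-1. This is a quick check, assuming the gains are absolute percentage points as stated:

```python
# Top-1 accuracy (%) for HP-net and its gain over M-net, as reported in the summary.
hp_net = {"CUHK03": 91.8, "VIPeR": 56.6, "Market-1501": 76.9}
gain_over_mnet = {"CUHK03": 3.6, "VIPeR": 5.0, "Market-1501": 3.8}

# Implied M-net Top-1 = HP-net Top-1 - gain, rounded to hide float noise.
implied_mnet = {k: round(hp_net[k] - gain_over_mnet[k], 1) for k in hp_net}
print(implied_mnet)  # {'CUHK03': 88.2, 'VIPeR': 51.6, 'Market-1501': 73.1}
```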

This review was generated by AI and reviewed by a human editor.