QUICK REVIEW

[Paper Review] Harmonious Attention Network for Person Re-Identification

Wei Li, Xiatian Zhu|arXiv (Cornell University)|Feb 22, 2018

Topic: Video Surveillance and Tracking Methods | References: 25 | Cited by: 188

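One-Sentence Summary

HA-CNN jointly learns soft pixel attention and hard regional attention in a lightweight CNN to improve person re-identification under misaligned bounding boxes, outperforming state-of-the-art methods on three large-scale benchmarks.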

ABSTRACT

Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images. They are therefore sub-optimal for re-id matching in arbitrarily aligned person images potentially with large human pose variations and unconstrained auto-detection errors. In this work, we show the advantages of jointly learning attention selection and feature representation in a Convolutional Neural Network (CNN) by maximising the complementary information of different levels of visual attention subject to re-id discriminative learning constraints. Specifically, we formulate a novel Harmonious Attention CNN (HA-CNN) model for joint learning of soft pixel attention and hard regional attention along with simultaneous optimisation of feature representations, dedicated to optimise person re-id in uncontrolled (misaligned) images. Extensive comparative evaluations validate the superiority of this new HA-CNN model for person re-id over a wide variety of state-of-the-art methods on three large-scale benchmarks including CUHK03, Market-1501, and DukeMTMC-ReID.

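Motivation and Objectives

Advance robust person re-identification under the unconstrained misalignment and background clutter caused by automatically detected bounding boxes.
Propose a lightweight CNN that jointly learns multi-level attention (soft pixel, soft channel, and hard regional) together with re-id discriminative learning.
Introduce cross-attention interaction to maximise the complementary information between the attention modules and the feature representation.
Demonstrate that joint soft/hard attention achieves superior re-id performance with a compact model.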

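Proposed Method

Introduces the Harmonious Attention (HA) module, which combines soft spatial attention, soft channel attention, and hard regional attention.
Adopts a multi-branch HA-CNN with one global branch and several local branches that share the first few layers to reduce the parameter count.
Soft spatial and soft channel attention are factorised as A^l = S^l × C^l, where S^l is the spatial attention map and C^l is the channel attention map, each computed by a lightweight sub-network.
Hard regional attention localises latent discriminative regions through a small transformation matrix; these regions are fed into the local branches.
Intra-layer and inter-layer architecture: hard and soft attention are learned at every level, and cross-attention interaction learning (CAIL) enriches local and global features across branches.
Cross-attention interaction adds the global feature to the local feature, X̃_L^{(l,k)} = X_L^{(l,k)} + X_G^{(l,k)}, to strengthen discriminability under the re-id constraints.
Joint training applies an identity classification loss to both the global and local branches, enabling end-to-end learning without heavy data augmentation or pre-training.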
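
The three operations listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`soft_attention`, `crop_region`, `cross_attention_add`) are hypothetical, the sigmoid squashing of each attention map is an assumption, the sub-networks that would produce the attention logits are replaced by random inputs, and the hard region is taken with a plain integer crop rather than a learned, differentiable transformation.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def soft_attention(x, spatial_logits, channel_logits):
    """Apply factorised soft attention A = S * C to a feature map.

    x:              (h, w, c) feature map
    spatial_logits: (h, w, 1) stand-in for a spatial sub-network output
    channel_logits: (1, 1, c) stand-in for a channel sub-network output
    """
    s = sigmoid(spatial_logits)  # S^l, one weight per pixel (assumed sigmoid)
    c = sigmoid(channel_logits)  # C^l, one weight per channel (assumed sigmoid)
    a = s * c                    # broadcasts to the full (h, w, c) attention A^l
    return a * x


def crop_region(x, top, left, height, width):
    """Simplified hard regional attention: cut one rectangular region.

    Assumption: integer offsets stand in for the small transformation
    matrix mentioned in the text.
    """
    return x[top:top + height, left:left + width, :]


def cross_attention_add(x_local, x_global_region):
    """Cross-attention interaction: X~_L = X_L + X_G.

    Assumption: the global feature has already been restricted to the
    same region, so both arrays have identical shapes.
    """
    if x_local.shape != x_global_region.shape:
        raise ValueError("local and global features must share a shape")
    return x_local + x_global_region


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 6))
attended = soft_attention(
    features,
    rng.standard_normal((8, 4, 1)),
    rng.standard_normal((1, 1, 6)),
)
local_part = crop_region(attended, top=2, left=1, height=4, width=2)
global_part = crop_region(features, top=2, left=1, height=4, width=2)
fused = cross_attention_add(local_part, global_part)
```

Factorising the attention keeps the cost low: a spatial map with h × w entries and a channel vector with c entries replace a full h × w × c attention tensor.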

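Experimental Results

Research Questions

RQ1: How can multi-level attention (soft pixel, soft channel, hard regional) be learned jointly in a single re-id model to improve performance under misalignment?
RQ2: Does cross-attention interaction between the global and local feature branches strengthen the synergy between the attention mechanisms and the feature representation?
RQ3: Can a lightweight HA-CNN reach state-of-the-art re-id performance with reduced model size and training complexity?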

Key Findings

HA-CNN achieves higher Rank-1 (R1) and mAP on Market-1501, DukeMTMC-ReID, and CUHK03 than a wide range of state-of-the-art methods.
On Market-1501, HA-CNN reaches 91.2% R1 and 75.7% mAP in the Single-Query setting, and 93.8% R1 and 82.8% mAP in the Multi-Query setting.
On DukeMTMC-ReID, HA-CNN reaches 80.5% R1 and 63.8% mAP.
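On CUHK03 (Detected, 767/700 split), HA-CNN is listed at 41.7% mAP and 41.7% (?) R1; the best figures reported among the compared methods are 44.4% R1 and 41.0% mAP (labelled) and 41.7% R1 and 38.6% mAP (detected). Note: figures are extracted as originally reported.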
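
For readers unfamiliar with the two metrics quoted above, the following sketch shows how Rank-1 and mAP are typically computed from ranked retrieval results. It is a simplified illustration, not the benchmarks' official evaluation code: the function names are hypothetical, each query is represented only by a list of booleans (whether each gallery item, in ranked order, shares the query identity), and benchmark-specific filtering rules are omitted.

```python
def rank1_hit(ranked_matches):
    """Return True if the top-ranked gallery item matches the query identity."""
    return bool(ranked_matches) and bool(ranked_matches[0])


def average_precision(ranked_matches):
    """Average of the precision values taken at each correct match."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(all_ranked_matches):
    """Return (Rank-1 accuracy, mean average precision) over all queries."""
    n = len(all_ranked_matches)
    rank1 = sum(rank1_hit(r) for r in all_ranked_matches) / n
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    return rank1, mean_ap


# Toy example with two queries (made-up data, not from the paper).
toy_results = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2
    [False, True, False],  # AP = 1/2
]
rank1, mean_ap = evaluate(toy_results)
```

Rank-1 only checks the first retrieved item, while mAP rewards placing every correct match near the top, which is why the two numbers can differ substantially on the same benchmark.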

This review was generated by AI and checked by a human editor.