QUICK REVIEW

[Paper Review] Pose Invariant Embedding for Deep Person Re-identification

Liang Zheng, Yujia Huang | arXiv (Cornell University) | Jan 26, 2017
Video Surveillance and Tracking Methods | 23 references | 178 citations
TL;DR

This paper introduces PoseBox-based pose invariant embedding (PIE) learned through a PoseBox Fusion (PBF) network that fuses the original image, PoseBox, and pose-estimation confidence to robustly re-identify people under pose and detector variation.

ABSTRACT

Pedestrian misalignment, which mainly arises from detector errors and pose variations, is a critical problem for a robust person re-identification (re-ID) system. With bad alignment, the background noise will significantly compromise the feature learning and matching process. To address this problem, this paper introduces the pose invariant embedding (PIE) as a pedestrian descriptor. First, in order to align pedestrians to a standard pose, the PoseBox structure is introduced, which is generated through pose estimation followed by affine transformations. Second, to reduce the impact of pose estimation errors and information loss during PoseBox construction, we design a PoseBox fusion (PBF) CNN architecture that takes the original image, the PoseBox, and the pose estimation confidence as input. The proposed PIE descriptor is thus defined as the fully connected layer of the PBF network for the retrieval task. Experiments are conducted on the Market-1501, CUHK03, and VIPeR datasets. We show that PoseBox alone yields decent re-ID accuracy and that when integrated in the PBF network, the learned PIE descriptor produces competitive performance compared with the state-of-the-art approaches.

Motivation & Objective

  • Address pedestrian misalignment caused by pose variation and detector errors in person re-ID.
  • Propose PoseBox to normalize pose and three-stream PoseBox Fusion to mitigate pose estimation errors.
  • Learn a robust PIE descriptor that rivals state-of-the-art methods on standard benchmarks.

Proposed method

  • Construct PoseBox from body joints detected with convolutional pose machines (CPM), then apply affine transformations to build three PoseBox variants (PoseBox1, PoseBox2, PoseBox3).
  • Introduce a three-stream PoseBox Fusion (PBF) network whose inputs are the original image, the PoseBox, and a 14-dim pose estimation confidence vector. The two image streams use separate CNNs, and their outputs are concatenated with a projected confidence vector before the final FC layer.
  • Define PIE as the fully connected (FC) activations after fusion (either PIE(A, FC7)/PIE(A, FC8) for AlexNet or PIE(R, Pool5)/PIE(R, FC) for ResNet-50).
  • Train with a sum of three softmax losses corresponding to the three inputs; apply ReLU to PIE embeddings and use Euclidean distance for retrieval.
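The affine step behind PoseBox construction can be illustrated with a small NumPy sketch. It is not the authors' code: the joint names, template coordinates, and canvas size below are hypothetical, the affine matrix is fitted by least squares, and warping uses nearest-neighbour sampling. A full PoseBox would repeat this per body part and paste the parts into one canvas; that composition step is omitted.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~= A @ [x, y, 1]^T.

    src, dst: (N, 2) arrays of (x, y) points, N >= 3 and not all collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # homogeneous source points, (N, 3)
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)       # solves X @ M ~= dst, M is (3, 2)
    return M.T                                         # (2, 3)

def warp_affine_nearest(image, A, out_shape):
    """Warp `image` into an out_shape canvas, where A maps source -> canvas.

    Uses inverse mapping with nearest-neighbour sampling; canvas pixels whose
    source location falls outside the image are left at zero.
    """
    h_out, w_out = out_shape
    inv = np.linalg.inv(np.vstack([A, [0.0, 0.0, 1.0]]))  # 3x3 canvas -> source
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    canvas_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # (3, H*W)
    src_pts = inv @ canvas_pts
    sx = np.rint(src_pts[0]).astype(int)
    sy = np.rint(src_pts[1]).astype(int)
    valid = (sx >= 0) & (sx < image.shape[1]) & (sy >= 0) & (sy < image.shape[0])
    out = np.zeros((h_out * w_out,) + image.shape[2:], dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape((h_out, w_out) + image.shape[2:])

# Hypothetical example: map a torso given by three (x, y) joints
# (neck, left hip, right hip) to fixed positions in a 64x32 canvas.
rng = np.random.default_rng(0)
image = rng.random((128, 64, 3))                      # stand-in pedestrian crop
torso_joints = np.array([[32.0, 30.0], [24.0, 80.0], [40.0, 80.0]])
torso_template = np.array([[16.0, 4.0], [8.0, 60.0], [24.0, 60.0]])
A = fit_affine(torso_joints, torso_template)
torso_box = warp_affine_nearest(image, A, (64, 32))
print(torso_box.shape)  # (64, 32, 3)
```

Fitting with least squares rather than an exact three-point solve lets the same function accept more than three joints per part, which tolerates some noise in the pose estimates.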
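The fusion, training, and retrieval steps listed above can be sketched as a NumPy forward pass. This is not the authors' implementation: the CNN streams are replaced by random feature vectors, all layer sizes and weights are hypothetical, the ReLU on the confidence projection is our choice, and attaching the three softmax classifiers to the image stream, the PoseBox stream, and the fused embedding is our assumption about how the three losses are placed.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (B, C), labels (B,) integer IDs."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical sizes: FEAT_DIM stands in for each CNN stream's output,
# CONF_PROJ for the projected 14-dim confidence vector.
FEAT_DIM, CONF_DIM, CONF_PROJ, EMB_DIM, NUM_IDS = 64, 14, 16, 32, 10

# Random weights stand in for trained layers (hypothetical).
W_conf = rng.normal(scale=0.1, size=(CONF_DIM, CONF_PROJ))
W_fuse = rng.normal(scale=0.1, size=(2 * FEAT_DIM + CONF_PROJ, EMB_DIM))
W_img_cls = rng.normal(scale=0.1, size=(FEAT_DIM, NUM_IDS))
W_pb_cls = rng.normal(scale=0.1, size=(FEAT_DIM, NUM_IDS))
W_fused_cls = rng.normal(scale=0.1, size=(EMB_DIM, NUM_IDS))

def pbf_forward(f_img, f_pb, conf):
    """Concatenate both stream features with the projected confidence, then FC."""
    conf_proj = relu(conf @ W_conf)
    fused_in = np.concatenate([f_img, f_pb, conf_proj], axis=1)
    return fused_in @ W_fuse                      # PIE-style fused embedding

def pbf_loss(f_img, f_pb, conf, labels):
    """Sum of three softmax losses (image head, PoseBox head, fused head)."""
    pie = pbf_forward(f_img, f_pb, conf)
    return (softmax_cross_entropy(f_img @ W_img_cls, labels)
            + softmax_cross_entropy(f_pb @ W_pb_cls, labels)
            + softmax_cross_entropy(pie @ W_fused_cls, labels))

def retrieve(query_emb, gallery_emb):
    """Rank gallery items by Euclidean distance after ReLU on the embeddings."""
    q, g = relu(query_emb), relu(gallery_emb)
    dists = np.linalg.norm(q[:, None, :] - g[None, :, :], axis=2)
    return np.argsort(dists, axis=1)              # nearest gallery item first

B = 4
f_img = rng.normal(size=(B, FEAT_DIM))   # stand-in for the original-image CNN output
f_pb = rng.normal(size=(B, FEAT_DIM))    # stand-in for the PoseBox CNN output
conf = rng.uniform(size=(B, CONF_DIM))   # 14 joint confidences per image
labels = np.array([0, 1, 2, 3])
loss = pbf_loss(f_img, f_pb, conf, labels)
pie = pbf_forward(f_img, f_pb, conf)
ranking = retrieve(pie[:1], pie)         # query 0 against all four embeddings
```

Keeping the confidence vector as a separate, projected input lets the fused layer learn how much to trust the PoseBox stream when pose estimation is unreliable.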

Experimental results

Research questions

  • RQ1: Can a PoseBox-based normalization improve re-ID performance under pose and detector-induced misalignment?
  • RQ2: Does a multi-stream fusion that incorporates pose estimation confidence outperform single-stream PoseBox or original-image baselines?
  • RQ3: What is the impact of including arms/head in PoseBox construction on re-ID accuracy?
  • RQ4: How does PIE compare to state-of-the-art methods on Market-1501, CUHK03, and VIPeR?

Key findings

  • PIE consistently improves over strong baselines on Market-1501, CUHK03, and VIPeR datasets.
  • On Market-1501, PIE with ResNet-50 achieves rank-1 78.65% and mAP 53.87% (PIE, Pool5/FC variants).
  • PIE (Pool5, img) and PIE (Pool5, pb) variants outperform Baseline1 and Baseline2 across metrics, indicating effective fusion of original image and PoseBox.
  • PoseBox2 (torso+legs+arms) outperforms PoseBox1 (torso+legs), while PoseBox3 (adds head) yields marginal gains; however, fusion with PBF reduces these gaps.
  • PIE with AlexNet and PIE with ResNet-50 reach results competitive with the state of the art, and PIE+Kissme achieves the best performance on some benchmarks.
  • Ablation studies show that removing the original image or the PoseBox streams degrades performance, illustrating the complementary value of fusion and the reliability signal from the confidence vector.
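Rank-1 accuracy and mAP, the metrics quoted in the findings above, can both be read off a query-to-gallery distance matrix. The function below is a simplified sketch, not the official Market-1501 or CUHK03 evaluation code: it skips the junk-image and same-camera filtering those protocols apply, ignores queries with no true match in the gallery, and the official scripts may compute AP slightly differently.

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mean average precision from a distance matrix.

    dist: (Q, G) query-to-gallery distances (smaller means more similar).
    Simplified: no junk/same-camera filtering, queries without a match skipped.
    """
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(dist, axis=1, kind="stable")
    matches = gallery_ids[order] == query_ids[:, None]   # (Q, G) bool, ranked
    valid = matches.any(axis=1)                           # queries with a true match
    rank1 = float(matches[valid, 0].mean())
    aps = []
    for row in matches[valid]:
        hits = np.flatnonzero(row)                        # 0-based ranks of true matches
        precisions = np.arange(1, hits.size + 1) / (hits + 1)
        aps.append(precisions.mean())                     # AP = mean precision at each hit
    return rank1, float(np.mean(aps))

# Query 0 ranks both true matches first; query 1 finds its only match at rank 2.
dist = np.array([[0.1, 0.5, 0.3],
                 [0.1, 0.2, 0.9]])
r1, m_ap = rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
print(r1, m_ap)  # 0.5 0.75
```

Rank-1 only checks the top result, while mAP also rewards retrieving every correct gallery image early, which is why the two numbers can diverge.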


This review was created by AI and reviewed by human editors.