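[Paper Walkthrough] Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)
This paper proposes PCB, a strong convolutional baseline for person re-identification that uses a uniform part partition, and RPP (Refined Part Pooling), a refinement method that reassigns outlier features to the parts they most resemble in order to improve within-part consistency. Without any pose information, the combination reaches state-of-the-art results on Market-1501, DukeMTMC-reID, and CUHK03.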
Employing part-level features for pedestrian image description offers fine-grained information and has been verified as beneficial for person retrieval in very recent literature. A prerequisite of part discovery is that each part should be well located. Instead of using external cues, e.g., pose estimation, to directly locate parts, this paper lays emphasis on the content consistency within each part. Specifically, we target at learning discriminative part-informed features for person retrieval and make two contributions. (i) A network named Part-based Convolutional Baseline (PCB). Given an image input, it outputs a convolutional descriptor consisting of several part-level features. With a uniform partition strategy, PCB achieves competitive results with the state-of-the-art methods, proving itself as a strong convolutional baseline for person retrieval. (ii) A refined part pooling (RPP) method. Uniform partition inevitably incurs outliers in each part, which are in fact more similar to other parts. RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency. Experiment confirms that RPP allows PCB to gain another round of performance boost. For instance, on the Market-1501 dataset, we achieve (77.4+4.2)% mAP and (92.3+1.5)% rank-1 accuracy, surpassing the state of the art by a large margin.
Motivation and Objectives
- Motivate learning discriminative part-informed features for person retrieval without external pose cues.
- Propose PCB to extract part-level features via uniform partitioning of conv-layer outputs.
- Introduce Refined Part Pooling (RPP) to relocate outliers and strengthen within-part consistency.
- Demonstrate that PCB + RPP achieves new state-of-the-art results on major re-ID benchmarks.
Proposed Method
- PCB replaces the global pooling with a uniform horizontal partition of the conv feature map, followed by per-part classifiers and a final concatenation of part descriptors.
- PCB uses a backbone (e.g., ResNet-50) with the last spatial down-sampling removed, which enlarges the final feature map and so increases part granularity; each stripe is pooled to a vector, reduced in dimension, and classified with its own FC+Softmax branch.
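- RPP introduces a part classifier that assigns each local feature a probability of belonging to each of the p parts, using a Softmax over part scores. Each part is then pooled from all local features weighted by those probabilities, so the hard stripe boundaries become a soft, learned assignment.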
- An induced training procedure first trains PCB with the uniform partition, then appends a part classifier, freezes the already-trained layers during a second phase so that only the part classifier is trained, and finally fine-tunes the entire network.
- Compared variants show that independent per-part losses and non-shared classifier parameters are beneficial for discriminative part features.
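The difference between the uniform partition and the refined pooling can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the list-based feature map layout `[row][col][channel]`, the hand-picked `part_weights`, and the choice to normalize each refined part by its total probability mass are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # assumed layout: [row][col][channel]


def uniform_part_pool(fmap: FeatureMap, p: int) -> List[Vector]:
    """PCB-style pooling: average each of p equal horizontal stripes."""
    height = len(fmap)
    if height % p != 0:
        raise ValueError("height must be divisible by the number of parts")
    rows_per_part = height // p
    channels = len(fmap[0][0])
    parts = []
    for i in range(p):
        stripe = fmap[i * rows_per_part:(i + 1) * rows_per_part]
        cells = [f for row in stripe for f in row]
        parts.append([sum(f[c] for f in cells) / len(cells)
                      for c in range(channels)])
    return parts


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refined_part_pool(fmap: FeatureMap, part_weights: List[Vector]) -> List[Vector]:
    """RPP-style pooling: every location contributes to every part,
    weighted by its softmax probability of belonging to that part.
    Normalizing by the probability mass is an assumption of this sketch."""
    channels = len(fmap[0][0])
    cells = [f for row in fmap for f in row]
    probs = [softmax([sum(w[c] * f[c] for c in range(channels))
                      for w in part_weights])
             for f in cells]
    parts = []
    for i in range(len(part_weights)):
        mass = sum(pr[i] for pr in probs)
        parts.append([sum(pr[i] * f[c] for pr, f in zip(probs, cells)) / mass
                      for c in range(channels)])
    return parts


# Toy map: 4 rows, 1 column, 2 channels. Row 1 sits in the upper stripe
# but looks like the lower part, i.e. it is an outlier.
fmap = [[[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]

print(uniform_part_pool(fmap, 2))  # [[0.5, 0.5], [0.0, 1.0]]
print(refined_part_pool(fmap, [[10.0, 0.0], [0.0, 10.0]]))
```

In the toy example the uniform partition blends the outlier row into the upper part, while the soft assignment moves almost all of its weight to the lower part, so the upper part descriptor stays close to `[1, 0]`. In the actual method the part classifier weights are learned rather than fixed by hand.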
Experimental Results
Research Questions
- RQ1: Can a strong convolutional baseline with uniform part partitioning achieve competitive performance in person re-ID without pose or region proposals?
- RQ2: Does refining the uniform partition via a learned part classifier (RPP) improve within-part consistency and overall retrieval metrics?
- RQ3: How does PCB+RPP compare to attention-based or pose-guided partitioning methods on standard re-ID benchmarks?
Key Findings
| Model | Feature | dim | Market-1501 R-1 | Market-1501 R-5 | Market-1501 R-10 | Market-1501 mAP | DukeMTMC-reID R-1 | DukeMTMC-reID R-5 | DukeMTMC-reID R-10 | DukeMTMC-reID mAP | CUHK03 R-1 | CUHK03 R-5 | CUHK03 R-10 | CUHK03 mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IDE | pool5 | 2048 | 85.3 | 94.0 | 96.3 | 68.5 | 73.2 | 84.0 | 87.6 | 52.8 | 43.8 | 62.7 | | 38.9 |
| IDE | FC | 256 | 83.8 | 93.1 | 95.8 | 67.7 | 72.4 | 83.0 | 87.1 | 51.6 | 43.3 | 62.5 | | 38.3 |
| Variant 1 | G | 12288 | 86.7 | 95.2 | 96.5 | 69.4 | 73.9 | 84.6 | 88.1 | 53.2 | 43.6 | 62.9 | 71.3 | 38.8 |
| Variant 1 | H | 1536 | 85.6 | 94.3 | 96.3 | 68.3 | 72.8 | 83.3 | 87.2 | 52.5 | 44.1 | 63.0 | 71.5 | 39.1 |
| Variant 2 | G | 12288 | 91.2 | 96.6 | 97.7 | 75.0 | 80.2 | 88.8 | 91.3 | 62.8 | 52.6 | 72.4 | 80.9 | 45.8 |
| Variant 2 | H | 1536 | 91.0 | 96.6 | 97.6 | 75.3 | 80.0 | 88.1 | 90.4 | 62.6 | 54.0 | 73.7 | 81.4 | 47.2 |
| PCB | G | 12288 | 92.3 | 97.2 | 98.2 | 77.4 | 81.7 | 89.7 | 91.9 | 66.1 | 59.7 | 77.7 | 85.2 | 53.2 |
| PCB | H | 1536 | 92.4 | 97.0 | 97.9 | 77.3 | 81.9 | 89.4 | 91.6 | 65.3 | 61.3 | 78.6 | 85.6 | 54.2 |
| PCB+RPP | G | 12288 | 93.8 | 97.5 | 98.5 | 81.6 | 83.3 | 90.5 | 92.5 | 69.2 | 62.8 | 79.8 | 86.8 | 56.7 |
| PCB+RPP | H | 1536 | 93.1 | 97.4 | 98.3 | 81.0 | 82.9 | 90.1 | 92.3 | 68.5 | 63.7 | 80.6 | 86.9 | 57.5 |
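The table reports rank-k accuracy (R-1, R-5, R-10) and mAP. The summary does not define these metrics, so the following is a minimal sketch of their usual definitions for a single query. The function names and the boolean `matches` list (one entry per ranked gallery image, true when it shares the query identity) are assumptions, and benchmark-specific rules such as discarding junk or same-camera images are omitted.

```python
from typing import List


def rank_k_hit(matches: List[bool], k: int) -> bool:
    """True if any of the top-k ranked gallery items matches the query."""
    return any(matches[:k])


def average_precision(matches: List[bool]) -> float:
    """Mean of the precision values measured at each correct match."""
    hits = 0
    precisions = []
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


# One query whose correct matches appear at ranks 2 and 4.
ranking = [False, True, False, True]

print(rank_k_hit(ranking, 1))      # False
print(rank_k_hit(ranking, 5))      # True
print(average_precision(ranking))  # 0.5
```

Under these definitions, the R-k value for a dataset is the fraction of queries with a hit in the top k, and mAP is the mean of the per-query average precision.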
- PCB (uniform partition) yields substantial gains over global-descriptor baselines, setting a strong convolutional baseline for person re-ID.
- RPP further improves performance by relocating outliers to the most similar parts, increasing within-part consistency and boosting mAP.
- PCB+RPP achieves state-of-the-art results on Market-1501 (mAP 81.6, Rank-1 93.8) and DukeMTMC-reID (mAP 69.2, Rank-1 83.3) with feature G, and on CUHK03 (mAP 57.5, Rank-1 63.7) with feature H, all without re-ranking.
- Induced training for the part classifier is crucial; without induction, the part classifier behaves like an unguided attention mechanism and yields inferior results.
- Sharing FC parameters across part classifiers harms performance; separate per-part classifiers are preferable.
- Compared with the IDE baseline (pool5 feature), PCB (feature G) provides notable mAP improvements across datasets (e.g., Market-1501: 68.5→77.4 mAP; DukeMTMC-reID: 52.8→66.1 mAP).
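The gains quoted above can be checked directly against the results table. The short sketch below copies the Rank-1 and mAP values for IDE (pool5), PCB (feature G), and PCB+RPP (feature G) on two datasets; the `results` dictionary and the `gain` helper are illustrative names, not part of the paper. CUHK03 is left out because the IDE rows for that dataset are incomplete in the table.

```python
# (Rank-1, mAP) pairs copied from the results table.
results = {
    "Market-1501": {
        "IDE": (85.3, 68.5),
        "PCB": (92.3, 77.4),
        "PCB+RPP": (93.8, 81.6),
    },
    "DukeMTMC-reID": {
        "IDE": (73.2, 52.8),
        "PCB": (81.7, 66.1),
        "PCB+RPP": (83.3, 69.2),
    },
}


def gain(dataset: str, baseline: str, model: str) -> tuple:
    """Return (Rank-1 gain, mAP gain) in percentage points, rounded to 0.1."""
    r1_base, map_base = results[dataset][baseline]
    r1_model, map_model = results[dataset][model]
    return round(r1_model - r1_base, 1), round(map_model - map_base, 1)


print(gain("Market-1501", "IDE", "PCB"))      # (7.0, 8.9)
print(gain("Market-1501", "PCB", "PCB+RPP"))  # (1.5, 4.2)
print(gain("DukeMTMC-reID", "IDE", "PCB"))    # (8.5, 13.3)
```

The second line reproduces the "(77.4+4.2)% mAP and (92.3+1.5)% rank-1" figures from the abstract.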
This walkthrough was generated by AI and reviewed by human editors.