QUICK REVIEW

[Paper Review] Batch DropBlock Network for Person Re-identification and Beyond

Zuozhuo Dai, Mingqiang Chen|arXiv (Cornell University)|Nov 17, 2018

Video Surveillance and Tracking Methods | References: 79 | Cited by: 23

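One-Sentence Summary

This paper proposes the Batch DropBlock (BDB) Network, a two-branch convolutional neural network for person re-identification and image retrieval that is regularized with Batch DropBlock. By applying structured dropout to feature maps during training, BDB makes features more robust to occlusion and viewpoint changes and achieves state-of-the-art results on several benchmarks, including 95.8% Rank-1 accuracy on Market1501 after re-ranking.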

ABSTRACT

Since the person re-identification task often suffers from the problem of pose changes and occlusions, some attentive local features are often suppressed when training CNNs. In this paper, we propose the Batch DropBlock (BDB) Network which is a two branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch. The global branch encodes the global salient representations. Meanwhile, the feature dropping branch consists of an attentive feature learning module called Batch DropBlock, which randomly drops the same region of all input feature maps in a batch to reinforce the attentive feature learning of local regions. The network then concatenates features from both branches and provides a more comprehensive and spatially distributed feature representation. Albeit simple, our method achieves state-of-the-art on person re-identification and it is also applicable to general metric learning tasks. For instance, we achieve 76.4% Rank-1 accuracy on the CUHK03-Detect dataset and 83.0% Recall-1 score on the Stanford Online Products dataset, outperforming the existing works by a large margin (more than 6%).
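The core idea in the abstract, dropping the same region of every feature map in a batch, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the nested-list layout [N][C][H][W], and the rounding of the block size are assumptions made for illustration, and the ratios r_h and r_w follow the notation used later in this review.

```python
import random


def batch_drop_block_mask(height, width, r_h, r_w, rng=random):
    """Build a height x width mask of ones with one rectangular block of zeros.

    r_h and r_w are the fractions of the height and width to drop
    (rounding to whole cells is an assumption of this sketch).
    """
    block_h = round(r_h * height)
    block_w = round(r_w * width)
    # Pick the top-left corner so the block stays inside the feature map.
    top = rng.randint(0, height - block_h)
    left = rng.randint(0, width - block_w)
    mask = [[1.0] * width for _ in range(height)]
    for y in range(top, top + block_h):
        for x in range(left, left + block_w):
            mask[y][x] = 0.0
    return mask


def apply_batch_drop_block(batch, r_h, r_w, rng=random):
    """Zero the SAME region in every sample and channel of the batch.

    batch is a nested list shaped [N][C][H][W] (hypothetical layout).
    """
    height = len(batch[0][0])
    width = len(batch[0][0][0])
    # One mask is sampled per batch, not per sample: this is what
    # distinguishes the batch-level variant from per-sample block dropping.
    mask = batch_drop_block_mask(height, width, r_h, r_w, rng)
    return [
        [
            [[v * m for v, m in zip(row, mask_row)]
             for row, mask_row in zip(channel, mask)]
            for channel in sample
        ]
        for sample in batch
    ]
```

Because a single mask is shared across the batch, every sample loses the same spatial region in a given training step. Setting both ratios to 0 leaves the feature maps unchanged.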

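Motivation and Objectives

Improve feature robustness in person re-identification under challenging conditions such as occlusion and viewpoint changes.
Address the limitations of standard data augmentation and regularization in learning spatially distributed, attention-aware features.
Develop a training strategy that does not require precise image alignment, in order to improve generalization.
Evaluate the effectiveness of Batch DropBlock on person re-identification and zero-shot image retrieval tasks.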

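Proposed Method

Propose a two-branch network architecture that learns high-dimensional feature embeddings to improve representational power.
Introduce Batch DropBlock, a structured dropout technique that randomly masks entire spatial blocks of the feature maps during training.
Use different drop ratios (r_h, r_w) to control the height and width of the dropped block, encouraging spatial invariance.
Apply re-ranking as a post-processing step to further refine matching scores and improve mAP and Rank-1 accuracy.
Use class activation maps (CAMs) to visualize and compare the attention distributions of the baseline and BDB models.
Evaluate performance on multiple datasets, including Market1501, DukeMTMC-reID, CUHK03, CUB200, and CARS196, covering both aligned and unaligned settings.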
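How the two branches might be combined into a single embedding can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's network: the per-branch convolutional layers are omitted, the pooling choices (average pooling for the global branch, max pooling for the dropping branch) are assumptions of this sketch, and all function names are hypothetical.

```python
def global_avg_pool(sample):
    """Average each channel of a [C][H][W] nested list down to one value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in sample
    ]


def global_max_pool(sample):
    """Take the maximum of each channel of a [C][H][W] nested list."""
    return [max(max(row) for row in channel) for channel in sample]


def two_branch_embedding(sample, mask=None):
    """Concatenate a global-branch feature with a dropping-branch feature.

    mask is an [H][W] 0/1 mask applied to the dropping branch during
    training; pass None at inference time so nothing is dropped.
    """
    global_feat = global_avg_pool(sample)
    if mask is None:
        dropped = sample
    else:
        dropped = [
            [[v * m for v, m in zip(row, mask_row)]
             for row, mask_row in zip(channel, mask)]
            for channel in sample
        ]
    drop_feat = global_max_pool(dropped)
    # The final embedding is the concatenation of both branch features.
    return global_feat + drop_feat


# Usage: a 2-channel 2x2 feature map with the bottom-right cell dropped.
sample = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]
mask = [[1, 1], [1, 0]]
train_embedding = two_branch_embedding(sample, mask)
test_embedding = two_branch_embedding(sample)
```

In this toy example the masked branch can no longer rely on the strongest activation in the bottom-right cell, so its pooled feature must come from the remaining regions, while the global branch still sees the whole map.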

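Experimental Results

Research Questions

RQ1: Does structured dropout (Batch DropBlock) improve the robustness of person re-identification features under occlusion and viewpoint changes?
RQ2: How does Batch DropBlock compare with standard dropout and data augmentation in learning spatially distributed, discriminative features?
RQ3: Does the BDB Network maintain its performance when input images are not roughly aligned, indicating that it generalizes to real-world data?
RQ4: How much does re-ranking improve the BDB Network's performance across benchmarks?
RQ5: How do BDB's class activation maps differ from those of a standard ResNet in highlighting relevant object parts?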

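Key Findings

On Market1501, BDB with re-ranking reaches 95.8% Rank-1 accuracy and 93.7% mAP, outperforming the baseline and prior methods.
On CUB200 and CARS196 (without cropping, i.e., without rough alignment), BDB without Batch DropBlock (r_h=0, r_w=0) reaches 67.8% and 87.8% Recall@1 respectively, outperforming the version with DropBlock enabled.
Class activation maps show that BDB learns more widely distributed and more salient features across body parts and object regions, whereas the baseline focuses on only a few discriminative regions.
Re-ranking consistently improves Rank-1 and mAP on all datasets, most notably on CUHK03-Label, which reaches 87.4% Rank-1 and 88.7% mAP.
Visualizations confirm that BDB learns pose-invariant features and retrieves the correct identity even for back-view queries.
On image retrieval tasks (CUB200, CARS196, In-Shop, Stanford Online Products), BDB produces sharper, more localized CAMs with fewer background distractors than the baseline.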
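The findings above are reported as Rank-1 accuracy and mAP. As a reminder of what these numbers measure, here is a minimal sketch that computes both from per-query ranked gallery labels. It is a simplification: standard re-identification protocols also exclude gallery images taken by the same camera as the query, which this sketch omits, and the function name and input layout are hypothetical.

```python
def rank1_and_map(ranked_labels, query_labels):
    """Compute (Rank-1 accuracy, mean average precision).

    ranked_labels[i] lists the gallery identity labels for query i,
    sorted from most to least similar. query_labels[i] is the true
    identity of query i.
    """
    rank1_hits = 0
    ap_sum = 0.0
    for q_label, ranking in zip(query_labels, ranked_labels):
        matches = [label == q_label for label in ranking]
        # Rank-1: is the top-ranked gallery item the correct identity?
        if matches and matches[0]:
            rank1_hits += 1
        # Average precision: mean of precision@k over the ranks k
        # at which a correct match appears.
        hits = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        ap_sum += sum(precisions) / len(precisions) if precisions else 0.0
    num_queries = len(query_labels)
    return rank1_hits / num_queries, ap_sum / num_queries


# Usage: two queries, each with three ranked gallery labels.
rank1, mean_ap = rank1_and_map(
    [["a", "b", "a"], ["a", "b", "b"]],
    ["a", "b"],
)
```

Rank-1 only rewards getting the first result right, while mAP rewards ranking all correct matches early, which is why re-ranking can move the two metrics by different amounts.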

This review was generated by AI and reviewed by human editors.