QUICK REVIEW

[Paper Review] Deep Association Learning for Unsupervised Video Person Re-identification

Yanbei Chen, Xiatian Zhu | arXiv (Cornell University) | Aug 22, 2018

Video Surveillance and Tracking Methods | Cited by 59

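One-Sentence Summary

The paper proposes Deep Association Learning (DAL), an end-to-end unsupervised method for video person re-identification. DAL jointly optimises an intra-camera association loss and a cross-camera association loss to learn discriminative features without identity labels, and reports state-of-the-art results on PRID 2011, iLIDS-VID and MARS.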

ABSTRACT

Deep learning methods have started to dominate the research progress of video-based person re-identification (re-id). However, existing methods mostly consider supervised learning, which requires exhaustive manual efforts for labelling cross-view pairwise data. Therefore, they severely lack scalability and practicality in real-world video surveillance applications. In this work, to address the video person re-id task, we formulate a novel Deep Association Learning (DAL) scheme, the first end-to-end deep learning method using none of the identity labels in model initialisation and training. DAL learns a deep re-id matching model by jointly optimising two margin-based association losses in an end-to-end manner, which effectively constrains the association of each frame to the best-matched intra-camera representation and cross-camera representation. Existing standard CNNs can be readily employed within our DAL scheme. Experiment results demonstrate that our proposed DAL significantly outperforms current state-of-the-art unsupervised video person re-id methods on three benchmarks: PRID 2011, iLIDS-VID and MARS.

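Motivation and Objectives

Address the lack of scalable labelled data for video person re-identification by developing an unsupervised, end-to-end CNN method.
Exploit two forms of consistency, local intra-camera space-time consistency and global cross-camera cyclic ranking consistency, to learn robust representations.
Remove the need for manual identity labels while still associating tracklets across cameras effectively through self-discovered anchors.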

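Proposed Method

Introduces two sets of anchors: per-camera intra-camera anchors x_{k,i}, and cross-camera anchors a_{k,i} obtained by merging intra-camera anchors that are highly associated across cameras.
Defines two margin-based top-push association losses, L_I (intra-camera ranking) and L_C (cross-camera ranking), computed between tracklet frames and dynamically updated anchors.
Intra-camera learning updates the anchors with an exponential moving average of frame features and enforces a top-push constraint so that each frame's source tracklet stays top-ranked within its own camera.
Cross-camera learning discovers cross-camera associations by cyclic ranking over intra-camera anchors, and merges a pair of anchors into a cross-camera anchor when cyclic consistency holds.
Trains the model end to end by jointly optimising L_DAL = L_I + lambda L_C with a standard SGD/Adam-style optimiser on an ImageNet-initialised CNN backbone (ResNet50 or MobileNet).
Uses a mini-batch iterative procedure during training to progressively discover and exploit cross-camera correspondences.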
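The components listed above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the update rate, the margin value, the single-hinge form of the top-push loss and the averaging used to merge anchors are all assumptions chosen for illustration, and features are plain lists rather than CNN outputs.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ema_update(anchor, feature, rate=0.5):
    """Move a tracklet anchor toward a frame feature (exponential moving average).
    The rate of 0.5 is an assumed value."""
    return [(1.0 - rate) * a + rate * f for a, f in zip(anchor, feature)]

def top_push_loss(frame, anchors, source_idx, margin=0.2):
    """Assumed hinge form of a margin-based top-push loss: the frame should be
    closer to its source anchor than to the nearest other anchor by `margin`.
    Requires at least two anchors."""
    d_source = dist(frame, anchors[source_idx])
    d_rival = min(dist(frame, a) for i, a in enumerate(anchors) if i != source_idx)
    return max(0.0, d_source - d_rival + margin)

def nearest(query, anchors):
    """Index of the anchor closest to `query`."""
    return min(range(len(anchors)), key=lambda i: dist(query, anchors[i]))

def cyclic_match(i, anchors_a, anchors_b):
    """Rank anchor i of camera A against camera B, then rank the best match
    back against camera A. Return the match index only if the cycle closes."""
    j = nearest(anchors_a[i], anchors_b)
    return j if nearest(anchors_b[j], anchors_a) == i else None

def merge_anchors(anchor_a, anchor_b):
    """Merge a cyclically consistent pair into one cross-camera anchor.
    Simple averaging is an assumption."""
    return [(a + b) / 2.0 for a, b in zip(anchor_a, anchor_b)]

def dal_loss(loss_intra, loss_cross, lam=1.0):
    """Joint objective L_DAL = L_I + lambda * L_C."""
    return loss_intra + lam * loss_cross

# Toy data: two tracklet anchors per camera.
cam_a = [[1.0, 0.0], [0.0, 1.0]]
cam_b = [[0.1, 0.9], [0.9, 0.1]]
match = cyclic_match(0, cam_a, cam_b)  # anchor 0 in A pairs with anchor 1 in B
if match is not None:
    cross_anchor = merge_anchors(cam_a[0], cam_b[match])
```

In a real training loop the distances would be computed on CNN features inside an automatic-differentiation framework, and the cross-camera loss would reuse the same ranking idea against the merged anchors.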

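Experimental Results

Research Questions

RQ1: Can DAL learn effective video re-identification representations without any identity labels?
RQ2: Do local intra-camera consistency and cross-camera cyclic ranking provide complementary supervision that improves unsupervised video re-identification?
RQ3: How well does end-to-end DAL perform with standard CNN backbones when trained on unlabelled data from public benchmarks?
RQ4: How does the rate of cross-camera association change during training, and how does it affect re-identification performance?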

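Key Findings

DAL significantly outperforms state-of-the-art unsupervised video re-identification methods on PRID 2011, iLIDS-VID and MARS.
Compared with previous unsupervised methods, Rank-1 accuracy improves by 4.4% on PRID 2011, 15.2% on iLIDS-VID and 12.5% on MARS.
Cross-camera association alone already yields competitive results, and combining it with intra-camera learning brings further gains.
DAL performs consistently across backbones (ResNet50 and MobileNet), showing that it adapts well to standard CNNs.
A large share of tracklets become associated across cameras during training (about 90% on PRID 2011, about 75% on iLIDS-VID and over 50% on MARS), with a high true-match rate among the discovered cross-camera pairs.
Compared with supervised training on labelled data, DAL reaches comparable performance on the smaller datasets and on some datasets comes close to the supervised result, which indicates strong unsupervised learning capability.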
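For readers unfamiliar with the Rank-1 metric quoted above, the following is a simplified sketch of how it can be computed: a query counts as a hit when its nearest gallery entry has the same identity. The function name, the data layout and the toy identities are hypothetical, and benchmark protocols add details (such as camera filtering) that are omitted here.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank1_accuracy(queries, gallery):
    """Fraction of queries whose closest gallery entry shares their identity.
    `queries` and `gallery` are lists of (identity, feature_vector) pairs."""
    hits = 0
    for q_id, q_feat in queries:
        best_id, _ = min(gallery, key=lambda item: dist(q_feat, item[1]))
        if best_id == q_id:
            hits += 1
    return hits / len(queries)

# Toy example with hypothetical identities p1, p2, p3.
queries = [("p1", [1.0, 0.0]), ("p2", [0.0, 1.0])]
gallery = [("p1", [0.9, 0.1]), ("p2", [0.8, 0.3]), ("p3", [0.1, 0.9])]
accuracy = rank1_accuracy(queries, gallery)  # p1 is matched, p2 is confused with p3
```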

This review was generated by AI and checked by human editors.