QUICK REVIEW

[Paper Review] Automatic Adaptation of Person Association for Multiview Tracking in Group Activities

Minh Vo, Ersin Yumer|arXiv (Cornell University)|May 22, 2018

Video Surveillance and Tracking Methods | References: 45 | Cited by: 4

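One-Sentence Summary

This paper proposes a self-supervised framework that adapts a generic person appearance descriptor to unlabeled multi-view video sequences using motion tracking, mutual exclusion constraints, and multi-view geometry, enabling robust person association and 3D skeleton tracking in complex group activities. On WILDTRACK and the newly constructed wildscene dataset, the method improves person association accuracy by up to 18% and 3D tracking stability by 5 to 10 times over the baseline.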

ABSTRACT

Reliable markerless motion tracking of people participating in a complex group activity from multiple moving cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. To solve this problem, reliable association of the same person across distant viewpoints and temporal instances is essential. We present a self-supervised framework to adapt a generic person appearance descriptor to the unlabeled videos by exploiting motion tracking, mutual exclusion constraints, and multi-view geometry. The adapted discriminative descriptor is used in a tracking-by-clustering formulation. We validate the effectiveness of our descriptor learning on WILDTRACK [14] and three new complex social scenes captured by multiple cameras with up to 60 people in the wild. We report significant improvement in association accuracy (up to 18%) and stable and coherent 3D human skeleton tracking (5 to 10 times) over the baseline. Using the reconstructed 3D skeletons, we cut the input videos into a multi-angle video where the image of a specified person is shown from the best visible front-facing camera. Our algorithm detects inter-human occlusion to determine the camera switching moment while still maintaining the flow of the action well.

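Research Motivation and Goals

Address the challenge of reliably associating people across multiple cameras in complex group activities with frequent occlusions and viewpoint changes.
Achieve robust 3D human skeleton tracking in unstructured real-world environments with up to 60 people.
Develop a self-supervised method that adapts a generic appearance descriptor to unlabeled multi-view videos without manual annotation.
Detect inter-human occlusion and switch dynamically to the camera with the best view while preserving the continuity of the action.
Improve tracking consistency and accuracy in asynchronous, moving-camera setups by adapting a discriminative descriptor.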

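Proposed Method

Use motion tracking to generate initial person trajectories across multiple cameras.
Apply mutual exclusion constraints so that each location in a view corresponds to only one person, reducing false associations.
Use multi-view geometry to enforce consistent 3D reconstruction and to verify cross-view correspondences.
Adapt the generic appearance descriptor in a self-supervised manner, using tracking consistency and geometric consistency as supervision signals.
Adopt a tracking-by-clustering approach that clusters detections with the adapted descriptor to form coherent person trajectories.
Detect inter-human occlusion to trigger camera switching, selecting the clearest front-facing view for each person.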
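
The supervision signals listed above can be illustrated with a minimal sketch of pair mining. It assumes each detection carries a camera index, a frame index, and a tracklet id from short-term motion tracking; the `Detection` type and `mine_pairs` function are hypothetical names introduced here, not the authors' code. Detections in the same tracklet are treated as positives, and detections from different tracklets in the same image are treated as negatives under mutual exclusion. Cross-view positives from multi-view geometry are left out for brevity.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical detection record; field names are assumptions."""
    camera: int    # index of the camera that produced the detection
    frame: int     # frame index within that camera's video
    tracklet: int  # id assigned by short-term motion tracking in that camera


def mine_pairs(detections):
    """Return (positives, negatives) as lists of Detection pairs."""
    by_tracklet = defaultdict(list)
    by_image = defaultdict(list)
    for det in detections:
        by_tracklet[(det.camera, det.tracklet)].append(det)
        by_image[(det.camera, det.frame)].append(det)

    # Same tracklet in the same camera: assumed to be the same person.
    positives = [
        pair
        for dets in by_tracklet.values()
        for pair in combinations(dets, 2)
    ]
    # Same image, different tracklets: mutual exclusion says different people.
    negatives = [
        (a, b)
        for dets in by_image.values()
        for a, b in combinations(dets, 2)
        if a.tracklet != b.tracklet
    ]
    return positives, negatives


# Two people tracked over two frames in one camera.
detections = [
    Detection(camera=0, frame=0, tracklet=1),
    Detection(camera=0, frame=1, tracklet=1),
    Detection(camera=0, frame=0, tracklet=2),
    Detection(camera=0, frame=1, tracklet=2),
]
positives, negatives = mine_pairs(detections)
```

Pairs like these could feed a contrastive or triplet-style loss when fine-tuning a descriptor; which loss the paper actually uses is not stated in this summary.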

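Experimental Results

Research Questions

RQ1: How can person association across multiple cameras be maintained reliably under large viewpoint and appearance differences?
RQ2: To what extent can a generic appearance descriptor be adapted to complex real-world scenes without manual annotation?
RQ3: Can motion tracking and geometric constraints jointly improve the discriminative power of person descriptors in multi-view tracking?
RQ4: How does the proposed method compare with the baseline in person association accuracy and 3D tracking stability?
RQ5: Can occlusion-based dynamic camera switching improve visibility while preserving the flow of the action?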

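Key Findings

On the WILDTRACK dataset and the newly captured complex social scenes, the proposed method improves person association accuracy by up to 18% over the baseline.
3D human skeleton tracking is 5 to 10 times more stable than the baseline, showing stronger temporal consistency.
Self-supervised descriptor adaptation copes effectively with strong viewpoint and appearance variations in unconstrained real-world environments.
The system successfully detects inter-human occlusion and switches to the best visible camera view while preserving the continuity of the action.
The method generalizes well to scenes with up to 60 people, showing robustness in dense group activities.
Tracking-by-clustering with the adapted descriptor produces stable and coherent person trajectories across multiple views.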
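
The tracking-by-clustering idea in the findings above can be sketched with a simple greedy assignment. This is an illustrative stand-in, not the paper's formulation: the function name `cluster_detections`, the `max_dist` threshold, and the running-centroid update are all assumptions. Each detection joins the nearest existing cluster in descriptor space unless that cluster already holds a detection from the same image, which mutual exclusion forbids.

```python
import math


def cluster_detections(items, max_dist=0.5):
    """Greedily cluster detections by descriptor distance.

    items: list of (image_key, descriptor) tuples, where image_key identifies
    one camera frame and descriptor is a sequence of floats.
    Returns one integer cluster label per item, in input order.
    """
    clusters = []  # each cluster: {"images": set, "centroid": list, "size": int}
    labels = []
    for image_key, desc in items:
        best, best_dist = None, max_dist
        for cid, cluster in enumerate(clusters):
            if image_key in cluster["images"]:
                continue  # mutual exclusion: one person appears once per image
            dist = math.dist(desc, cluster["centroid"])
            if dist < best_dist:
                best, best_dist = cid, dist
        if best is None:
            # No admissible cluster is close enough: start a new identity.
            clusters.append(
                {"images": {image_key}, "centroid": list(desc), "size": 1}
            )
            labels.append(len(clusters) - 1)
        else:
            cluster = clusters[best]
            n = cluster["size"]
            # Update the running mean of the cluster's descriptors.
            cluster["centroid"] = [
                (c * n + x) / (n + 1) for c, x in zip(cluster["centroid"], desc)
            ]
            cluster["size"] = n + 1
            cluster["images"].add(image_key)
            labels.append(best)
    return labels


# Two people seen by two cameras; descriptors are toy 2-D vectors.
items = [
    (("cam0", 0), (0.00, 0.0)),
    (("cam0", 0), (0.10, 0.0)),
    (("cam1", 0), (0.02, 0.0)),
    (("cam1", 0), (0.12, 0.0)),
]
labels = cluster_detections(items)
```

In this toy input the two detections from the same image land in different clusters even though their descriptors are close, which is the behavior the mutual exclusion constraint is meant to guarantee.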


This review was generated by AI and reviewed by human editors.