QUICK REVIEW

[Paper Review] Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views

Junting Dong, Wen Jiang|arXiv (Cornell University)|Jan 14, 2019

Human Pose and Action Recognition | References: 40 | Cited by: 34

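One-Sentence Summary

This paper proposes a fast, robust multi-view pipeline that clusters the 2D poses detected in different views through convex-optimization-based multi-way matching with cycle consistency, then reconstructs a 3D pose for each cluster using either 3D pictorial structures or triangulation. It uses appearance and geometric cues to improve cross-view matching and achieves state-of-the-art PCP on the Campus and Shelf datasets without training on the evaluated data.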

ABSTRACT

This paper addresses the problem of 3D pose estimation for multiple people in a few calibrated camera views. The main challenge of this problem is to find the cross-view correspondences among noisy and incomplete 2D pose predictions. Most previous methods address this challenge by directly reasoning in 3D using a pictorial structure model, which is inefficient due to the huge state space. We propose a fast and robust approach to solve this problem. Our key idea is to use a multi-way matching algorithm to cluster the detected 2D poses in all views. Each resulting cluster encodes 2D poses of the same person across different views and consistent correspondences across the keypoints, from which the 3D pose of each person can be effectively inferred. The proposed convex optimization based multi-way matching algorithm is efficient and robust against missing and false detections, without knowing the number of people in the scene. Moreover, we propose to combine geometric and appearance cues for cross-view matching. The proposed approach achieves significant performance gains from the state-of-the-art (96.3% vs. 90.6% and 96.9% vs. 88% on the Campus and Shelf datasets, respectively), while being efficient for real-time applications.

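Research Motivation and Goals

Reduce the complexity of multi-person 3D pose estimation in multi-view settings by avoiding joint-level 3D inference over all people at once.
Use appearance and geometric cues to establish consistent correspondences among 2D poses across views.
Infer each person's 3D pose efficiently once the 2D poses are robustly matched, to reach real-time or near-real-time performance.
Handle an unknown number of people and missing or incomplete detections through a convex optimization framework.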

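Proposed Method

Detect 2D poses in each view with an off-the-shelf detector (Cascaded Pyramid Network).
Build affinities between bounding boxes across views by combining appearance features (re-ID descriptors) with geometric consistency (epipolar constraints).
Formulate multi-way matching as a convex optimization problem with cycle consistency: minimize -<A,P> + lambda*rank(P), with the rank term relaxed to the nuclear norm and the problem solved by ADMM; the output P indicates the cross-view correspondences.
Enforce cycle consistency through a single global matching over all views, which prunes false detections and does not require the true number of people.
Reconstruct 3D poses from the matched 2D poses, using either 3D Pictorial Structures (3DPS) with a skeleton prior or, in favorable cases, simple triangulation; the clustering shrinks the 3DPS state space.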
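The link between cycle consistency and the low-rank term can be illustrated with a small sketch. It is not the authors' implementation: the toy scene, the helper names `matching_matrix` and `svt`, and the threshold value are all assumptions made for illustration. The sketch builds a cycle-consistent matching matrix P from hypothetical identity labels, shows that its rank equals the number of people, and applies singular value thresholding, which is the proximal operator of the nuclear norm and the kind of step an ADMM solver for a nuclear-norm-relaxed objective typically contains.

```python
import numpy as np

def matching_matrix(person_ids):
    """Stacked matching matrix: P[i, j] = 1 if detections i and j
    (taken from any views) belong to the same person, else 0."""
    ids = np.asarray(person_ids)
    return (ids[:, None] == ids[None, :]).astype(float)

def svt(M, tau):
    """Singular value thresholding: shrink every singular value by tau
    and clip at zero (proximal operator of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# Hypothetical toy scene: 3 views, 2 people; the third view misses person 1.
# Detections are listed view by view, labelled with their true identity.
person_ids = [0, 1, 0, 1, 0]
P = matching_matrix(person_ids)

# A cycle-consistent P has rank equal to the number of distinct people.
rank_P = np.linalg.matrix_rank(P)

# Thresholding P itself: its nonzero singular values are the cluster sizes.
shrunk = np.linalg.svd(svt(P, tau=1.0), compute_uv=False)

# A noisy symmetric affinity (assumed noise level) pushed toward low rank.
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.1, size=P.shape)
A = P + (noise + noise.T) / 2
A_low = svt(A, tau=0.5)  # small singular values are shrunk toward zero
```

The number of people never appears as an input: it shows up as the rank of P, which is why penalizing rank (or its nuclear-norm surrogate) lets the matching work without knowing the head count in advance.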

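Experimental Results

Research Questions

RQ1: How can robust cross-view correspondences between 2D poses be established when detections are noisy or missing?
RQ2: In multi-view 3D pose estimation, does combining appearance cues with geometric constraints improve cross-view matching over geometry alone?
RQ3: Can cycle-consistency constraints and low-rank relaxation produce accurate, scalable clustering of multi-view detections without knowing the number of people?
RQ4: How does matching-driven clustering affect the efficiency and accuracy of 3D pose reconstruction via 3DPS or triangulation in crowded scenes?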

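Key Findings

The proposed multi-way matching with cycle consistency significantly improves cross-view correspondence and robustness, which in turn improves 3D pose estimation.
Combining appearance and geometric cues yields better affinity scores for matching, especially under occlusion or when people look alike.
Clustering 2D poses via matching reduces the 3DPS state space, speeds up inference, and improves robustness when few cameras are available.
The method achieves state-of-the-art PCP against multiple baselines on the Campus (average 96.3) and Shelf (average 96.9) datasets.
Real-time performance is achievable: with 4-5 views, the system runs at more than 20 fps without the 3DPS model; in the reported tests, re-identification takes about 25 ms, matching about 20 ms, and 3D pose inference about 60 ms.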
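The review reports PCP (percentage of correct parts) without defining it. The sketch below uses one commonly used 3D formulation, assumed here rather than taken from the paper: a limb counts as correct when the mean error of its two endpoints is at most alpha times the ground-truth limb length, with alpha = 0.5. The function name `pcp`, the toy skeleton, and the joint coordinates are hypothetical.

```python
import numpy as np

def pcp(pred, gt, limbs, alpha=0.5):
    """Fraction of limbs counted correct under the assumed PCP rule.

    pred, gt: (J, 3) arrays of predicted and ground-truth 3D joints.
    limbs:    list of (start, end) joint index pairs.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    correct = 0
    for s, e in limbs:
        # Mean endpoint error for this limb.
        err = (np.linalg.norm(pred[s] - gt[s]) + np.linalg.norm(pred[e] - gt[e])) / 2
        # Ground-truth limb length sets the tolerance.
        limb_len = np.linalg.norm(gt[s] - gt[e])
        correct += int(err <= alpha * limb_len)
    return correct / len(limbs)

# Toy skeleton: 3 joints along the z-axis, 2 limbs of length 1.
gt = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]])
pred = gt.copy()
pred[2] += [0.0, 0.0, 1.5]  # large error on the last joint only
limbs = [(0, 1), (1, 2)]

score = pcp(pred, gt, limbs)  # first limb correct, second limb not: 0.5
```

Because the tolerance scales with limb length, short limbs such as forearms are judged more strictly than long ones, which is worth keeping in mind when comparing per-part scores.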


This review was generated by AI and reviewed by human editors.