QUICK REVIEW

[Paper Review] 3D Human Pose Estimation in the Wild by Adversarial Learning

Wei Yang, Wanli Ouyang | arXiv (Cornell University) | Mar 26, 2018

Human Pose and Action Recognition | References: 53 | Cited by: 34

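One-Sentence Summary

This paper proposes an adversarial learning framework that transfers the 3D human pose structures learned from a fully annotated lab dataset to in-the-wild images, using only 2D pose annotations for those images. By introducing a multi-source discriminator that incorporates a geometric descriptor of pairwise relative joint locations and distances, the method pushes the estimator toward anthropometrically valid 3D poses, markedly improves generalization, and achieves state-of-the-art performance on the MPII and MPI-INF-3DHP benchmarks.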

ABSTRACT

Recently, remarkable advances have been achieved in 3D human pose estimation from monocular images because of the powerful Deep Convolutional Neural Networks (DCNNs). Despite their success on large-scale datasets collected in the constrained lab environment, it is difficult to obtain the 3D pose annotations for in-the-wild images. Therefore, 3D human pose estimation in the wild is still a challenge. In this paper, we propose an adversarial learning framework, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations. Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild. We also observe that a carefully designed information source for the discriminator is essential to boost the performance. Thus, we design a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator. The efficacy of our adversarial learning framework with the new geometric descriptor has been demonstrated through extensive experiments on widely used public benchmarks. Our approach significantly improves the performance compared with previous state-of-the-art approaches.

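Research Motivation and Objectives

Address the challenge of 3D human pose estimation in unconstrained, in-the-wild settings where 3D annotations are unavailable.
Enable weakly supervised training of a 3D pose estimator using only the 2D pose annotations available in in-the-wild datasets.
Improve generalization under the domain shift between controlled lab data and real-world images.
Replace hard-coded pose constraints with a learnable discriminator that enforces anatomical plausibility.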

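Proposed Method

Train a conditional generator (the 3D pose estimator) that predicts a 3D pose conditioned on features of the input image.
Design a multi-source discriminator that uses two information sources, the original input image and a geometric descriptor of pairwise joint offsets and distances, to distinguish ground-truth 3D poses from predicted ones.
The geometric descriptor encodes the relative 3D locations and distances between body joints, modeling body articulation and symmetry.
Train adversarially end to end so that, on in-the-wild data without 3D annotations, the generator learns to produce 3D poses that are hard to distinguish from ground-truth poses.
Drawing on image-pose correspondence and anatomical constraints, the discriminator detects implausible poses, which improves the quality of the generator's output.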
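
To make the geometric descriptor concrete, here is a minimal sketch in plain Python. It assumes a pose is a list of (x, y, z) joint coordinates and returns, for every joint pair, the offset vector and the Euclidean distance. The function name `geometric_descriptor`, the output layout, and the toy pose are illustrative assumptions; the paper's exact tensor layout and distance definition may differ.

```python
import math

def geometric_descriptor(joints):
    """Return pairwise offsets and Euclidean distances for (x, y, z) joints.

    Assumed layout (not taken from the paper): offsets[i][j] is the vector
    from joint j to joint i, and distances[i][j] is its length.
    """
    n = len(joints)
    offsets = [[(0.0, 0.0, 0.0)] * n for _ in range(n)]
    distances = [[0.0] * n for _ in range(n)]
    for i, (xi, yi, zi) in enumerate(joints):
        for j, (xj, yj, zj) in enumerate(joints):
            delta = (xi - xj, yi - yj, zi - zj)
            offsets[i][j] = delta
            distances[i][j] = math.sqrt(sum(d * d for d in delta))
    return offsets, distances

# Hypothetical three-joint chain (e.g. shoulder, elbow, wrist), arbitrary units.
toy_pose = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
offsets, distances = geometric_descriptor(toy_pose)
print(distances[0][1])  # 5.0
print(distances[0][2])  # 13.0
```

Because the descriptor depends only on differences between joints, it is unchanged when the whole pose is translated, which is one plausible reason such a representation makes limb lengths and left-right symmetry easier for a discriminator to check.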

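Experimental Results

Research Questions

RQ1: Can adversarial learning effectively transfer 3D pose structures from a fully annotated lab dataset to in-the-wild images without 3D annotations?
RQ2: How does a geometric descriptor of joint relationships improve the discriminator's ability to enforce anatomically plausible poses?
RQ3: Does end-to-end adversarial training of the 2D pose module and the depth regressor outperform 3D pose estimation that relies on fixed 2D features?
RQ4: How well does the proposed method generalize to unseen datasets such as MPI-INF-3DHP?
RQ5: Can the discriminator identify and correct common failure cases such as left-right flips, occlusion, and unnatural limb bending?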

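Key Findings

On the MPII validation set, the method reaches a PCKh@0.5 score of 88.6, which is 1.0 point above the baseline.
On the MPI-INF-3DHP benchmark, the method reaches a PCK of 69.0 and an AUC of 32.0, compared with the baseline's PCK of 64.7 and AUC of 31.7 (gains of 4.3 and 0.3 points, respectively).
Compared with the fine-tuned baseline, end-to-end adversarial training reduces 2D pose estimation error by 8.1%.
Qualitative comparisons show that the model is more robust to occlusion, cluttered backgrounds, and left-right symmetry confusion.
Using the geometric descriptor as an input source for the discriminator speeds up convergence and improves generalization, as the training and validation curves confirm.
The discriminator successfully identifies and corrects anatomically implausible poses, such as unnaturally bent limbs and asymmetric limb configurations.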
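
For readers unfamiliar with the metrics quoted above, the following sketch shows how PCK, PCKh, and AUC are commonly computed from per-joint errors. The 150 mm threshold, the 31 evenly spaced AUC thresholds, and all function names are assumptions chosen for illustration; this summary does not state the exact evaluation settings, so treat the code as a rough guide rather than the benchmarks' official protocol.

```python
def pck(errors, threshold):
    """Percentage of per-joint errors that fall within the threshold."""
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)

def pckh(errors_px, head_size_px, alpha=0.5):
    """2D PCK with the threshold set to alpha times the head size."""
    return pck(errors_px, alpha * head_size_px)

def auc(errors, max_threshold=150.0, steps=31):
    """Mean PCK over evenly spaced thresholds from 0 to max_threshold."""
    thresholds = [max_threshold * k / (steps - 1) for k in range(steps)]
    return sum(pck(errors, t) for t in thresholds) / steps

# Hypothetical per-joint 3D errors in millimetres.
toy_errors_mm = [40.0, 90.0, 120.0, 200.0]
print(pck(toy_errors_mm, 150.0))  # 75.0

# Hypothetical 2D errors in pixels with a 20-pixel head size.
print(pckh([4.0, 12.0], 20.0))  # 50.0
```

PCK reports accuracy at a single threshold, whereas AUC averages over a range of thresholds, so a model can improve noticeably on one while barely moving on the other.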

This review was generated by AI and reviewed by human editors.