QUICK REVIEW

[Paper Review] Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

Xipeng Chen, Kwan-Yee Lin|arXiv (Cornell University)|Mar 21, 2019

Human Pose and Action Recognition | References: 43 | Cited by: 42

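One-Sentence Summary

This paper proposes a weakly supervised framework that learns a geometry-aware 3D pose representation from multi-view 2D skeletons through a skeleton-based view-synthesis encoder-decoder, and adds a representation consistency constraint, in order to improve monocular 3D pose estimation.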

ABSTRACT

Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale in-door 3D datasets and sophisticated network architectures. However, the generalizability to different environments remains an elusive goal. In this work, we propose a geometry-aware 3D representation for the human pose to address this limitation by using multiple views in a simple auto-encoder model at the training stage and only 2D keypoint information as supervision. A view synthesis framework is proposed to learn the shared 3D representation between viewpoints with synthesizing the human pose from one viewpoint to the other one. Instead of performing a direct transfer in the raw image-level, we propose a skeleton-based encoder-decoder mechanism to distil only pose-related representation in the latent space. A learning-based representation consistency constraint is further introduced to facilitate the robustness of latent 3D representation. Since the learnt representation encodes 3D geometry information, mapping it to 3D pose will be much easier than conventional frameworks that use an image or 2D coordinates as the input of 3D pose estimator. We demonstrate our approach on the task of 3D human pose estimation. Comprehensive experiments on three popular benchmarks show that our model can significantly improve the performance of state-of-the-art methods with simply injecting the representation as a robust 3D prior.

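Research Motivation and Goals

Advance robust 3D pose estimation that generalizes across environments and actions by learning a geometry-aware representation with limited 3D annotations.
Learn a shared 3D pose representation from multi-view skeletons using only 2D supervision.
Distill pose-related information into a latent space that maps more easily to 3D pose.
Improve generalization by exploiting view synthesis and a latent-space consistency constraint.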

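Proposed Method

Use 2D skeleton maps from multi-view images as input, rather than raw images.
Train a skeleton-based encoder-decoder to synthesize the target-view skeleton from the source-view skeleton; the latent code is the geometry representation G.
Constrain G with a consistency loss across view directions so that it becomes a semantically meaningful 3D pose representation.
Introduce a bidirectional encoder-decoder structure that enforces latent-space consistency under known view rotations.
Inject the learned geometry representation G as a prior into a 3D pose regressor, which makes regressing 3D joint coordinates from G straightforward.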
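
The latent-space part of the method can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the encoder and decoder networks are omitted, and the sketch assumes (hypothetically) that the latent code G is stored as M points in 3D and that the change of view is a pure rotation about the vertical axis. All function names are illustrative.

```python
import numpy as np

def rotation_about_y(theta):
    """3x3 rotation for a view change of `theta` radians about the vertical axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def rotate_latent(G, R):
    """Rotate a latent code stored as M points in 3D (shape M x 3). Assumed layout."""
    return G @ R.T

def consistency_loss(G_src, G_tgt, R_src_to_tgt):
    """Mean squared gap between the rotated source code and the target code."""
    return float(np.mean((rotate_latent(G_src, R_src_to_tgt) - G_tgt) ** 2))

# Toy check: two codes that differ exactly by the known view rotation.
rng = np.random.default_rng(0)
G_i = rng.normal(size=(8, 3))          # stand-in for the code encoded from view i
R_ij = rotation_about_y(np.pi / 2)     # known rotation from view i to view j
G_j = rotate_latent(G_i, R_ij)         # stand-in for the code encoded from view j

loss_fwd = consistency_loss(G_i, G_j, R_ij)       # direction i -> j
loss_bwd = consistency_loss(G_j, G_i, R_ij.T)     # direction j -> i (inverse rotation)
loss_bad = consistency_loss(G_i, G_j, np.eye(3))  # ignoring the rotation
```

In this toy setup both directions give a loss near zero, while ignoring the rotation does not. In training, the same term would be computed on codes produced by the encoder for two real views and added to the view-synthesis loss.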

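Experimental Results

Research Questions

RQ1: Can a geometry-aware 3D human pose representation be learned from multi-view data using only 2D annotations?
RQ2: Does a skeleton-based view-synthesis framework combined with a latent-space consistency constraint produce a robust 3D pose representation that improves monocular pose estimation?
RQ3: Can the learned geometry representation serve as an effective prior that improves state-of-the-art 3D pose estimation methods across datasets and protocols?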

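Key Findings

The geometry representation G produced by the skeleton-based view-synthesis framework improves 3D pose estimation when injected as a prior.
With limited 3D annotations, regressing 3D pose from G with a simple two-layer regressor gives reasonable results, and G also improves stronger baselines across protocols.
The representation consistency constraint reduces implausible poses and makes G more robust; ablation results improve when the constraint is included.
Data augmentation with virtual cameras and the representation consistency constraint together yield a substantial improvement over the baseline.
The learned G generalizes across datasets; qualitative results on in-the-wild data such as MPII illustrate the method's practical effectiveness.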
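
As a rough illustration of what regressing 3D pose from G with a two-layer regressor could look like, here is a minimal NumPy forward pass. The summary states only that the regressor has two layers; the layer widths, the ReLU activation, the 17-joint skeleton, and the latent size of 24 are assumptions chosen for the sketch, and the weights are random rather than trained.

```python
import numpy as np

def init_regressor(latent_dim, hidden_dim, num_joints, seed=0):
    """Randomly initialize a two-layer fully connected regressor (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.01, size=(latent_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.01, size=(hidden_dim, num_joints * 3)),
        "b2": np.zeros(num_joints * 3),
    }

def regress_pose(params, G):
    """Map flattened latent codes (B x latent_dim) to 3D joints (B x J x 3)."""
    hidden = np.maximum(G @ params["W1"] + params["b1"], 0.0)  # ReLU (assumed)
    out = hidden @ params["W2"] + params["b2"]
    return out.reshape(G.shape[0], -1, 3)

# Toy usage: a batch of 4 latent codes, each flattened to 24 values.
params = init_regressor(latent_dim=24, hidden_dim=64, num_joints=17)
G_batch = np.random.default_rng(1).normal(size=(4, 24))
poses = regress_pose(params, G_batch)  # shape (4, 17, 3)
```

The point of the sketch is the small capacity of the mapping: if G already encodes 3D geometry, a shallow regressor trained on a limited number of 3D annotations may be enough.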

This review was generated by AI and reviewed by human editors.