QUICK REVIEW

[Paper Review] Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations

Yasunori Kudo, Keisuke Ogaki|arXiv (Cornell University)|Mar 22, 2018

Human Pose and Action Recognition | References: 4 | Cited by: 44

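One-Sentence Summary

A GAN-based method that predicts a 3D human pose from the 2D joint locations of a single image without using any 3D pose data, by requiring that the pose's 2D projection remain plausible when the pose is rotated horizontally.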

ABSTRACT

The task of three-dimensional (3D) human pose estimation from a single image can be divided into two parts: (1) Two-dimensional (2D) human joint detection from the image and (2) estimating a 3D pose from the 2D joints. Herein, we focus on the second part, i.e., a 3D pose estimation from 2D joint locations. The problem with existing methods is that they require either (1) a 3D pose dataset or (2) 2D joint locations in consecutive frames taken from a video sequence. We aim to solve these problems. For the first time, we propose a method that learns a 3D human pose without any 3D datasets. Our method can predict a 3D pose from 2D joint locations in a single image. Our system is based on the generative adversarial networks, and the networks are trained in an unsupervised manner. Our primary idea is that, if the network can predict a 3D human pose correctly, the 3D pose that is projected onto a 2D plane should not collapse even if it is rotated perpendicularly. We evaluated the performance of our method using Human3.6M and the MPII dataset and showed that our network can predict a 3D pose well even if the 3D dataset is not available during training.

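Motivation and Objectives

Enable 3D pose estimation from 2D joints without relying on 3D datasets or consecutive video frames.
Propose an unsupervised adversarial framework that regresses the z-coordinates of the joints from their 2D locations.
Require the generated 3D pose to stay consistent, i.e. still look like a realistic 2D pose, when it is rotated and projected back to 2D.
Apply the method to in-the-wild 2D datasets using only their 2D annotations.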

Proposed Method

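A generator G maps the N×2 2D joint locations p to N z-coordinates (z1, ..., zN).
The resulting 3D pose is rotated about the y-axis by a random angle θ ∈ [-π, π] and then orthographically projected back to 2D; the result is denoted p̂.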
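A discriminator D is trained to distinguish real 2D poses p from projected poses p̂, with the objective V(G, D) = E_p[log D(p)] + E_{p,θ}[log(1 − D(f(p, G(p); θ)))], where f(p, G(p); θ) = p̂ denotes the rotate-and-project operation.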
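The 2D joints are preprocessed by normalizing them relative to a center joint: the center joint's coordinates are subtracted, and the result is divided by the mean distance of the joints from the center.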
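An angle-based constraint L_angle suppresses front/back flipping of the 3D pose by enforcing sin β ≥ 0, where β is the angle between the face direction vector and the shoulder direction vector.
The final objective adds the angle constraint to the GAN loss: V(G, D) = E_p[log D(p)] + E_{p,θ}[log(1 − D(f(p, G(p); θ))) + L_angle].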
Both G and D consist of four linear layers (1024 hidden units) with Leaky ReLU activations and skip connections.
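
The normalization, rotation, and projection steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `normalize_2d` and `rotate_and_project` are hypothetical, the center joint index is an assumption (the review does not say which joint is used), and the depths z are passed in directly instead of being produced by a trained generator. A rotation about the y-axis leaves y unchanged and mixes x with z; orthographic projection then simply drops the rotated depth.

```python
import math
import random


def normalize_2d(joints, center_idx=0):
    """Center 2D joints on one joint and scale by the mean distance to it.

    joints: list of (x, y) tuples.
    center_idx: hypothetical choice of center joint (not given in the review).
    """
    cx, cy = joints[center_idx]
    shifted = [(x - cx, y - cy) for x, y in joints]
    mean_dist = sum(math.hypot(x, y) for x, y in shifted) / len(shifted)
    if mean_dist == 0:
        return shifted  # degenerate pose: all joints coincide
    return [(x / mean_dist, y / mean_dist) for x, y in shifted]


def rotate_and_project(joints_2d, z, theta):
    """Lift 2D joints with depths z, rotate about the y-axis by theta,
    and project orthographically back to 2D (the rotated depth is dropped)."""
    c, s = math.cos(theta), math.sin(theta)
    projected = []
    for (x, y), zi in zip(joints_2d, z):
        x_rot = c * x + s * zi  # y-axis rotation: x' = x cos(theta) + z sin(theta)
        # z' = -x sin(theta) + z cos(theta) is discarded by the projection
        projected.append((x_rot, y))  # y is unchanged by a y-axis rotation
    return projected


if __name__ == "__main__":
    p = normalize_2d([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)], center_idx=0)
    z = [0.0, 0.3, -0.2]  # stand-in for the generator output G(p)
    theta = random.uniform(-math.pi, math.pi)  # random view, as described above
    print(rotate_and_project(p, z, theta))  # this is a "fake" 2D pose p̂ for D
```

With θ = 0 the projection returns the input pose unchanged, so the adversarial signal only comes from rotated views, where wrong depths make the projected pose look implausible.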
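
The adversarial objective with the flip-suppressing angle term can be illustrated as a Monte Carlo estimate over a batch of discriminator outputs. This is a sketch, not the paper's training code: `gan_value` and `angle_penalty` are hypothetical helpers. Because the unsigned angle between two vectors always satisfies sin β ≥ 0, the sketch assumes β is a signed angle measured in the horizontal x-z plane and uses the hinge penalty max(0, −sin β); the paper's exact definition may differ.

```python
import math


def gan_value(d_real, d_fake, angle_penalties=None):
    """Batch estimate of V(G, D) = E[log D(p)] + E[log(1 - D(p_hat)) + L_angle].

    d_real: discriminator outputs D(p) on real 2D poses, each in (0, 1).
    d_fake: outputs D(f(p, G(p); theta)) on rotated-and-projected poses.
    angle_penalties: optional L_angle value per fake sample (default 0).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    if angle_penalties is None:
        angle_penalties = [0.0] * len(d_fake)
    fake_term = sum(
        math.log(1.0 - d) + a for d, a in zip(d_fake, angle_penalties)
    ) / len(d_fake)
    return real_term + fake_term


def angle_penalty(face_dir, shoulder_dir):
    """Hypothetical L_angle: penalize sin(beta) < 0.

    ASSUMPTION: beta is the signed angle from the shoulder vector to the
    face vector in the horizontal x-z plane (y components are ignored).
    """
    fx, _, fz = face_dir
    sx, _, sz = shoulder_dir
    norm = math.hypot(fx, fz) * math.hypot(sx, sz)
    if norm == 0:
        return 0.0  # undefined direction: no penalty
    sin_beta = (sx * fz - sz * fx) / norm  # 2D cross product over norms
    return max(0.0, -sin_beta)
```

D is trained to increase this value and G to decrease it. Since L_angle does not depend on D, only G is pushed toward poses with sin β ≥ 0.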

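Experimental Results

Research Questions

RQ1: Can a 3D human pose be learned from the 2D joints of a single image without any 3D pose data?
RQ2: Does enforcing rotation consistency on the projections of the generated 3D pose yield plausible 3D reconstructions from 2D input?
RQ3: How well does the unsupervised method transfer to in-the-wild 2D datasets (such as MPII) or to datasets with ground-truth 2D joints?
RQ4: How do the camera-geometry assumptions (orthographic projection, horizontal placement) affect reconstruction accuracy?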

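Key Findings

The method predicts a 3D pose from the 2D joint locations of a single image without using any 3D dataset.
On Human3.6M with ground-truth 2D joints, the method's mean error is 130.9 mm.
With 2D joints detected by Stacked Hourglass, and without 3D data for training, the mean error is 173.2 mm.
Prior methods trained with 3D supervision achieve lower errors than the unsupervised method (e.g., the cited supervised baselines range from 45.5 to 62.9 mm).
On the in-the-wild MPII dataset, the method produces qualitative 3D pose predictions using only 2D annotations.
On MPI-INF-3DHP with ground-truth 2D joints, PCK at a 150 mm threshold is 89.3, indicating that high-quality 2D poses improve 3D reconstruction.
Given accurate 2D joints, the method remains robust to viewpoint changes about the vertical axis.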
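
The error figures above are in millimetres, and the MPI-INF-3DHP result is a PCK at a 150 mm threshold. The usual definitions can be sketched as follows. This assumes plain per-joint Euclidean error with no rigid or scale alignment beforehand, which the review does not specify, so treat it as an illustration of the metrics rather than the paper's evaluation protocol; `mpjpe` and `pck` are hypothetical helper names.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance (mm)
    between predicted and ground-truth 3D joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pck(pred, gt, threshold=150.0):
    """Percentage of joints whose 3D error is at most `threshold` mm."""
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return 100.0 * hits / len(gt)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
    pred = [(0.0, 0.0, 0.0), (100.0, 200.0, 0.0)]  # second joint is 200 mm off
    print(mpjpe(pred, gt), pck(pred, gt))
```

A lower MPJPE is better, while a higher PCK is better; PCK ignores how far beyond the threshold a wrong joint lands.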

This review was generated by AI and checked by human editors.