QUICK REVIEW

[Paper Review] FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Yu Rong, Takaaki Shiratori | arXiv (Cornell University) | Aug 19, 2020

Human Pose and Action Recognition | References: 58 | Cited by: 54

One-Sentence Summary

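FrankMocap regresses 3D hand and body poses from monocular RGB input with separate hand and body modules and integrates them into a unified SMPL-X model, achieving near real-time performance (~9.5 fps) and state-of-the-art accuracy for 3D hand pose estimation.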

ABSTRACT

Although the essential nuance of human motion is often conveyed as a combination of body movements and hand gestures, the existing monocular motion capture approaches mostly focus on either body motion capture only, ignoring hand parts, or hand motion capture only, without considering body motion. In this paper, we present FrankMocap, a motion capture system that can estimate both 3D hand and body motion from in-the-wild monocular inputs with faster speed (9.5 fps) and better accuracy than previous work. Our method works in near real-time (9.5 fps) and produces 3D body and hand motion capture outputs as a unified parametric model structure. Our method aims to capture 3D body and hand motion simultaneously from challenging in-the-wild monocular videos. To construct FrankMocap, we build the state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole body parametric model (SMPL-X). Our 3D hand motion capture output can be efficiently integrated into monocular body motion capture output, producing whole body motion results in a unified parametric model structure. We demonstrate the state-of-the-art performance of our hand motion capture system in public benchmarks, and show the high quality of our whole body motion capture result in various challenging real-world scenes, including a live demo scenario.

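Motivation and Goals

Enable joint 3D hand and body motion capture from monocular RGB input in the wild.
Develop fast, compatible hand and body regression modules whose outputs share a unified SMPL-X representation.
Provide a simple copy-and-paste integration, plus an optimization-based refinement that improves the whole-body output.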

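Proposed Method

Two regression modules separately predict 3D body and hand poses from a single RGB image.
The hand module treats the SMPL-X hand component as a standalone model and regresses the hand parameters [φ_h, θ_h, β_h] (global orientation, pose, and shape) along with a weak-perspective camera c_h.
The body module regresses body and limb pose parameters θ_b and shape β_b in SMPL-X space, together with the global orientation φ_b and camera c_b.
The integration module merges the hand and body outputs into a single SMPL-X representation, either by copy-and-paste or by optimization-based fitting that additionally uses 2D keypoints.
The optimization objective F = F^{2D} + F^{pri} aligns the projected 3D joints with the 2D observations (F^{2D}) and enforces plausible pose and shape (F^{pri}).
Training combines several hand datasets (FreiHAND, HO-3D, MTC, STB, RHD, MPII+NZSL), with the data rescaled and reordered to match the hand model; motion-blur augmentation improves robustness.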
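The weak-perspective cameras c_h and c_b and the objective F = F^{2D} + F^{pri} can be illustrated with the minimal NumPy sketch below. It assumes a common camera convention c = (s, t_x, t_y) with projection s·(X_xy + t), a confidence-weighted squared reprojection error for F^{2D}, and a simple L2 penalty on θ and β as a stand-in for F^{pri}. The paper's actual prior terms and weights may differ, and the names `weak_perspective_project`, `fitting_objective`, `w_pose`, and `w_shape` are hypothetical. For brevity the 3D joints are passed in directly instead of being produced by an SMPL-X forward pass.

```python
import numpy as np


def weak_perspective_project(joints_3d: np.ndarray, cam: np.ndarray) -> np.ndarray:
    """Project (J, 3) joints to (J, 2) with cam = (s, tx, ty); depth is dropped.

    Assumed convention: x_2d = s * (X_xy + t). Other codebases use s * X_xy + t.
    """
    s, tx, ty = cam
    return s * (joints_3d[:, :2] + np.array([tx, ty]))


def fitting_objective(joints_3d, cam, keypoints_2d, confidence, theta, beta,
                      w_pose=1e-3, w_shape=1e-2):
    """Sketch of F = F^{2D} + F^{pri} (hypothetical weights and prior).

    joints_3d stands in for the SMPL-X joints produced from (theta, beta);
    a real fitter would recompute them from the parameters at every step.
    """
    residual = weak_perspective_project(joints_3d, cam) - keypoints_2d
    # F^{2D}: confidence-weighted squared reprojection error over the J keypoints
    f_2d = np.sum(confidence * np.sum(residual ** 2, axis=1))
    # F^{pri}: simple L2 penalty keeping pose and shape near zero (stand-in prior)
    f_pri = w_pose * np.sum(theta ** 2) + w_shape * np.sum(beta ** 2)
    return f_2d + f_pri
```

In an actual fitting loop this value would be minimized over θ, β, and the camera with a gradient-based optimizer; the sketch only shows how the two terms combine.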
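Copy-and-paste integration can be sketched as follows, under the assumption that the hand module's φ_h is a global wrist rotation expressed in the same frame as the body's global joint rotations. Because SMPL-X stores each joint's rotation relative to its parent, the global wrist orientation has to be converted into a local one using the accumulated rotation of the wrist's parent along the body kinematic chain, while the finger pose θ_h can be copied unchanged since the hand module already uses the SMPL-X hand. This is a hypothetical illustration (all function names are made up), not FrankMocap's code, and it ignores the camera-frame alignment between the hand crop and the body crop.

```python
import numpy as np


def rotvec_to_matrix(rotvec: np.ndarray) -> np.ndarray:
    """Axis-angle (3,) -> rotation matrix (3, 3) via Rodrigues' formula."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-8:
        return np.eye(3)
    k = rotvec / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def matrix_to_rotvec(R: np.ndarray) -> np.ndarray:
    """Rotation matrix -> axis-angle; numerically fragile near angle = pi."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-8:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return angle * axis / (2.0 * np.sin(angle))


def chain_global_rotation(local_rotvecs) -> np.ndarray:
    """Compose local rotations from the root down to one joint (root first)."""
    R = np.eye(3)
    for rv in local_rotvecs:
        R = R @ rotvec_to_matrix(np.asarray(rv, dtype=float))
    return R


def copy_paste_hand(parent_global_R, hand_global_orient, hand_articulation):
    """Hypothetical copy-and-paste step.

    parent_global_R: (3, 3) global rotation of the wrist's parent from the body output.
    hand_global_orient: (3,) global wrist orientation (phi_h) from the hand module.
    hand_articulation: finger pose (theta_h), copied unchanged.
    Returns the local wrist axis-angle and the finger pose for the SMPL-X slots.
    """
    R_wrist_global = rotvec_to_matrix(np.asarray(hand_global_orient, dtype=float))
    # global = parent_global @ local  =>  local = parent_global^T @ global
    R_wrist_local = parent_global_R.T @ R_wrist_global
    return matrix_to_rotvec(R_wrist_local), np.array(hand_articulation, copy=True)
```

The transpose works as an inverse because rotation matrices are orthonormal, so the pasted wrist reproduces the hand module's global orientation once the body chain is re-applied.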

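Experimental Results

Research Questions

RQ1: Can monocular RGB input simultaneously produce accurate 3D hand and body poses under the SMPL-X representation?
RQ2: Can the outputs of separate hand and body regression modules be easily integrated into a coherent whole-body model?
RQ3: Does the fast copy-and-paste integration or the optimization-based refinement achieve accurate and stable whole-body motion capture in the wild?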

Key Findings

Method	Preprocess (fps)	Model (fps)	Overall (fps)
SMPLify-X (CP)	7.5	0.01	0.01
MTC (CP)	7.5	0.1	0.1
Online (CP)	35	13	9.5
Offline (OP)	7.5	13	4.7
Offline (Another)	7.5	1.1	0.95
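The Overall column is consistent with running preprocessing and the model one after the other on every frame, so per-frame latencies add: 1/FPS_overall = 1/FPS_pre + 1/FPS_model. For the online setting this gives 1/(1/35 + 1/13) ≈ 9.48 fps, matching the reported 9.5 fps. The short Python check below recomputes every row; it assumes no pipelining between stages, which the table itself does not state, and the reported values appear to be rounded or truncated (for example, 4.756 is listed as 4.7).

```python
def overall_fps(*stage_fps: float) -> float:
    """Throughput when stages run sequentially per frame: latencies (1/fps) add up."""
    return 1.0 / sum(1.0 / f for f in stage_fps)


# (method, preprocess fps, model fps, reported overall fps) from the table above
rows = [
    ("SMPLify-X (CP)", 7.5, 0.01, 0.01),
    ("MTC (CP)", 7.5, 0.1, 0.1),
    ("Online (CP)", 35, 13, 9.5),
    ("Offline (OP)", 7.5, 13, 4.7),
    ("Offline (Another)", 7.5, 1.1, 0.95),
]

for name, pre, model, reported in rows:
    print(f"{name:<18} computed {overall_fps(pre, model):.3f}  reported {reported}")
```

The same formula also shows why the slow model stage dominates the optimization-based baselines: at 0.01 fps its 100 s per frame dwarfs the 0.13 s of preprocessing.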

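In online (copy-and-paste) mode FrankMocap runs at about 9.5 fps, enabling near real-time whole-body motion capture from monocular video.
The 3D hand pose estimation achieves state-of-the-art performance on public benchmarks.
Integrating the hand and body outputs through SMPL-X yields a coherent whole-body representation without heavy optimization; the optional optimization stage improves wrist alignment and consistency with 2D keypoints.
Extensive ablation studies show that diverse training datasets and motion-blur augmentation improve robustness in the wild.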
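The motion-blur augmentation could look like the sketch below: a linear blur that averages `length` shifted copies of the image along a random direction, which matches convolution with a normalized line kernel up to pixel rounding of the offsets. The blur lengths, angle range, and application probability are placeholders chosen for illustration, since the paper's exact settings are not given here, and `motion_blur` and `random_motion_blur` are hypothetical names.

```python
import numpy as np


def motion_blur(image: np.ndarray, length: int, angle_deg: float) -> np.ndarray:
    """Linear motion blur: mean of `length` copies shifted along angle_deg (edge-padded)."""
    img = image.astype(float)
    if length <= 1:
        return img
    theta = np.deg2rad(angle_deg)
    dx, dy = np.cos(theta), np.sin(theta)
    offsets = np.arange(length) - (length - 1) / 2.0  # centred sample positions on the line
    pad = int(np.ceil((length - 1) / 2.0))
    h, w = img.shape[:2]
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad_width, mode="edge")
    out = np.zeros_like(img)
    for t in offsets:
        ox, oy = int(round(t * dx)), int(round(t * dy))
        out += padded[pad + oy:pad + oy + h, pad + ox:pad + ox + w]
    return out / length


def random_motion_blur(image, rng, p=0.5, max_length=9):
    """Apply motion blur with probability p; length and angle ranges are placeholders."""
    if rng.random() >= p:
        return image.astype(float)
    length = int(rng.integers(3, max_length + 1))
    angle = float(rng.uniform(0.0, 180.0))
    return motion_blur(image, length, angle)
```

A typical use would be `random_motion_blur(crop, np.random.default_rng(seed))` on each training crop, so that the hand regressor sees the smeared fingers common in fast-moving video.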

This review was generated by AI and reviewed by human editors.