QUICK REVIEW

[Paper Review] RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

Jiayi Wang, Franziska Mueller | arXiv (Cornell University) | Jun 22, 2021

Human Pose and Action Recognition | References: 30 | Cited by: 42

One-Sentence Summary

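RGB2Hands introduces a real-time method that tracks and reconstructs the 3D pose and surface geometry of two interacting hands from a single RGB camera by combining a multi-task CNN with a generative hand-model fitting framework. It handles depth ambiguity and occlusion without a depth sensor.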

ABSTRACT

Tracking and reconstructing the 3D pose and geometry of two hands in interaction is a challenging problem that has a high relevance for several human-computer interaction applications, including AR/VR, robotics, or sign language recognition. Existing works are either limited to simpler tracking settings (e.g., considering only a single hand or two spatially separated hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions. In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN that regresses multiple complementary pieces of information, including segmentation, dense matchings to a 3D hand model, and 2D keypoint positions, together with newly proposed intra-hand relative depth and inter-hand distance maps. These predictions are subsequently used in a generative model fitting framework in order to estimate pose and shape parameters of a 3D hand model for both hands. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline through an extensive ablation study. Moreover, we demonstrate that our approach offers previously unseen two-hand tracking performance from RGB, and quantitatively and qualitatively outperforms existing RGB-based methods that were not explicitly designed for two-hand interactions. Finally, our method even performs on par with depth-based real-time methods.
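The abstract lists several complementary CNN outputs that later serve as fitting targets. As a rough illustration of how such outputs could be organized per frame, here is a minimal sketch assuming a hypothetical label convention (0 = background, 1 = left, 2 = right) and a hypothetical container `MultiTaskPrediction`; the pixel-to-vertex-index dictionary for dense matches is purely illustrative and is not how the paper encodes correspondences. The inter-hand distance cue couples both hands, so it is stored but not split per hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]            # (u, v) image coordinates
BACKGROUND, LEFT, RIGHT = 0, 1, 2  # hypothetical segmentation label convention

@dataclass
class MultiTaskPrediction:
    """Hypothetical per-frame container for the CNN outputs named in the abstract."""
    segmentation: List[List[int]]                       # H x W labels (BACKGROUND/LEFT/RIGHT)
    keypoints_2d: Dict[int, List[Tuple[float, float]]]  # hand label -> 2D keypoints
    dense_matches: Dict[Pixel, int]                     # pixel -> model vertex index (illustrative)
    intra_hand_depth: Dict[Pixel, float] = field(default_factory=dict)     # relative depth per pixel
    inter_hand_distance: Dict[Pixel, float] = field(default_factory=dict)  # shared two-hand cue

def hand_masks(pred: MultiTaskPrediction) -> Dict[int, Set[Pixel]]:
    """Split the left/right segmentation into one silhouette (pixel set) per hand."""
    masks: Dict[int, Set[Pixel]] = {LEFT: set(), RIGHT: set()}
    for v, row in enumerate(pred.segmentation):
        for u, label in enumerate(row):
            if label in masks:
                masks[label].add((u, v))
    return masks

def fitting_targets(pred: MultiTaskPrediction) -> Dict[int, dict]:
    """Group the per-pixel predictions into per-hand targets for model fitting."""
    targets = {}
    for hand, mask in hand_masks(pred).items():
        targets[hand] = {
            "silhouette": mask,
            "keypoints_2d": pred.keypoints_2d.get(hand, []),
            # keep only the dense matches and depth values inside this hand's silhouette
            "dense_matches": {p: vid for p, vid in pred.dense_matches.items() if p in mask},
            "intra_hand_depth": {p: d for p, d in pred.intra_hand_depth.items() if p in mask},
        }
    return targets

if __name__ == "__main__":
    pred = MultiTaskPrediction(
        segmentation=[[0, 1, 1],
                      [2, 2, 0]],
        keypoints_2d={LEFT: [(1.5, 0.0)], RIGHT: [(0.5, 1.0)]},
        dense_matches={(1, 0): 10, (2, 0): 11, (0, 1): 500},
    )
    print(fitting_targets(pred)[LEFT]["silhouette"])
```

Splitting by the predicted left/right labels first means each hand's model only "sees" evidence from its own pixels, which is one simple way to keep the two fits from pulling on each other's data.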

Motivation and Goals

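Address the challenge of markerless 3D motion capture of two closely interacting hands using only monocular RGB input.
Develop a robust, real-time pipeline that estimates the global 3D pose and shape of both hands.
Explicitly handle the depth ambiguity and occlusion inherent in RGB data while tracking two interacting hands.
Create training data and a benchmark dataset (RGB2Hands) to support learning-based two-hand reconstruction from RGB.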

Proposed Method

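A multi-task CNN predicts per-pixel left/right hand segmentation, dense vertex-to-image correspondences with a 3D hand model, intra-hand relative depth maps, inter-hand distance maps, and occlusion-robust 2D keypoints.
A parametric 3D hand model (MANO) is fitted to both hands by minimizing the composite energy f(β,θ) = Φ(β,θ) + Ω(β,θ), where β denotes the shape parameters and θ the pose parameters.
Φ combines dense 2D fitting, silhouette, 2D keypoint, intra-hand depth, and inter-hand distance terms to align the model with the RGB evidence.
The intra-hand relative depth and inter-hand distance terms resolve the depth ambiguity of RGB input when the two hands interact.
Real-time fitting uses Levenberg–Marquardt optimization with GPU-accelerated Jacobian evaluation (at most 10 LM iterations).
The network is trained on a mix of real (RGB-D) and physics-simulated synthetic data; the synthetic data models interacting hands of varied shapes and is generated by a MANO-based synthesis pipeline.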
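The fitting step minimizes f(β,θ) = Φ(β,θ) + Ω(β,θ) with Levenberg–Marquardt. The sketch below shows that optimization pattern on a hypothetical toy problem, not the paper's model: a planar two-link "finger" whose link lengths scale with one shape parameter β and whose two joint angles form θ. Φ is reduced to a single 2D keypoint term and Ω to an assumed shape prior pulling β toward 1.0 (the summary does not define Ω). The Jacobian is computed by finite differences on the CPU instead of on the GPU, and the cap of 10 iterations mirrors the summary.

```python
import math

def finger_keypoints(params):
    """Toy forward model (hypothetical): 2D joint positions of a planar two-link finger."""
    beta, th1, th2 = params           # beta: shape (length scale); th1, th2: joint angles
    l1, l2 = 1.0 * beta, 0.8 * beta   # link lengths scale with the shape parameter
    x1, y1 = l1 * math.cos(th1), l1 * math.sin(th1)
    x2, y2 = x1 + l2 * math.cos(th1 + th2), y1 + l2 * math.sin(th1 + th2)
    return [(x1, y1), (x2, y2)]

def make_residuals(observed, w_kp=1.0, w_shape=0.01):
    """Build r(params) so that f = sum(r_i^2) = Phi + Omega for this toy problem."""
    def residuals(params):
        r = []
        # Phi: one weighted 2D keypoint term (stand-in for the five data terms)
        for (px, py), (ox, oy) in zip(finger_keypoints(params), observed):
            r += [math.sqrt(w_kp) * (px - ox), math.sqrt(w_kp) * (py - oy)]
        # Omega: assumed shape prior that keeps beta near 1.0
        r.append(math.sqrt(w_shape) * (params[0] - 1.0))
        return r
    return residuals

def energy(fun, params):
    return sum(v * v for v in fun(params))

def numerical_jacobian(fun, params, eps=1e-6):
    """Forward-difference Jacobian J[i][j] = d r_i / d p_j (CPU stand-in for GPU Jacobians)."""
    r0 = fun(params)
    J = [[0.0] * len(params) for _ in r0]
    for j in range(len(params)):
        p = list(params)
        p[j] += eps
        r1 = fun(p)
        for i in range(len(r0)):
            J[i][j] = (r1[i] - r0[i]) / eps
    return J

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(c + 1, n):
            factor = M[k][c] / M[c][c]
            for j in range(c, n + 1):
                M[k][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def levenberg_marquardt(fun, params, max_iters=10, lam=1e-3):
    """Minimize sum(r^2) with Marquardt-damped Gauss-Newton steps."""
    params = list(params)
    f = energy(fun, params)
    n = len(params)
    for _ in range(max_iters):
        r, J = fun(params), numerical_jacobian(fun, params)
        JtJ = [[sum(row[a] * row[b] for row in J) for b in range(n)] for a in range(n)]
        Jtr = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(n)]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = -J^T r
        A = [[JtJ[a][b] + (lam * JtJ[a][a] if a == b else 0.0) for b in range(n)] for a in range(n)]
        delta = solve_linear(A, [-g for g in Jtr])
        candidate = [p + d for p, d in zip(params, delta)]
        f_new = energy(fun, candidate)
        if f_new < f:   # accept the step and move toward Gauss-Newton
            params, f, lam = candidate, f_new, lam * 0.1
        else:           # reject the step and damp more strongly
            lam *= 10.0
    return params, f

if __name__ == "__main__":
    truth = [1.1, 0.5, 0.7]                        # hypothetical ground-truth beta, th1, th2
    fun = make_residuals(finger_keypoints(truth))  # "observed" keypoints generated from truth
    fitted, f = levenberg_marquardt(fun, [1.0, 0.3, 0.5])
    print("fitted params:", fitted, "energy:", f)
```

Writing f as a sum of squared residuals is what makes LM applicable: each data or prior term simply contributes rows to the residual vector, and the damping factor shrinks after accepted steps (approaching Gauss–Newton) and grows after rejected ones (taking smaller, gradient-descent-like steps).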

Experimental Results

Research Questions

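RQ1: Can a monocular RGB pipeline reconstruct accurate 3D pose and surface geometry of two closely interacting hands in real time?
RQ2: How can the depth ambiguity of RGB input be mitigated when tracking two hands that are in contact or nearly in contact?
RQ3: Does a multi-task CNN that predicts segmentation, dense correspondences, depth cues, and keypoints provide robust targets for two-hand model fitting?
RQ4: How does RGB2Hands compare with depth-based methods and with RGB methods that were not designed for two-hand interaction?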

Key Findings

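The method reconstructs the 3D pose and shape of two interacting hands from monocular RGB in real time.
A multi-task CNN that predicts segmentation, dense surface correspondences, intra-hand depth, inter-hand distance, and 2D keypoints enables robust coupling of the two hands during fitting.
A new energy formulation with five image-fitting terms (dense, silhouette, keypoint, intra-hand depth, and inter-hand distance) yields consistent 3D fits from RGB data.
A training scheme that combines synthetic and real data, using a physically accurate simulator of interacting hand pairs, steers the optimization toward more realistic two-hand poses.
RGB2Hands quantitatively and qualitatively outperforms RGB-based methods not designed for two-hand interaction and performs on par with real-time depth-based methods.
A new RGB2Hands benchmark dataset provides real two-hand sequences with manually annotated keypoints and synchronized depth for 3D evaluation.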
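The benchmark pairs manually annotated keypoints with synchronized depth. One common way to use such data for 3D evaluation is to back-project each annotated pixel with its depth through a pinhole camera and average the Euclidean distance to the predicted 3D joints. This is a minimal sketch assuming known intrinsics `fx, fy, cx, cy` and a depth lookup `depth_at`, both hypothetical; the summary does not state which metric or protocol the paper reports, and a depth value measured on the skin surface only approximates a joint's position.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]

def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Pinhole back-projection of pixel (u, v) with metric depth into camera space."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def lift_keypoints(keypoints_2d: Sequence[Tuple[float, float]],
                   depth_at: Callable[[float, float], float],
                   fx: float, fy: float, cx: float, cy: float) -> List[Optional[Point3]]:
    """Lift annotated 2D keypoints to 3D reference points; None where depth is missing (<= 0)."""
    joints: List[Optional[Point3]] = []
    for u, v in keypoints_2d:
        z = depth_at(u, v)
        joints.append(back_project(u, v, z, fx, fy, cx, cy) if z > 0 else None)
    return joints

def mean_joint_error(pred: Sequence[Point3], ref: Sequence[Optional[Point3]]) -> float:
    """Mean Euclidean distance over joints that have a valid reference point."""
    pairs = [(p, r) for p, r in zip(pred, ref) if r is not None]
    if not pairs:
        raise ValueError("no valid reference joints")
    return sum(math.dist(p, r) for p, r in pairs) / len(pairs)

if __name__ == "__main__":
    fx = fy = 500.0
    cx, cy = 320.0, 240.0
    depth = {(420.0, 240.0): 0.5, (320.0, 340.0): 0.4, (0.0, 0.0): 0.0}  # 0.0 = missing
    ref = lift_keypoints(list(depth), lambda u, v: depth[(u, v)], fx, fy, cx, cy)
    pred = [(0.1, 0.0, 0.52), (0.0, 0.08, 0.4), (5.0, 5.0, 5.0)]
    print("mean 3D error (m):", mean_joint_error(pred, ref))
```

Skipping joints without valid depth, rather than treating missing depth as zero, keeps holes in the depth map from inflating the error.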

This review was generated by AI and reviewed by human editors.