QUICK REVIEW

[Paper Review] Hand3D: Hand Pose Estimation using 3D Neural Network

Xiaoming Deng, Shuo Yang|arXiv (Cornell University)|Apr 7, 2017

Hand Gesture Recognition Systems|References 20|Cited by 67

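One-Sentence Summary

This paper proposes a 3D CNN that estimates 3D hand joint positions directly from a TSDF volumetric representation of the depth map and, using synthetic data augmentation and a TSDF refinement module, reaches state-of-the-art results on the NYU and ICVL hand pose datasets.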

ABSTRACT

We propose a novel 3D neural network architecture for 3D hand pose estimation from a single depth image. Different from previous works that mostly run on 2D depth image domain and require intermediate or post process to bring in the supervision from 3D space, we convert the depth map to a 3D volumetric representation, and feed it into a 3D convolutional neural network (CNN) to directly produce the pose in 3D requiring no further process. Our system does not require the ground truth reference point for initialization, and our network architecture naturally integrates both local feature and global context in 3D space. To increase the coverage of the hand pose space of the training data, we render synthetic depth images by transferring hand pose from existing real image datasets. We evaluate our algorithm on two public benchmarks and achieve the state-of-the-art performance. The synthetic hand pose dataset will be available.

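Research Motivation and Goals

Propose a method that estimates 3D hand pose directly from a single depth image, without post-processing or a predefined hand model.
Introduce a 3D volumetric representation (TSDF) and a 3D CNN that predicts 3D joint positions in a coordinate system centered at the hand's center of mass (COM).
Improve training-data diversity and depth quality through TSDF refinement and synthetic data augmentation with variable bone lengths.
Demonstrate state-of-the-art performance on the NYU and ICVL hand pose benchmarks.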
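
The bone-length variation mentioned above can be illustrated geometrically. The sketch below is not the paper's pipeline (which, per this review, transfers poses to a configurable CAD model and renders depth maps); it only shows, under assumed inputs, how the bones of a joint skeleton can be rescaled while keeping their directions. The function name `rescale_bones` and the `parents` and `scales` arguments are hypothetical.

```python
import numpy as np

def rescale_bones(joints, parents, scales):
    """Return joint positions with every bone length multiplied by its scale.

    joints:  (J, 3) joint positions.
    parents: parents[j] is the parent index of joint j, or -1 for the root.
             Joints are assumed to be ordered so parents come before children.
    scales:  (J,) factor for the bone ending at joint j (root entry is ignored).
    """
    joints = np.asarray(joints, dtype=np.float64)
    out = np.empty_like(joints)
    for j, p in enumerate(parents):
        if p < 0:
            out[j] = joints[j]              # the root stays in place
        else:
            bone = joints[j] - joints[p]    # original bone vector (direction kept)
            out[j] = out[p] + scales[j] * bone
    return out
```

Sampling `scales` around 1.0 for each training pose would give skeletons of different proportions that still share the original joint angles.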

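Proposed Method

Convert the depth map into a 60x60x60 TSDF volume aligned to the hand's COM.
Refine the raw TSDF with a 3D FCN, which fills in missing depth and reduces artifacts.
Regress 3D joint positions relative to the COM directly with a 3D convolutional network, trained with an L2 loss.
Train the network end to end on augmented data that includes synthetic poses with variable bone lengths.
Augment the data by transferring hand poses to a configurable CAD model and rendering depth maps.
Optionally recover poses from real data via inverse kinematics and transfer them to BVH for synthetic data generation.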
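
The first step of the method, turning a depth map into a COM-centered TSDF volume, can be sketched as follows. This is a minimal projective TSDF under stated assumptions, not the authors' implementation: only the 60x60x60 resolution comes from the review, while the cube size (`cube_mm`), truncation distance (`trunc_mm`), the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the sign convention (positive in front of the surface) are illustrative choices.

```python
import numpy as np

def depth_to_tsdf(depth, fx, fy, cx, cy, cube_mm=300.0, res=60, trunc_mm=15.0):
    """Projective TSDF of a segmented hand depth map, centered at the hand's COM.

    depth: (H, W) array in millimetres, 0 where there is no hand pixel.
    Assumes the hand is farther from the camera than half the cube size.
    Returns (tsdf, com); tsdf has shape (res, res, res) with values in [-1, 1].
    """
    h, w = depth.shape
    v, u = np.nonzero(depth > 0)
    z = depth[v, u].astype(np.float64)
    # Back-project hand pixels to camera space; their mean is the COM.
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    com = pts.mean(axis=0)

    # Voxel centers of a res^3 grid spanning the cube around the COM.
    ticks = (np.arange(res) + 0.5) / res * cube_mm - cube_mm / 2.0
    gx, gy, gz = np.meshgrid(ticks + com[0], ticks + com[1], ticks + com[2],
                             indexing="ij")

    # Project every voxel center into the image plane.
    pu = np.round(gx * fx / gz + cx).astype(int)
    pv = np.round(gy * fy / gz + cy).astype(int)
    inside = (pu >= 0) & (pu < w) & (pv >= 0) & (pv < h)

    # Look up the observed depth along each voxel's viewing ray.
    d = np.zeros_like(gz)
    d[inside] = depth[pv[inside], pu[inside]]
    valid = inside & (d > 0)

    # Signed distance along the ray, truncated and normalized to [-1, 1].
    tsdf = np.ones((res, res, res), dtype=np.float32)  # default: free space
    sdf = d - gz
    tsdf[valid] = np.clip(sdf[valid] / trunc_mm, -1.0, 1.0)
    return tsdf, com
```

Because the grid is built around `com`, joint targets expressed as `joint - com` live in the same frame as the volume, which is what allows a network to regress COM-relative positions directly.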

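Experimental Results

Research Questions

RQ1: Can a 3D CNN operating on a TSDF volume directly estimate 3D hand joint positions in COM coordinates, without post-processing?
RQ2: Do TSDF refinement and 3D data augmentation improve 3D hand pose accuracy on standard benchmarks?
RQ3: How well does the method generalize to different hand skeletons and bone lengths?
RQ4: How much do the proposed synthetic data augmentation and bone-length variation affect pose estimation performance?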

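Key Findings

The method achieves state-of-the-art performance on the NYU and ICVL hand pose datasets.
Direct 3D pose estimation in COM coordinates removes the post-processing otherwise needed to lift 2D estimates into 3D.
TSDF refinement improves pose accuracy, especially at lower error thresholds.
Data augmentation with varied bone lengths and synthetic pose transfer significantly improves performance.
The method runs at about 30 FPS on a GTX TITAN X, faster than many model-based methods while also being more accurate.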
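
The "error thresholds" in the findings above refer to threshold-based accuracy curves. A common way to compute them on hand pose benchmarks is the fraction of frames whose worst joint error stays below a threshold, alongside the mean joint error. The sketch below assumes that definition; this review does not spell out the exact metric used in the paper, and the function names are hypothetical.

```python
import numpy as np

def frames_within_threshold(pred, gt, thresholds_mm):
    """Fraction of frames whose maximum joint error is below each threshold.

    pred, gt: (N, J, 3) predicted and ground-truth joint positions in millimetres.
    """
    err = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=2)  # (N, J)
    worst = err.max(axis=1)                                          # (N,)
    return np.array([(worst < t).mean() for t in thresholds_mm])

def mean_joint_error(pred, gt):
    """Average Euclidean joint error over all frames and joints."""
    err = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=2)
    return float(err.mean())
```

Plotting `frames_within_threshold` over a range of thresholds gives the usual success-rate curve; gains "at lower thresholds" show up on the left side of that curve.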


This review was generated by AI and reviewed by a human editor.