QUICK REVIEW

[Paper Review] TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors

Xinyu Yi, Yuxiao Zhou|arXiv (Cornell University)|May 10, 2021

Human Pose and Action Recognition | References: 44 | Cited by: 52

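One-sentence summary

TransPose uses six IMUs, a multi-stage pose pipeline, and a fusion-based global translation estimator to estimate 3D human pose and global translation in real time at over 90 fps.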

ABSTRACT

Motion capture is facing some new possibilities brought by the inertial sensing technologies which do not suffer from occlusion or wide-range recordings as vision-based solutions do. However, as the recorded signals are sparse and quite noisy, online performance and global translation estimation turn out to be two key difficulties. In this paper, we present TransPose, a DNN-based approach to perform full motion capture (with both global translations and body poses) from only 6 Inertial Measurement Units (IMUs) at over 90 fps. For body pose estimation, we propose a multi-stage network that estimates leaf-to-full joint positions as intermediate results. This design makes the pose estimation much easier, and thus achieves both better accuracy and lower computation cost. For global translation estimation, we propose a supporting-foot-based method and an RNN-based method to robustly solve for the global translations with a confidence-based fusion technique. Quantitative and qualitative comparisons show that our method outperforms the state-of-the-art learning- and optimization-based methods with a large margin in both accuracy and efficiency. As a purely inertial sensor-based approach, our method is not limited by environmental settings (e.g., fixed cameras), making the capture free from common difficulties such as wide-range motion space and strong occlusion.

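Research motivation and goals

Address the under-constrained nature of full motion capture from only six IMUs by exploiting temporal information and pose priors
Estimate human pose and global translation in real time (over 90 fps) without cameras or external sensors
Improve accuracy and efficiency over the earlier DIP and SIP methods by decomposing pose estimation into intermediate joint-position tasks
Propose a robust, fusion-based method for estimating global translation in real time from sparse inertial data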

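Proposed method

A three-stage pose estimation pipeline first predicts leaf-joint positions (Pose-S1), then completes all joint positions (Pose-S2), and finally regresses joint rotations with a bidirectional RNN with LSTM cells (Pose-S3)
Leaf-joint positions serve as the intermediate representation, exploiting the kinematic hierarchy of the human body and temporal information
Global translation is estimated by two parallel branches, a supporting-foot-based velocity estimate (Trans-B1) and a root-velocity RNN (Trans-B2), which are fused according to the foot-contact probability
The foot-contact network uses leaf-joint positions and IMU data to infer which foot is on the ground, and the root velocity is computed from the forward kinematics of the supporting foot
Trans-B2 predicts the root velocity in the root coordinate frame with an RNN and then transforms it to world space using the root rotation; a fusion rule based on the foot-contact probability combines v_f and v_e
The system uses the SMPL skeleton, with leg lengths either measured in advance or defaulting to the SMPL mean, and is trained on data from DIP-IMU, TotalCapture, and AMASS, including synthesized data with noise and augmentation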
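
The two translation branches and their fusion can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, v_f is taken to be the supporting-foot velocity and v_e the RNN-estimated velocity, velocities are treated as per-frame displacements, and the linear blend with thresholds 0.5 and 0.9 is an assumed form of the confidence-based fusion rule that this summary does not spell out.

```python
# Hypothetical sketch of supporting-foot / RNN velocity fusion.
# Names, thresholds, and the linear blend are assumptions for illustration.

def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))

def mat_vec(R, v):
    """Multiply a 3x3 rotation matrix (nested lists) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def supporting_foot_velocity(foot_prev, foot_curr):
    """Trans-B1 idea: root displacement over one frame.

    foot_prev / foot_curr are root-to-foot offsets (world-aligned axes) at
    the previous and current frame, e.g. from forward kinematics. If the
    supporting foot stays fixed in the world, the root must have moved by
    the opposite of the change in that offset.
    """
    return [p - c for p, c in zip(foot_prev, foot_curr)]

def fuse_velocity(v_f, v_e, contact_prob, q_low=0.5, q_high=0.9):
    """Blend v_f and v_e by foot-contact confidence.

    Below q_low only v_e is used, above q_high only v_f, with a linear
    ramp in between (assumed thresholds).
    """
    s = clip(contact_prob, q_low, q_high)
    w_f = (s - q_low) / (q_high - q_low)
    return [w_f * f + (1.0 - w_f) * e for f, e in zip(v_f, v_e)]

def step_translation(t, foot_prev, foot_curr, v_local, R_root, contact_probs):
    """Advance the global root translation t by one frame.

    foot_prev / foot_curr hold the offsets for (left, right) feet,
    v_local is the Trans-B2 style velocity in the root frame,
    R_root is the root rotation, contact_probs = (left, right).
    """
    # The foot with the higher contact probability is the supporting foot.
    side = 0 if contact_probs[0] >= contact_probs[1] else 1
    v_f = supporting_foot_velocity(foot_prev[side], foot_curr[side])
    # Rotate the learned root-frame velocity into world space.
    v_e = mat_vec(R_root, v_local)
    v = fuse_velocity(v_f, v_e, max(contact_probs))
    return [a + b for a, b in zip(t, v)]
```

With a confident left-foot contact (for example contact_probs = (0.95, 0.1)), the update relies entirely on the supporting-foot estimate; when neither foot is likely to be on the ground, as in a jump, it falls back to the learned velocity.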

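Experimental results

Research questions

RQ1: Can real-time, high-frame-rate full motion capture, including global translation, be achieved with only six IMUs and no environmental constraints?
RQ2: Does the multi-stage pose estimation approach, with joint positions as an intermediate representation, achieve higher accuracy and efficiency than regressing pose directly from IMU data?
RQ3: Can a translation strategy that fuses foot-contact-based estimates with a learned root velocity robustly estimate global motion across diverse movements?
RQ4: How do synthetic data and motion-history modeling affect generalization across datasets such as DIP-IMU, TotalCapture, and AMASS?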

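Key findings

Using only 6 IMUs, the method achieves real-time motion capture with global translation estimation at over 90 fps
The three-stage pose estimation design (leaf joints → all joints → rotations) outperforms direct rotation prediction in both accuracy and computational cost
The hybrid translation estimate, which combines the supporting-foot-based velocity with root-velocity regression, improves robustness in scenarios such as walking, running, and jumping
In qualitative and quantitative evaluations on public datasets, the method outperforms prior work such as DIP and SIP in both accuracy and efficiency
The system remains purely inertial and is free from the occlusion and environmental constraints inherent to vision-based mocap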


This review was generated by AI and checked by human editors.