QUICK REVIEW

[Paper Review] TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors

Xinyu Yi, Yuxiao Zhou|arXiv (Cornell University)|May 10, 2021

Human Pose and Action Recognition | References: 44 | Citations: 52

One-Sentence Summary

TransPose uses six IMUs to estimate 3D human pose and global translation in real time at over 90 fps, combining a multi-stage pose pipeline with fusion-based global translation estimation.

ABSTRACT

Motion capture is facing some new possibilities brought by the inertial sensing technologies which do not suffer from occlusion or wide-range recordings as vision-based solutions do. However, as the recorded signals are sparse and quite noisy, online performance and global translation estimation turn out to be two key difficulties. In this paper, we present TransPose, a DNN-based approach to perform full motion capture (with both global translations and body poses) from only 6 Inertial Measurement Units (IMUs) at over 90 fps. For body pose estimation, we propose a multi-stage network that estimates leaf-to-full joint positions as intermediate results. This design makes the pose estimation much easier, and thus achieves both better accuracy and lower computation cost. For global translation estimation, we propose a supporting-foot-based method and an RNN-based method to robustly solve for the global translations with a confidence-based fusion technique. Quantitative and qualitative comparisons show that our method outperforms the state-of-the-art learning- and optimization-based methods with a large margin in both accuracy and efficiency. As a purely inertial sensor-based approach, our method is not limited by environmental settings (e.g., fixed cameras), making the capture free from common difficulties such as wide-range motion space and strong occlusion.

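Motivation and Objectives

Address the heavily under-constrained problem of full motion capture from only six IMUs by exploiting temporal information and pose priors.
Enable real-time (90 fps or higher) estimation of body pose and global translation without cameras or other external sensors.
Improve accuracy and efficiency over the earlier DIP and SIP methods by decomposing pose estimation into intermediate joint-position tasks.
Propose a robust fusion-based approach that estimates global translation in real time from sparse inertial data.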

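Proposed Method

A three-stage pose estimation pipeline first predicts leaf joint positions (Pose-S1), then completes all joint positions (Pose-S2), and finally regresses joint rotations (Pose-S3). Each stage is a bidirectional RNN with LSTM cells.
Leaf joints serve as an intermediate representation that exploits the human kinematic hierarchy and temporal information.
Global translation estimation uses two parallel branches, foot-ground-contact-based velocity estimation (Trans-B1) and a root-velocity RNN (Trans-B2), and fuses them according to the foot-contact probability.
A foot-ground contact network takes the leaf joint positions and IMU data and estimates which foot is on the ground; the root velocity is then computed from the forward kinematics of the supporting leg.
Trans-B2 uses an RNN to predict the root velocity in the root coordinate frame and converts it to world space with the root rotation. The fusion rule combines the two branch velocities, v_f (Trans-B1) and v_e (Trans-B2), according to the foot-contact probability.
The system uses the SMPL skeleton; leg lengths are either measured in advance or default to the mean SMPL model. Training data is synthesized from DIP-IMU, TotalCapture, and AMASS with added noise and augmentation.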
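
The two translation branches can be illustrated with a small sketch. This is not the authors' code: the function names, the use of world-aligned root-relative foot positions, the rule that the foot with the higher contact probability is treated as the supporting foot, and the frame interval `dt` are all assumptions made for illustration.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def pick_support_foot(p_left: float, p_right: float) -> str:
    """Assumed rule: the foot with the higher contact probability supports the body."""
    return "left" if p_left >= p_right else "right"


def support_foot_velocity(foot_prev: Vec3, foot_curr: Vec3, dt: float) -> List[float]:
    """Trans-B1 idea: a planted foot does not move in the world, so any change
    in its root-relative position means the root moved the opposite way.

    foot_prev / foot_curr: supporting-foot position relative to the root,
    expressed in world-aligned axes, at two consecutive frames.
    """
    return [(prev - curr) / dt for prev, curr in zip(foot_prev, foot_curr)]


def root_to_world(root_rotation: Mat3, v_local: Vec3) -> List[float]:
    """Trans-B2 idea: rotate a root-frame velocity into world space (R @ v)."""
    return [sum(r * v for r, v in zip(row, v_local)) for row in root_rotation]


# Hypothetical numbers: the planted foot slides 0.1 m backwards relative to
# the root over 0.5 s, so the root is moving forwards along +z.
v_f = support_foot_velocity((0.0, -0.9, 0.1), (0.0, -0.9, 0.0), dt=0.5)

# Hypothetical root orientation: a 90 degree turn about the vertical (y) axis.
turn_90 = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
v_e = root_to_world(turn_90, (0.0, 0.0, 1.0))
```

In this sketch `v_f` comes purely from kinematics and is only meaningful while a foot is actually planted, whereas `v_e` stands in for a learned prediction that stays available when neither foot touches the ground.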

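Experimental Results

Research Questions

RQ1: Can real-time full motion capture, including global translation, be achieved at a high frame rate from only six IMUs and without environmental constraints?
RQ2: Does a multi-stage pose estimation approach with intermediate joint-position representations improve accuracy and efficiency over direct pose regression from IMU data?
RQ3: Can a fusion-based translation estimation strategy that uses foot-ground contact and learned root velocity robustly estimate global motion across diverse movements?
RQ4: How do synthetic data and motion-history modeling affect generalization across datasets such as DIP-IMU, TotalCapture, and AMASS?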

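Key Findings

The method achieves real-time motion capture, including global translation estimation, at over 90 fps using only six IMUs.
The three-stage pose estimation design (leaf joints → all joints → rotations) is more accurate and computationally cheaper than direct rotation prediction.
The hybrid translation estimator, which combines foot-ground-contact-based velocity with root-velocity regression, improves robustness across diverse motions such as walking, running, and jumping.
In both qualitative and quantitative evaluations on public datasets, the method outperforms the earlier DIP and SIP methods in accuracy and efficiency.
The system is purely inertial, avoiding the occlusion and environmental constraints inherent to vision-based mocap.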
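
One way to see why a confidence-based fusion helps with motions such as jumping is the sketch below. It assumes a linear blend driven by the larger of the two foot-contact probabilities, clamped between two thresholds. The threshold values `Q_LOWER` and `Q_UPPER`, the function names, and the simple Euler integration are illustrative assumptions, not details stated in this review.

```python
from typing import List, Sequence

Vec3 = Sequence[float]

Q_LOWER = 0.5  # assumed: at or below this contact probability, use only the RNN branch
Q_UPPER = 0.9  # assumed: at or above this contact probability, use only the foot branch


def fusion_weight(p_left: float, p_right: float) -> float:
    """Weight in [0, 1] given to the supporting-foot velocity."""
    s = max(p_left, p_right)           # confidence that some foot is planted
    s = min(max(s, Q_LOWER), Q_UPPER)  # clamp into [Q_LOWER, Q_UPPER]
    return (s - Q_LOWER) / (Q_UPPER - Q_LOWER)


def fuse_velocity(v_f: Vec3, v_e: Vec3, p_left: float, p_right: float) -> List[float]:
    """Blend the foot-based velocity v_f with the RNN-based velocity v_e."""
    w = fusion_weight(p_left, p_right)
    return [w * f + (1.0 - w) * e for f, e in zip(v_f, v_e)]


def integrate(velocities: Sequence[Vec3], dt: float,
              start: Vec3 = (0.0, 0.0, 0.0)) -> List[List[float]]:
    """Accumulate fused velocities into a global root trajectory (Euler steps)."""
    position = list(start)
    trajectory = []
    for v in velocities:
        position = [p + vi * dt for p, vi in zip(position, v)]
        trajectory.append(position)
    return trajectory


# Hypothetical frames: a confident foot contact, then an airborne frame.
walking = fuse_velocity((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), p_left=0.95, p_right=0.1)
airborne = fuse_velocity((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), p_left=0.1, p_right=0.1)
path = integrate([walking, airborne], dt=1.0)
```

With a confident contact the fused velocity equals the foot-based estimate, and when both contact probabilities are low it falls back entirely to the learned estimate; intermediate probabilities produce a smooth blend rather than a hard switch.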


This review was generated by AI and checked by a human editor.