QUICK REVIEW

[Paper Review] TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors

Xinyu Yi, Yuxiao Zhou | arXiv (Cornell University) | 2021. 05. 10.

Human Pose and Action Recognition | References: 44 | Citations: 52

One-Line Summary

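TransPose uses six IMUs to achieve real-time (over 90 fps) 3D human pose estimation together with global translation, combining a multi-stage pose pipeline with a fusion-based global translation estimator.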

ABSTRACT

Motion capture is facing some new possibilities brought by the inertial sensing technologies which do not suffer from occlusion or wide-range recordings as vision-based solutions do. However, as the recorded signals are sparse and quite noisy, online performance and global translation estimation turn out to be two key difficulties. In this paper, we present TransPose, a DNN-based approach to perform full motion capture (with both global translations and body poses) from only 6 Inertial Measurement Units (IMUs) at over 90 fps. For body pose estimation, we propose a multi-stage network that estimates leaf-to-full joint positions as intermediate results. This design makes the pose estimation much easier, and thus achieves both better accuracy and lower computation cost. For global translation estimation, we propose a supporting-foot-based method and an RNN-based method to robustly solve for the global translations with a confidence-based fusion technique. Quantitative and qualitative comparisons show that our method outperforms the state-of-the-art learning- and optimization-based methods with a large margin in both accuracy and efficiency. As a purely inertial sensor-based approach, our method is not limited by environmental settings (e.g., fixed cameras), making the capture free from common difficulties such as wide-range motion space and strong occlusion.

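Motivation and Goals

Tackle the heavily under-constrained problem of full motion capture from only six IMUs by exploiting temporal information and pose priors.
Enable real-time (over 90 fps) estimation of both body pose and global translation without cameras or external sensors.
Improve accuracy and efficiency over the earlier DIP and SIP methods by decomposing pose estimation into intermediate joint-position tasks.
Propose a robust, fusion-based approach for estimating global translation in real time from sparse inertial data.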

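Proposed Method

A three-stage pose estimation pipeline: Pose-S1 first predicts leaf joint positions, Pose-S2 then completes all joint positions, and a final stage regresses joint rotations using a bidirectional RNN with LSTM cells.
Leaf joints serve as an intermediate representation that exploits the human kinematic hierarchy and temporal information.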
Global translation estimation is done via two parallel branches: a foot-ground contact-based velocity estimate (Trans-B1) and a root-velocity RNN (Trans-B2), fused according to foot contact probability.
The foot-ground contact network (Trans-B1) uses leaf joint positions and IMU data to infer which foot is on the ground; assuming that supporting foot stays still, it computes the root velocity from the foot's motion relative to the root, obtained by forward kinematics.
Trans-B2 uses an RNN to predict the root velocity v_e in the root's coordinate frame and converts it to world space using the root rotation; a fusion rule then combines v_e with the supporting-foot velocity v_f, weighting them by the foot contact probability.
The system uses the SMPL skeleton, with leg lengths either measured in advance or taken from the mean SMPL body shape, and synthesizes training data from DIP-IMU, TotalCapture, and AMASS, applying noise and augmentation.
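The three-stage decomposition described above can be sketched as a simple chain of functions. This is a minimal illustration, not the authors' code. The stage functions are hypothetical stand-ins for trained RNNs: they ignore time, check their input width, and return zeros. Each later stage is assumed to receive the IMU features concatenated with the previous stage's output. All feature sizes are illustrative assumptions: 6 IMUs × 12 values, 5 leaf joints, 23 joints, and 15 rotations in a 6D representation.

```python
from typing import Callable, List, Sequence, Tuple

# Illustrative feature sizes (assumptions, not taken from the review):
N_IMUS = 6
IMU_FEATS = N_IMUS * (3 + 9)   # per IMU: 3D acceleration + flattened 3x3 orientation
LEAF_FEATS = 5 * 3             # 5 leaf joints x 3D root-relative position
JOINT_FEATS = 23 * 3           # 23 non-root joints x 3D position
ROT_FEATS = 15 * 6             # 15 joint rotations in a 6D representation

Stage = Callable[[List[float]], List[float]]


def make_dummy_stage(in_dim: int, out_dim: int) -> Stage:
    """Hypothetical stand-in for a trained RNN stage.

    It only checks the input width and returns zeros of the output width;
    a real stage would be a recurrent network run over a window of frames.
    """
    def stage(x: List[float]) -> List[float]:
        if len(x) != in_dim:
            raise ValueError(f"expected {in_dim} features, got {len(x)}")
        return [0.0] * out_dim
    return stage


def run_pose_pipeline(
    imu: Sequence[float], s1: Stage, s2: Stage, s3: Stage
) -> Tuple[List[float], List[float], List[float]]:
    """Chain leaf joints -> all joints -> rotations for one frame.

    Each later stage is assumed to see the raw IMU features concatenated
    with the previous stage's output, so it solves a smaller sub-problem.
    """
    frame = list(imu)
    if len(frame) != IMU_FEATS:
        raise ValueError(f"expected {IMU_FEATS} IMU features, got {len(frame)}")
    leaf = s1(frame)                 # stage 1: leaf joint positions
    joints = s2(frame + leaf)        # stage 2: all joint positions
    rotations = s3(frame + joints)   # stage 3: joint rotations
    return leaf, joints, rotations


if __name__ == "__main__":
    s1 = make_dummy_stage(IMU_FEATS, LEAF_FEATS)
    s2 = make_dummy_stage(IMU_FEATS + LEAF_FEATS, JOINT_FEATS)
    s3 = make_dummy_stage(IMU_FEATS + JOINT_FEATS, ROT_FEATS)
    leaf, joints, rotations = run_pose_pipeline([0.0] * IMU_FEATS, s1, s2, s3)
    print(len(leaf), len(joints), len(rotations))
```

The point of the chain is that each stage only has to bridge a small gap. It goes from IMU readings to a handful of leaf positions, then to a full set of positions, and only then to rotations, instead of mapping sparse IMU data straight to rotations.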
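The two translation branches described above can be illustrated for a single frame. This is a hedged sketch under stated assumptions, not TransPose's actual implementation. Foot positions are assumed to be root-relative but world-oriented (e.g. from forward kinematics). The supporting foot is taken to be the one with the higher contact probability. The fusion is a linear blend between two hypothetical thresholds (`lo`, `hi`), because the review does not state the exact fusion rule or its parameters.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]

IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def support_foot_velocity(foot_prev: Vec3, foot_curr: Vec3) -> Vec3:
    """Root displacement per frame implied by a stationary supporting foot.

    Positions are the foot relative to the root, in world orientation.
    If the foot stays fixed in the world (root + rel is constant),
    the root moves by rel_prev - rel_curr.
    """
    return tuple(p - c for p, c in zip(foot_prev, foot_curr))


def rotate(r: Mat3, v: Vec3) -> Vec3:
    """Matrix-vector product: express a root-frame vector in world space."""
    return tuple(sum(r[i][j] * v[j] for j in range(3)) for i in range(3))


def fuse_velocity(v_f: Vec3, v_e: Vec3, contact_prob: float,
                  lo: float = 0.5, hi: float = 0.9) -> Vec3:
    """Blend foot-based (v_f) and RNN (v_e) velocities, both in world space.

    lo/hi are hypothetical thresholds: below lo only v_e is used, above hi
    only v_f, with linear interpolation in between.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    w = min(max((contact_prob - lo) / (hi - lo), 0.0), 1.0)
    return tuple(w * f + (1.0 - w) * e for f, e in zip(v_f, v_e))


def estimate_root_velocity(feet_prev: Sequence[Vec3], feet_curr: Sequence[Vec3],
                           contact_probs: Sequence[float], v_e_local: Vec3,
                           root_rot: Mat3) -> Vec3:
    """One frame of the two-branch translation estimate (illustrative)."""
    # Supporting foot: the one with the higher contact probability.
    k = 0 if contact_probs[0] >= contact_probs[1] else 1
    v_f = support_foot_velocity(feet_prev[k], feet_curr[k])
    v_e = rotate(root_rot, v_e_local)  # root frame -> world frame
    return fuse_velocity(v_f, v_e, contact_probs[k])
```

One plausible reason to blend rather than switch hard between branches is to avoid sudden jumps in the estimated translation when the contact probability hovers near a threshold. During confident stance phases, the foot-based estimate dominates. During flight phases such as jumping, the RNN estimate dominates.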

Experimental Results

Research Questions

RQ1: Can real-time full-motion capture, including global translation, be achieved from only six IMUs at high frame rates without environmental constraints?
RQ2: Does a multi-stage pose estimation approach with intermediate joint-position representations improve accuracy and efficiency over direct pose regression from IMU data?
RQ3: Can a fusion-based translation estimation strategy leveraging foot-ground contact and learned root velocity robustly estimate global movement in diverse motions?
RQ4: How do synthetic data and motion history modeling impact generalization across datasets like DIP-IMU, TotalCapture, and AMASS?

Key Findings

The approach achieves real-time motion capture with global translation estimation using only 6 IMUs at over 90 fps.
A three-stage pose estimation design (leaf joints → all joints → rotations) yields higher accuracy and lower computation than direct rotation prediction.
A hybrid translation estimator combining foot-ground contact-based velocity and root-velocity regression improves robustness across walking, running, and jumping.
The method outperforms the prior methods DIP and SIP in both qualitative and quantitative evaluations on public datasets, with improved accuracy and efficiency.
The system remains purely inertial, avoiding occlusion and environmental limitations inherent to vision-based mocap.


This review was generated by AI and reviewed by a human editor.