QUICK REVIEW

[Paper Review] FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Yu Rong, Takaaki Shiratori|arXiv (Cornell University)|Aug 19, 2020

Human Pose and Action Recognition|References: 58|Citations: 54

In a nutshell

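FrankMocap jointly estimates 3D hand and whole-body pose from monocular RGB input. It runs separate hand and body regression modules, merges their outputs into a unified SMPL-X model, and achieves near real-time performance (about 9.5 fps) together with state-of-the-art hand pose accuracy.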

ABSTRACT

Although the essential nuance of human motion is often conveyed as a combination of body movements and hand gestures, the existing monocular motion capture approaches mostly focus on either body motion capture only ignoring hand parts or hand motion capture only without considering body motion. In this paper, we present FrankMocap, a motion capture system that can estimate both 3D hand and body motion from in-the-wild monocular inputs with faster speed (9.5 fps) and better accuracy than previous work. Our method works in near real-time (9.5 fps) and produces 3D body and hand motion capture outputs as a unified parametric model structure. Our method aims to capture 3D body and hand motion simultaneously from challenging in-the-wild monocular videos. To construct FrankMocap, we build the state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole body parametric model (SMPL-X). Our 3D hand motion capture output can be efficiently integrated to monocular body motion capture output, producing whole body motion results in a unified parametric model structure. We demonstrate the state-of-the-art performance of our hand motion capture system in public benchmarks, and show the high quality of our whole body motion capture result in various challenging real-world scenes, including a live demo scenario.

Motivation and objectives

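Enable joint 3D hand and body motion capture from monocular RGB input in in-the-wild settings.
Develop fast, mutually compatible hand and body regression modules whose outputs can be fed into a unified SMPL-X representation.
Provide a simple copy-and-paste integration and an optimization-based refinement to improve the whole-body output.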

Proposed method

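Two regression modules separately predict 3D body pose and 3D hand pose from a single RGB image.
The hand module uses the hand component of SMPL-X as a standalone model and regresses the hand parameters [ϕ_h, θ_h, β_h] (orientation, pose, and shape, mirroring the body module's notation) together with a weak-perspective camera c_h.
The body module regresses, in SMPL-X space, the body and limb pose parameters θ_b and shape β_b, along with the global orientation ϕ_b and camera c_b.
The integration module merges the hand and body outputs into a unified SMPL-X representation, either by copy-and-paste or by optimization-based fitting that additionally uses 2D keypoints.
The optimization objective F = F^2D + F^pri aligns the projected 3D joints with the 2D observations (F^2D) and enforces plausible pose and shape through a prior term (F^pri).
Training uses hand data from multiple datasets, including FreiHAND, HO-3D, MTC, STB, RHD, and MPII+NZSL, rescaled and reordered to be compatible with the hand model. Motion blur augmentation improves robustness.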
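
To make the 2D data term concrete, here is a minimal Python sketch of a weak-perspective projection and a reprojection energy in the spirit of F^2D. This review does not give the exact form of F^2D, so the confidence-weighted squared distance, the camera convention `scale * (xy + trans_xy)`, and the function names are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weak_perspective_project(joints_3d, scale, trans_xy):
    """Project (N, 3) joints to (N, 2) points by dropping depth.

    Assumed convention: x_2d = scale * (x_3d[:2] + trans_xy).
    """
    joints_3d = np.asarray(joints_3d, dtype=float)
    return scale * (joints_3d[:, :2] + np.asarray(trans_xy, dtype=float))

def reprojection_energy(joints_3d, keypoints_2d, confidence, scale, trans_xy):
    """Confidence-weighted sum of squared 2D distances (assumed form of F^2D)."""
    projected = weak_perspective_project(joints_3d, scale, trans_xy)
    residual = projected - np.asarray(keypoints_2d, dtype=float)
    per_joint = np.sum(residual ** 2, axis=1)
    return float(np.sum(np.asarray(confidence, dtype=float) * per_joint))

# Toy data: three joints and a hypothetical camera (scale, translation).
joints = np.array([[0.0, 0.0, 1.0], [0.1, 0.2, 1.5], [-0.3, 0.4, 0.8]])
scale, trans = 2.0, np.array([0.5, -0.25])
conf = np.ones(len(joints))

# Keypoints generated by the same camera give zero energy;
# shifting every keypoint by 0.1 in x and y gives a positive energy.
observed = weak_perspective_project(joints, scale, trans)
energy_exact = reprojection_energy(joints, observed, conf, scale, trans)
energy_shifted = reprojection_energy(joints, observed + 0.1, conf, scale, trans)
print(energy_exact, energy_shifted)
```

In an optimization-based fitting stage, an energy of this kind would be minimized over the whole-body parameters together with a prior term such as F^pri. The sketch only evaluates the data term for fixed joints.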

Experimental results

Research questions

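RQ1: Can monocular RGB input yield accurate 3D hand and whole-body pose simultaneously under the SMPL-X representation?
RQ2: Do separate hand and body regression modules produce outputs that can be easily integrated into a consistent whole-body model?
RQ3: Do fast copy-and-paste integration and refinement-based optimization deliver accurate and stable whole-body motion capture in the wild?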

Key findings

Method	Preprocess (fps)	Model (fps)	Overall (fps)
SMPLify-X (CP)	7.5	0.01	0.01
MTC (CP)	7.5	0.1	0.1
Online (CP)	35	13	9.5
Offline (OP)	7.5	13	4.7
Offline (Another)	7.5	1.1	0.95
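
The Overall column is numerically consistent with running preprocessing and the model one after another on each frame, so that per-frame times add. That reading is an inference from the table's numbers, not something this review states. A short Python check under that assumption (the function name `overall_fps` is illustrative):

```python
def overall_fps(*stage_fps):
    """Throughput of stages executed sequentially on every frame.

    Per-frame time is the sum of the per-stage times (1 / fps each).
    """
    return 1.0 / sum(1.0 / fps for fps in stage_fps)

# (preprocess fps, model fps) copied from the table above.
rows = {
    "SMPLify-X (CP)": (7.5, 0.01),
    "MTC (CP)": (7.5, 0.1),
    "Online (CP)": (35.0, 13.0),
    "Offline (OP)": (7.5, 13.0),
    "Offline (Another)": (7.5, 1.1),
}

overall = {name: overall_fps(pre, model) for name, (pre, model) in rows.items()}
for name, fps in overall.items():
    print(f"{name}: {fps:.2f} fps")
```

Under this assumption the computed values land within rounding of the reported ones, for example about 9.48 fps for the online row against the reported 9.5 fps. It also shows why the two slowest baselines are bounded by their model stage rather than by preprocessing.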

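In online (copy-and-paste) mode, FrankMocap reaches about 9.5 fps, enabling near real-time whole-body motion capture from monocular video.
Its 3D hand pose estimation achieves state-of-the-art performance on public benchmarks.
Integrating the hand and body outputs through SMPL-X yields a consistent whole-body representation without heavy optimization; an optional optimization stage improves wrist alignment and 2D keypoint consistency.
Extensive ablations show that diverse training datasets and motion blur augmentation benefit robustness in the wild.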
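
The copy-and-paste integration mentioned in the findings can be pictured with a small Python sketch. This review does not spell out the procedure, so everything here is an assumption: the joint names, the dictionary layout, and in particular the idea that the local wrist rotation is obtained by re-expressing the hand module's global orientation in the frame of the wrist's parent joint (taken to be the elbow). Treat it as one plausible reading, not the authors' code.

```python
import numpy as np

def rot_z(deg):
    """Rotation matrix about the z-axis, used here only to build toy inputs."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def copy_and_paste(body_local_rots, elbow_global_rot, hand_global_rot,
                   finger_pose, side="right"):
    """Return a whole-body pose dict with one wrist and hand replaced.

    body_local_rots: {joint_name: 3x3 local rotation} from the body module.
    elbow_global_rot: 3x3 global rotation of the wrist's parent joint.
    hand_global_rot: 3x3 global hand orientation from the hand module.
    finger_pose: (15, 3) axis-angle finger joint poses from the hand module.
    (All of these layouts are hypothetical.)
    """
    whole = dict(body_local_rots)
    # Local wrist rotation = parent_global^T @ hand_global (assumed rule).
    whole[f"{side}_wrist"] = elbow_global_rot.T @ hand_global_rot
    # Finger poses are local to the hand, so they are copied unchanged.
    whole[f"{side}_hand_pose"] = np.array(finger_pose, copy=True)
    return whole

# Toy example: elbow at 30 degrees globally, hand predicted at 75 degrees.
body_local = {"right_elbow": rot_z(30), "right_wrist": rot_z(10)}
elbow_global = rot_z(30)
hand_global = rot_z(75)
fingers = np.zeros((15, 3))

whole = copy_and_paste(body_local, elbow_global, hand_global, fingers)
```

In the toy example the pasted wrist becomes a 45 degree local rotation, so composing it with the elbow's global rotation reproduces the hand module's global orientation. No iterative fitting is involved, which fits the low cost attributed to this mode.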

This review was generated by AI and checked by a human editor.