QUICK REVIEW

[Paper Review] Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild

Akash Sengupta, Ignas Budvytis|arXiv (Cornell University)|Sep 21, 2020

Human Pose and Action Recognition | Citations: 44

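One-Line Summary

STRAPS trains on synthetic data generated on the fly with the SMPL body model to learn robust 3D human pose and shape estimation from a single RGB image, and introduces the in-the-wild SSP-3D dataset for evaluation. The method improves shape accuracy over the state of the art while remaining competitive on pose.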

ABSTRACT

This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image. Despite great progress in this field in terms of pose prediction accuracy, state-of-the-art methods often predict inaccurate body shapes. We suggest that this is primarily due to the scarcity of in-the-wild training data with diverse and accurate body shape labels. Thus, we propose STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system that utilises proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network, which is trained with synthetic training data (generated on-the-fly during training using the SMPL statistical body model) to overcome data scarcity. We bridge the gap between synthetic training inputs and noisy real inputs, which are predicted by keypoint detection and segmentation CNNs at test-time, by using data augmentation and corruption during training. In order to evaluate our approach, we curate and provide a challenging evaluation dataset for monocular human shape estimation, Sports Shape and Pose 3D (SSP-3D). It consists of RGB images of tightly-clothed sports-persons with a variety of body shapes and corresponding pseudo-ground-truth SMPL shape and pose parameters, obtained via multi-frame optimisation. We show that STRAPS outperforms other state-of-the-art methods on SSP-3D in terms of shape prediction accuracy, while remaining competitive with the state-of-the-art on pose-centric datasets and metrics.

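Motivation and Objectives

Address the lack of body shape diversity in datasets for monocular 3D human pose and shape estimation.
Propose STRAPS, a synthetic training framework that regresses SMPL shape and pose from proxy inputs.
Demonstrate robustness to noisy real inputs through augmentation, and improve shape prediction in the wild.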

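Proposed Method

Predict proxy representations (silhouettes and 2D joints) from the RGB image using off-the-shelf detectors.
Train a regression network on synthetic data generated on the fly to map the proxy representations to SMPL shape and pose parameters.
Sample SMPL shapes and poses, render silhouettes and 2D joints, and apply shape augmentation to increase diversity.
Augment the proxy inputs with noise, occlusion, and simulated detection/segmentation errors to bridge the gap between synthetic and real data.
Supervise 3D joints, 3D vertices, and 2D joints with a multi-task loss whose terms are adaptively weighted using homoscedastic uncertainty.
Benchmark shape and pose accuracy on SSP-3D (shape-focused) and on pose-centric datasets (Human3.6M, 3DPW, MoVi).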
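
The uncertainty-weighted multi-task loss mentioned above can be illustrated with a minimal sketch. It assumes one common form of homoscedastic uncertainty weighting, sum_i exp(-s_i) * L_i + s_i with s_i = log(sigma_i^2); the exact form used by STRAPS may differ. The function name, task names, and loss values are hypothetical, and in real training each s_i would be a learnable parameter rather than a fixed float.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    s_i = log(sigma_i^2) is the log-variance for task i. A larger s_i
    down-weights that task's loss, while the additive s_i term penalizes
    inflating the variance without bound.
    """
    if task_losses.keys() != log_vars.keys():
        raise ValueError("each task needs exactly one log-variance")
    return sum(
        math.exp(-log_vars[name]) * task_losses[name] + log_vars[name]
        for name in task_losses
    )


# Hypothetical per-task loss values for one batch (not taken from the paper).
losses = {"vertices_3d": 0.5, "joints_3d": 0.2, "joints_2d": 0.1}

# With all log-variances at zero, every task has weight 1.
equal_weights = {"vertices_3d": 0.0, "joints_3d": 0.0, "joints_2d": 0.0}
total_equal = uncertainty_weighted_loss(losses, equal_weights)

# Doubling the variance of one task halves the weight on its loss.
less_trust = dict(equal_weights, vertices_3d=math.log(2.0))
total_reweighted = uncertainty_weighted_loss(losses, less_trust)
```

The appeal of this scheme is that the relative weights of the 3D and 2D terms do not have to be tuned by hand, since the log-variances are optimized jointly with the network.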

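Experimental Results

Research Questions

RQ1 Can on-the-fly synthetic data generation with SMPL and simple proxy inputs improve shape diversity and prediction accuracy in the wild?
RQ2 Does augmenting the proxy inputs with noise and occlusion close the gap between synthetic training inputs and real test-time inputs?
RQ3 How does STRAPS compare with state-of-the-art methods on shape and pose metrics on a diverse in-the-wild shape dataset such as SSP-3D?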
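
To make the proxy corruption in RQ2 concrete, here is a minimal sketch of three simple corruptions applied to clean synthetic inputs: Gaussian jitter on 2D joints, random joint removal, and a rectangular occluder on the silhouette. The function names, parameters, and values are illustrative assumptions and are not taken from the STRAPS implementation. Plain Python lists are used so the example needs only the standard library.

```python
import random


def jitter_joints(joints, sigma, rng):
    """Add zero-mean Gaussian pixel noise to each (x, y) joint."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in joints]


def drop_joints(joints, drop_prob, rng):
    """Simulate missed detections: each joint becomes None with probability drop_prob."""
    return [None if rng.random() < drop_prob else joint for joint in joints]


def occlude_silhouette(mask, top, left, height, width):
    """Return a copy of a binary mask (list of rows) with a rectangle set to background."""
    out = [row[:] for row in mask]
    for r in range(max(top, 0), min(top + height, len(out))):
        for c in range(max(left, 0), min(left + width, len(out[r]))):
            out[r][c] = 0
    return out


rng = random.Random(0)                           # fixed seed for repeatability
mask = [[1] * 8 for _ in range(8)]               # toy 8x8 all-foreground silhouette
joints = [(2.0, 3.0), (5.0, 6.0), (4.0, 1.0)]    # toy 2D joints in pixel coordinates

noisy_joints = drop_joints(jitter_joints(joints, sigma=0.5, rng=rng), drop_prob=0.2, rng=rng)
occluded = occlude_silhouette(mask, top=2, left=3, height=3, width=4)
```

Applying such corruptions freshly to every training sample means the regressor rarely sees a perfectly clean proxy input, which is the mechanism RQ2 asks about.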

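Key Findings

STRAPS improves shape prediction accuracy on SSP-3D, outperforming the state of the art on PVE-T-SC and mIOU.
The method remains competitive with the state of the art on pose-centric datasets (e.g., MPJPE-PA on 3DPW and Human3.6M).
Shape augmentation increases the diversity of predicted body shapes and, combined with proxy representation augmentation, improves performance on atypical subjects.
Proxy representation augmentation (silhouettes and 2D joints corrupted with noise and occlusion) limits the performance drop when moving from synthetic to real data.
The two-stage approach, which regresses SMPL parameters from proxy representations, enables strong 3D supervision without requiring real data with 3D labels.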
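
The two shape metrics named above can be sketched roughly as follows. The silhouette IoU is the standard intersection-over-union of two binary masks, which mIOU averages over a dataset. The scale-corrected vertex error is an assumption-laden illustration: it centers both vertex sets and fits a single least-squares scale before averaging per-vertex distances, which may not match the exact scale correction used for PVE-T-SC in the paper. All function names are hypothetical.

```python
import math


def silhouette_iou(pred, target):
    """Intersection over union of two binary masks given as lists of rows."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    inter = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    # Convention chosen here: two empty masks score 0.0.
    return inter / union if union else 0.0


def centered(verts):
    """Subtract the centroid from a list of (x, y, z) vertices."""
    n = len(verts)
    mean = [sum(v[i] for v in verts) / n for i in range(3)]
    return [tuple(v[i] - mean[i] for i in range(3)) for v in verts]


def scale_corrected_vertex_error(pred, target):
    """Mean per-vertex distance after centering and a least-squares scale fit.

    Assumption: the scale s minimizes sum ||s * p - t||^2 over centered vertices.
    """
    p, t = centered(pred), centered(target)
    den = sum(a * a for pv in p for a in pv)
    if den == 0.0:
        raise ValueError("predicted vertices are degenerate (all identical)")
    s = sum(a * b for pv, tv in zip(p, t) for a, b in zip(pv, tv)) / den
    return sum(math.dist([s * a for a in pv], tv) for pv, tv in zip(p, t)) / len(p)


# Toy check: the prediction is the target scaled by 2 and shifted, so the
# scale-corrected error should be numerically zero.
target_verts = [(1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2)]
pred_verts = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
error = scale_corrected_vertex_error(pred_verts, target_verts)

iou = silhouette_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```

Correcting for scale before measuring vertex error isolates body shape from overall size, which cannot be recovered unambiguously from a single image.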

This review was generated by AI and checked by a human editor.