QUICK REVIEW

[Paper Review] Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild

Akash Sengupta, Ignas Budvytis|arXiv (Cornell University)|2020. 09. 21.

Human Pose and Action Recognition · Citations: 44

One-Line Summary

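STRAPS learns robust 3D human pose and shape estimation from a single RGB image using synthetic training data generated on-the-fly with the SMPL body model, and introduces the in-the-wild SSP-3D dataset for evaluation. Compared with the state of the art, the method improves shape accuracy while remaining competitive on pose.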

ABSTRACT

This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image. Despite great progress in this field in terms of pose prediction accuracy, state-of-the-art methods often predict inaccurate body shapes. We suggest that this is primarily due to the scarcity of in-the-wild training data with diverse and accurate body shape labels. Thus, we propose STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system that utilises proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network, which is trained with synthetic training data (generated on-the-fly during training using the SMPL statistical body model) to overcome data scarcity. We bridge the gap between synthetic training inputs and noisy real inputs, which are predicted by keypoint detection and segmentation CNNs at test-time, by using data augmentation and corruption during training. In order to evaluate our approach, we curate and provide a challenging evaluation dataset for monocular human shape estimation, Sports Shape and Pose 3D (SSP-3D). It consists of RGB images of tightly-clothed sports-persons with a variety of body shapes and corresponding pseudo-ground-truth SMPL shape and pose parameters, obtained via multi-frame optimisation. We show that STRAPS outperforms other state-of-the-art methods on SSP-3D in terms of shape prediction accuracy, while remaining competitive with the state-of-the-art on pose-centric datasets and metrics.
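The abstract describes proxy representations (silhouettes and 2D joints) as the regressor's input. Below is a minimal sketch of one common way to encode them, assuming each joint is rendered as a Gaussian heatmap and stacked with the binary silhouette as extra channels. The helper names, the Gaussian width `sigma`, and the visibility handling are illustrative assumptions, not the authors' code.

```python
import numpy as np

def joints_to_heatmaps(joints_2d, visible, height, width, sigma=4.0):
    """Render one Gaussian heatmap per 2D joint (x, y); undetected joints get an all-zero map."""
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) pixel coordinates
    heatmaps = np.zeros((len(joints_2d), height, width), dtype=np.float32)
    for j, ((x, y), vis) in enumerate(zip(joints_2d, visible)):
        if vis:
            heatmaps[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return heatmaps

def build_proxy_input(silhouette, joints_2d, visible, sigma=4.0):
    """Hypothetical encoder: stack a binary silhouette with joint heatmaps into (1 + J, H, W)."""
    h, w = silhouette.shape
    heatmaps = joints_to_heatmaps(joints_2d, visible, h, w, sigma)
    return np.concatenate([silhouette[None].astype(np.float32), heatmaps], axis=0)
```

For J joints this yields a (1 + J, H, W) array that a CNN regressor can consume directly; an undetected joint simply contributes an empty channel.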

Research Motivation and Goals

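Address the lack of body-shape diversity in monocular 3D human pose and shape estimation datasets.
Propose STRAPS, a synthetic training framework that regresses SMPL shape and pose from proxy inputs.
Demonstrate robustness to noisy real inputs through augmentation, and improve shape prediction in the wild.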
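To make the augmentation idea concrete, here is a hypothetical NumPy sketch of three corruptions in the spirit of the paper: widened sampling of SMPL shape coefficients (shape augmentation), Gaussian jitter and random dropping of 2D joints, and a random occluding box on the silhouette. The distributions, probabilities, box sizes, the 10-coefficient shape vector, and all function names are assumptions chosen for illustration; the paper's exact augmentation settings may differ.

```python
import numpy as np

NUM_BETAS = 10  # assumed number of SMPL shape coefficients

def sample_shape(rng, shape_std=1.5):
    """Shape augmentation sketch: draw SMPL betas from a widened zero-mean Gaussian."""
    return rng.normal(0.0, shape_std, size=NUM_BETAS)

def corrupt_joints(rng, joints_2d, visible, noise_px=3.0, drop_prob=0.1):
    """Jitter 2D joints with Gaussian pixel noise and randomly mark some as undetected."""
    noisy = joints_2d + rng.normal(0.0, noise_px, size=joints_2d.shape)
    dropped = rng.random(len(visible)) < drop_prob
    return noisy, visible & ~dropped

def occlude_silhouette(rng, silhouette, max_frac=0.3):
    """Zero out a random axis-aligned box to mimic occlusion or segmentation misses."""
    h, w = silhouette.shape
    bh = rng.integers(1, int(h * max_frac) + 1)  # box height in [1, max_frac * h]
    bw = rng.integers(1, int(w * max_frac) + 1)
    top = rng.integers(0, h - bh + 1)
    left = rng.integers(0, w - bw + 1)
    out = silhouette.copy()  # leave the clean rendering untouched
    out[top:top + bh, left:left + bw] = 0
    return out
```

During training, corruptions like these would be applied to each freshly rendered synthetic sample, so the regressor never sees perfectly clean proxies and is less surprised by detector and segmenter errors at test time.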

Proposed Method

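Predict proxy representations (silhouettes and 2D joints) from the RGB image using off-the-shelf detectors.
Train a regression network that maps proxy representations to SMPL shape and pose parameters using synthetic data generated on-the-fly.
Generate synthetic inputs by sampling SMPL shape and pose, rendering silhouettes and 2D joints, and applying shape augmentation to increase diversity.
Augment the proxy inputs with noise, occlusion, and simulated detection/segmentation errors to narrow the synthetic-to-real gap.
Supervise SMPL parameters, 3D joints, 3D vertices, and 2D joints with a multi-task loss whose weights are adapted through homoscedastic uncertainty.
Benchmark shape and pose accuracy on the shape-centric SSP-3D and on pose-centric datasets (Human3.6M, 3DPW, MoVi).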
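The adaptive loss weighting refers to homoscedastic uncertainty weighting, as popularised by Kendall et al. for multi-task learning. The sketch below uses one common form, total = sum_i exp(-s_i) * L_i + s_i with a learnable s_i = log(sigma_i^2) per task. It assumes fixed task losses so the behaviour of the weights can be shown in isolation; in real training the s_i are optimised jointly with the network, and the task names and loss values are hypothetical.

```python
import math

def weighted_total(losses, log_vars):
    """Uncertainty-weighted sum: sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2)."""
    return sum(math.exp(-log_vars[t]) * losses[t] + log_vars[t] for t in losses)

def log_var_grad(losses, log_vars):
    """Partial derivative of the total w.r.t. each s_i: 1 - exp(-s_i) * L_i."""
    return {t: 1.0 - math.exp(-log_vars[t]) * losses[t] for t in losses}

def fit_log_vars(losses, lr=0.1, steps=2000):
    """Gradient descent on the log-variances only, holding the task losses fixed."""
    log_vars = {t: 0.0 for t in losses}  # start with equal weights exp(0) = 1
    for _ in range(steps):
        grads = log_var_grad(losses, log_vars)
        for t in losses:
            log_vars[t] -= lr * grads[t]
    return log_vars

# Hypothetical per-task losses for the four supervision signals.
example_losses = {"smpl_params": 0.5, "joints_3d": 2.0, "vertices_3d": 4.0, "joints_2d": 0.25}
fitted = fit_log_vars(example_losses)
```

Setting the gradient 1 - exp(-s_i) * L_i to zero gives s_i = log L_i, so each task's effective weight exp(-s_i) settles at 1/L_i: tasks with larger losses are down-weighted, while the + s_i term keeps the weights from collapsing to zero.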

Experimental Results

Research Questions

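RQ1: Can generating synthetic data on-the-fly with SMPL and simple proxy inputs increase shape diversity and improve prediction accuracy in the wild?
RQ2: Does adding noise and occlusion to the proxy inputs narrow the gap between synthetic training data and real test inputs?
RQ3: How does STRAPS compare with state-of-the-art methods on shape and pose metrics on a diverse in-the-wild shape dataset (SSP-3D)?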

Key Findings

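STRAPS improves shape prediction accuracy on SSP-3D, outperforming the state of the art on PVE-T-SC and mIOU.
The method also remains competitive with the state of the art on pose metrics such as MPJPE-PA on pose-centric datasets (Human3.6M, 3DPW).
Shape augmentation increases the diversity of predicted body shapes and, combined with proxy-representation augmentation, improves performance on subjects with atypical body shapes.
Proxy-representation augmentation (noise and occlusion on silhouettes and 2D joints) reduces the performance drop when moving from synthetic to real inputs.
The two-stage approach (proxy representations first, then SMPL regression) enables strong 3D supervision without any real training data with 3D labels.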
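The findings rely on silhouette mIOU (shape) and MPJPE-PA (pose). Below is a minimal NumPy sketch of both under their standard definitions: per-image IoU between predicted and ground-truth binary masks averaged over the dataset, and mean per-joint position error after a similarity (scale, rotation, translation) Procrustes alignment. The function names are hypothetical, and PVE-T-SC is not reproduced because this review does not describe its scale-correction step.

```python
import numpy as np

def silhouette_iou(pred_mask, gt_mask):
    """IoU between two binary masks; defined as 1.0 when both are empty."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter / union) if union > 0 else 1.0

def mean_iou(pred_masks, gt_masks):
    """Average per-image silhouette IoU over a dataset."""
    return float(np.mean([silhouette_iou(p, g) for p, g in zip(pred_masks, gt_masks)]))

def procrustes_align(pred, gt):
    """Similarity-align pred (N, 3) onto gt (N, 3) with the SVD-based Procrustes solution."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, sing, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    signs = np.array([1.0, 1.0, d])
    rot = u @ np.diag(signs) @ vt
    scale = (sing * signs).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g

def mpjpe_pa(pred, gt):
    """Mean per-joint Euclidean error after Procrustes alignment."""
    aligned = procrustes_align(pred, gt)
    return float(np.linalg.norm(aligned - gt, axis=1).mean())
```

MPJPE-PA is reported in the units of the input joints (typically millimetres on Human3.6M and 3DPW). Because the alignment removes global scale, rotation, and translation, it isolates articulated-pose error from camera and scale ambiguity.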


This review was generated by AI and reviewed by a human editor.