QUICK REVIEW

[Paper Review] FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Yu Rong, Takaaki Shiratori | arXiv (Cornell University) | 2020-08-19

Human Pose and Action Recognition | References: 58 | Citations: 54

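One-Line Summary

FrankMocap estimates both 3D hand and body pose from monocular RGB input. Independent hand and body regression modules produce outputs that are integrated into a unified SMPL-X model, achieving near real-time speed (~9.5 fps) and state-of-the-art hand pose accuracy.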

ABSTRACT

Although the essential nuance of human motion is often conveyed as a combination of body movements and hand gestures, the existing monocular motion capture approaches mostly focus on either body motion capture only ignoring hand parts or hand motion capture only without considering body motion. In this paper, we present FrankMocap, a motion capture system that can estimate both 3D hand and body motion from in-the-wild monocular inputs with faster speed (9.5 fps) and better accuracy than previous work. Our method works in near real-time (9.5 fps) and produces 3D body and hand motion capture outputs as a unified parametric model structure. Our method aims to capture 3D body and hand motion simultaneously from challenging in-the-wild monocular videos. To construct FrankMocap, we build the state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole body parametric model (SMPL-X). Our 3D hand motion capture output can be efficiently integrated to monocular body motion capture output, producing whole body motion results in a unified parametric model structure. We demonstrate the state-of-the-art performance of our hand motion capture system in public benchmarks, and show the high quality of our whole body motion capture result in various challenging real-world scenes, including a live demo scenario.

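Motivation and Goals

Enable joint 3D hand and body motion capture from in-the-wild monocular RGB input.
Develop fast, mutually compatible hand and body regression modules whose outputs feed a unified SMPL-X representation.
Provide a simple copy-and-paste integration, plus an optimization-based refinement that improves the whole-body output.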

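Proposed Method

Two regression modules predict the 3D body pose and the 3D hand pose, respectively, from a single RGB image.
The hand module treats the hand part of SMPL-X as a standalone model and regresses the hand parameters [φ_h, θ_h, β_h] and a weak-perspective camera c_h.
The body module regresses pose θ_b and shape β_b in SMPL-X space, together with global orientation φ_b and camera c_b.
The integration module combines the hand and body outputs into a single SMPL-X representation, either by copy-and-paste or by optimization-based fitting that uses 2D keypoints.
The optimization objective F = F^2D + F^pri has two terms: F^2D aligns the projected 3D joints with the 2D observations, and F^pri enforces plausible pose and shape.
Training uses hand data from multiple datasets (FreiHAND, HO-3D, MTC, STB, RHD, MPII+NZSL), rescaled and repositioned for compatibility with the hand model; motion blur augmentation improves robustness.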
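The copy-and-paste integration and the 2D term described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the dictionary keys, the function names, and the assumption that the wrist's local rotation is obtained by expressing the hand module's global orientation relative to its parent joint are all hypothetical choices made for illustration. The reprojection function shows only a plain sum of squared distances under a weak-perspective camera (scale and 2D translation); the prior term F^pri is omitted because the review does not specify its form.

```python
import numpy as np


def rotation_z(angle_rad):
    """Rotation matrix about the z-axis (only used to build example inputs)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def wrist_local_rotation(parent_global_rot, hand_global_rot):
    """Express the hand's global orientation relative to its parent joint.

    Solves parent_global_rot @ local = hand_global_rot for `local`.
    The inverse of a rotation matrix is its transpose.
    """
    return parent_global_rot.T @ hand_global_rot


def copy_and_paste(body_params, hand_params, parent_global_rot):
    """Merge body and hand predictions into one parameter dict.

    All keys are hypothetical names chosen for this sketch.
    """
    merged = dict(body_params)  # shallow copy; the input dict is not modified
    merged["hand_pose"] = hand_params["finger_pose"]
    merged["wrist_rot"] = wrist_local_rotation(
        parent_global_rot, hand_params["global_orient"]
    )
    return merged


def weak_perspective_project(joints_3d, scale, translation):
    """Drop depth, then apply a uniform scale and a 2D translation."""
    return scale * joints_3d[:, :2] + translation


def reprojection_error(joints_3d, keypoints_2d, scale, translation):
    """Sum of squared 2D distances between projected joints and keypoints."""
    projected = weak_perspective_project(joints_3d, scale, translation)
    return float(np.sum((projected - keypoints_2d) ** 2))


if __name__ == "__main__":
    parent_rot = rotation_z(0.3)  # stand-in for the parent joint's global rotation
    hand_rot = rotation_z(1.0)    # stand-in for the hand module's global orientation
    merged = copy_and_paste(
        {"body_pose": np.zeros((21, 3))},
        {"finger_pose": np.zeros((15, 3)), "global_orient": hand_rot},
        parent_rot,
    )
    # Composing the parent rotation with the local wrist rotation
    # should recover the hand module's global orientation.
    print(np.allclose(parent_rot @ merged["wrist_rot"], hand_rot))
```

In an optimization-based variant, a term like `reprojection_error` would be minimized over the model parameters together with a prior; the sketch only evaluates it for fixed inputs.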

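Experimental Results

Research Questions

RQ1: Can monocular RGB input yield accurate 3D hand and body poses simultaneously under the SMPL-X representation?
RQ2: Do independent hand and body regression modules produce outputs that integrate easily into a consistent whole-body model?
RQ3: Does fast copy-and-paste integration, or optimization-based refinement, deliver accurate and stable whole-body motion capture in the wild?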

Key Results

Method	Preprocess (fps)	Model (fps)	Overall (fps)
SMPLify-X (CP)	7.5	0.01	0.01
MTC (CP)	7.5	0.1	0.1
Online (CP)	35	13	9.5
Offline (OP)	7.5	13	4.7
Offline (Another)	7.5	1.1	0.95
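The Overall column is numerically consistent with running preprocessing and the model one after the other on each frame, so that the per-frame times add. This is an inference from the figures in the table, not a formula stated in the review. The short sketch below (the function name `overall_fps` is hypothetical) combines stage throughputs that way.

```python
def overall_fps(*stage_fps):
    """Throughput of stages run sequentially: per-frame times add up."""
    seconds_per_frame = sum(1.0 / fps for fps in stage_fps)
    return 1.0 / seconds_per_frame


if __name__ == "__main__":
    # (preprocess fps, model fps) pairs taken from the table above.
    rows = {
        "Online (CP)": (35, 13),
        "Offline (OP)": (7.5, 13),
        "Offline (Another)": (7.5, 1.1),
    }
    for name, (pre, model) in rows.items():
        print(f"{name}: {overall_fps(pre, model):.2f} fps")
```

Under this assumption, the slower stage dominates: raising preprocessing from 7.5 fps to 35 fps roughly doubles the overall rate when the model runs at 13 fps.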

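FrankMocap reaches about 9.5 fps in online (copy-and-paste) mode, enabling near real-time whole-body motion capture from monocular video.
Its 3D hand pose estimation achieves state-of-the-art performance on public benchmarks.
Integrating the hand and body outputs through SMPL-X yields a consistent whole-body representation without heavy optimization; an optional optimization step improves wrist alignment and 2D keypoint consistency.
Extensive comparative experiments show that diverse training datasets and motion blur augmentation improve robustness in the wild.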

This review was generated by AI and reviewed by a human editor.