QUICK REVIEW

[Paper Review] Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

Xipeng Chen, Kwan-Yee Lin|arXiv (Cornell University)|2019. 03. 21.

Human Pose and Action Recognition | References: 43 | Citations: 42

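One-Line Summary

This paper proposes a weakly supervised framework that learns a geometry-aware 3D pose representation from multi-view 2D skeletons. A skeleton-based view-synthesis encoder-decoder, combined with a representation consistency constraint, improves monocular 3D pose estimation.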

ABSTRACT

Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale in-door 3D datasets and sophisticated network architectures. However, the generalizability to different environments remains an elusive goal. In this work, we propose a geometry-aware 3D representation for the human pose to address this limitation by using multiple views in a simple auto-encoder model at the training stage and only 2D keypoint information as supervision. A view synthesis framework is proposed to learn the shared 3D representation between viewpoints with synthesizing the human pose from one viewpoint to the other one. Instead of performing a direct transfer in the raw image-level, we propose a skeleton-based encoder-decoder mechanism to distil only pose-related representation in the latent space. A learning-based representation consistency constraint is further introduced to facilitate the robustness of latent 3D representation. Since the learnt representation encodes 3D geometry information, mapping it to 3D pose will be much easier than conventional frameworks that use an image or 2D coordinates as the input of 3D pose estimator. We demonstrate our approach on the task of 3D human pose estimation. Comprehensive experiments on three popular benchmarks show that our model can significantly improve the performance of state-of-the-art methods with simply injecting the representation as a robust 3D prior.

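Motivation and Objectives

Learn a generalizable, geometry-aware representation even with limited 3D annotations, so that 3D pose estimation stays robust across diverse environments and motions.
Learn a 3D pose representation shared across multi-view skeletons using only 2D supervision.
Distill pose-related information into a latent space that maps more easily to 3D pose.
Improve generalization through view synthesis and a latent-space consistency constraint.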

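Proposed Method

2D skeleton maps obtained from multi-view images are used as input instead of raw images.
A skeleton-based encoder-decoder is trained to synthesize the target-view skeleton from the source-view skeleton, so that the latent code represents the geometry G.
A representation consistency loss across view directions constrains G to be a semantically meaningful 3D pose representation.
A bidirectional encoder-decoder configuration enforces latent-space consistency under the known rotation between views.
The learned geometry representation G is injected into a 3D pose regressor as a prior, which allows a simple regression from G to 3D joint coordinates.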
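
The latent-space consistency idea in the steps above can be illustrated with a minimal NumPy sketch. It assumes the latent code G is reshaped into a set of 3D points so that the known rotation between two views can be applied to it directly. The function names, the latent size, the choice of a rotation about the vertical axis, and the squared-error form of the loss are all illustrative assumptions rather than the authors' implementation, and the encoder and decoder networks are omitted.

```python
import numpy as np

def rotation_about_y(angle_rad):
    """Return a 3x3 rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def rotate_latent(G, R):
    """Rotate a latent code G of shape (N, 3), treated as N 3D points, by R."""
    return G @ R.T

def representation_consistency_loss(G_src, G_tgt, R_src_to_tgt):
    """Mean squared distance between the rotated source latent and the target latent.

    Hypothetical loss form: the paper's exact formulation may differ.
    """
    diff = rotate_latent(G_src, R_src_to_tgt) - G_tgt
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Toy latents standing in for encoder outputs (assumed size: 8 points).
rng = np.random.default_rng(0)
G_src = rng.normal(size=(8, 3))
R = rotation_about_y(np.pi / 3)

# A perfectly consistent target-view latent is the rotated source latent.
G_tgt = rotate_latent(G_src, R)

loss_consistent = representation_consistency_loss(G_src, G_tgt, R)
# Ignoring the view change (identity rotation) leaves a clear mismatch.
loss_inconsistent = representation_consistency_loss(G_src, G_tgt, np.eye(3))
```

In this toy setup the loss vanishes (up to floating-point error) only when the latent codes of the two views agree after applying the known rotation, which is the property the consistency constraint is meant to encourage.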

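Experimental Results

Research Questions

RQ1: Can a geometry-aware 3D representation of human pose be learned from multi-view data using only 2D annotations?
RQ2: Does a skeleton-based view synthesis framework, combined with a latent-space consistency constraint, produce a robust 3D pose representation that improves monocular pose estimation?
RQ3: Does the learned geometry representation act as an effective prior that improves state-of-the-art 3D pose estimation methods across datasets and protocols?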

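Key Findings

The skeleton-based view synthesis framework yields a geometry representation G, and injecting G as a prior improves 3D pose estimation performance.
With limited 3D annotations, regressing 3D pose from G with a simple two-layer regressor still gives reasonable results, and G also improves stronger baselines across protocols.
Ablation experiments show that the representation consistency constraint reduces implausible poses and makes G more robust: performance improves when the constraint is included.
Combining virtual-camera data augmentation with the representation consistency constraint yields gains over the baseline.
The learned G generalizes across datasets, and qualitative results on the in-the-wild MPII dataset suggest the method is effective in practice.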
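
To make the "simple two-layer regressor" concrete, the following NumPy sketch shows a forward pass from a flattened latent code G to 3D joint coordinates. The latent dimension (24), hidden width (64), joint count (17), ReLU activation, and random initialization are assumptions chosen for illustration. The review does not specify the regressor's architecture, and the weights here are untrained.

```python
import numpy as np

def init_regressor(latent_dim, hidden_dim, num_joints, seed=0):
    """Create randomly initialized weights for a two-layer fully connected regressor."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.01, size=(latent_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.01, size=(hidden_dim, num_joints * 3)),
        "b2": np.zeros(num_joints * 3),
    }

def regress_pose(params, G):
    """Map latent codes G of shape (batch, latent_dim) to poses (batch, num_joints, 3)."""
    hidden = np.maximum(G @ params["W1"] + params["b1"], 0.0)  # ReLU (assumed)
    out = hidden @ params["W2"] + params["b2"]
    return out.reshape(G.shape[0], -1, 3)

# Hypothetical sizes: 24-dimensional latent, 17 joints, batch of 4.
params = init_regressor(latent_dim=24, hidden_dim=64, num_joints=17)
G_batch = np.random.default_rng(1).normal(size=(4, 24))
pose_3d = regress_pose(params, G_batch)
print(pose_3d.shape)  # (4, 17, 3)
```

Keeping the regressor this shallow reflects the claim that G already encodes 3D geometry, so little extra capacity should be needed to read out joint coordinates.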


This review was generated by AI and reviewed by a human editor.