QUICK REVIEW

[Paper Review] Towards Geometry-Aware and Motion-Guided Video Human Mesh Recovery

Hongjun Chen, Huan Zheng|arXiv (Cornell University)|2026. 01. 29.

Human Pose and Action Recognition | Citations: 0

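One-Line Summary

HMRMamba achieves state-of-the-art video-based 3D human mesh recovery by pairing STA-Mamba with a motion-guided reconstruction network built on Structured State Space Models, improving both temporal consistency and efficiency.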

ABSTRACT

Existing video-based 3D Human Mesh Recovery (HMR) methods often produce physically implausible results, stemming from their reliance on flawed intermediate 3D pose anchors and their inability to effectively model complex spatiotemporal dynamics. To overcome these deep-rooted architectural problems, we introduce HMRMamba, a new paradigm for HMR that pioneers the use of Structured State Space Models (SSMs) for their efficiency and long-range modeling prowess. Our framework is distinguished by two core contributions. First, the Geometry-Aware Lifting Module, featuring a novel dual-scan Mamba architecture, creates a robust foundation for reconstruction. It directly grounds the 2D-to-3D pose lifting process with geometric cues from image features, producing a highly reliable 3D pose sequence that serves as a stable anchor. Second, the Motion-guided Reconstruction Network leverages this anchor to explicitly process kinematic patterns over time. By injecting this crucial temporal awareness, it significantly enhances the final mesh's coherence and robustness, particularly under occlusion and motion blur. Comprehensive evaluations on 3DPW, MPI-INF-3DHP, and Human3.6M benchmarks confirm that HMRMamba sets a new state-of-the-art, outperforming existing methods in both reconstruction accuracy and temporal consistency while offering superior computational efficiency.

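Motivation and Objectives

Address unreliable 3D pose anchors in monocular video to enable robust 3D human mesh recovery.
Exploit temporal evolution and geometric cues to produce stable, anatomically plausible 3D poses.
Improve temporal consistency and robustness under occlusion and motion blur.
Achieve state-of-the-art performance with an efficient, scalable architecture.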

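Proposed Method

The paper proposes HMRMamba, a two-stage pipeline built on Structured State Space Models (SSMs).
The Geometry-Aware Lifting Module, STA-Mamba, uses a Dual-Scan Mamba to fuse image features and 2D poses into a geometrically grounded 3D pose sequence (P3D).
The Motion-guided Reconstruction Network uses the full pose sequence to guide mesh recovery with motion awareness.
The lifting head regresses 3D joints from temporally enriched features.
The pose losses combine MPJPE, temporal consistency, MPJVE, and 2D reprojection terms; mesh recovery uses mesh, joint, normal, and edge losses.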
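The dual-scan idea in the lifting module can be illustrated with a toy example. The sketch below is not the authors' STA-Mamba: it assumes a fixed diagonal linear state space model scanned once forward and once backward in time, with the two outputs summed. The function names, the parameter values, and the sum fusion are all hypothetical, and the input-dependent (selective) parameters of real Mamba layers are omitted.

```python
def ssm_scan(xs, a, b, c):
    """Scan a diagonal linear state space model over a 1-D sequence.

    State update per step: h[i] = a[i] * h[i] + b[i] * x
    Output per step:       y    = sum(c[i] * h[i])
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def dual_scan(xs, a, b, c):
    """Sum a forward scan and a time-reversed scan (fusion rule is an assumption)."""
    forward = ssm_scan(xs, a, b, c)
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(forward, backward)]


# Toy input: an impulse at the middle frame of a 5-frame sequence.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
a, b, c = [0.5, 0.5], [1.0, 1.0], [1.0, 1.0]  # hypothetical parameters

print(ssm_scan(impulse, a, b, c))   # forward only: frames before the impulse stay at 0
print(dual_scan(impulse, a, b, c))  # both directions: every frame sees the impulse
```

Each scan visits every frame once, so the cost grows linearly with sequence length. The backward pass is what lets earlier frames draw on later ones.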

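Experimental Results

Research Questions

RQ1: Can STA-Mamba generate geometrically grounded 3D pose anchors from 2D poses and image features?
RQ2: Does introducing full temporal dynamics through motion-guided reconstruction improve mesh quality and temporal consistency?
RQ3: How does STA-Mamba-based lifting compare with existing 2D-to-3D lifting approaches under occlusion and motion blur?
RQ4: What is the trade-off between accuracy and efficiency when using a Mamba-based architecture for HMR?
RQ5: Does the method generalize across common benchmarks (3DPW, MPI-INF-3DHP, Human3.6M) without additional priors?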

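Key Results

HMRMamba achieves state-of-the-art or competitive results on 3DPW, MPI-INF-3DHP, and Human3.6M across the MPJPE, PA-MPJPE, MPVPE, and Accel metrics.
STA-Mamba provides geometrically grounded 3D pose anchors that improve both per-frame accuracy and temporal consistency.
The Motion-guided Reconstruction Network improves temporal consistency and robustness to occlusion and motion blur.
The model is parameter-efficient (79.63M parameters) and more computationally efficient (7.88 GFLOPs) than previous Transformer-based approaches.
Ablation results show that GA for lifting, explicit/implicit motion representations, and dual-scan Mamba inputs are all critical to the best performance.
HMRMamba variants outperform existing SOTA methods on Human3.6M, 3DPW, and MPI-INF-3DHP, showing strong generalization.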
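To make the reported metrics concrete, here is a minimal sketch of MPJPE and the acceleration error (Accel) as they are commonly defined; it is not taken from the paper's evaluation code. It assumes sequences stored as nested lists shaped [frame][joint][xyz], and the helper names are hypothetical. MPVPE applies the same mean-distance formula to mesh vertices instead of joints, and PA-MPJPE would first rigidly align the prediction to the ground truth, a step omitted here.

```python
import math


def _dist(p, q):
    """Euclidean distance between two points given as coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mpjpe(pred, gt):
    """Mean per-joint position error over sequences shaped [frame][joint][xyz]."""
    dists = [_dist(p, g)
             for pred_frame, gt_frame in zip(pred, gt)
             for p, g in zip(pred_frame, gt_frame)]
    return sum(dists) / len(dists)


def _second_diff(seq):
    """Finite-difference acceleration x[t-1] - 2*x[t] + x[t+1] for every joint."""
    return [
        [[c0 - 2 * c1 + c2 for c0, c1, c2 in zip(j0, j1, j2)]
         for j0, j1, j2 in zip(f0, f1, f2)]
        for f0, f1, f2 in zip(seq, seq[1:], seq[2:])
    ]


def accel_error(pred, gt):
    """Mean distance between predicted and ground-truth accelerations (needs >= 3 frames)."""
    return mpjpe(_second_diff(pred), _second_diff(gt))


# One joint, four frames; the ground truth stays at the origin.
gt = [[[0.0, 0.0, 0.0]] for _ in range(4)]
offset = [[[3.0, 4.0, 0.0]] for _ in range(4)]  # constant bias, perfectly smooth
jitter = [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]],
          [[3.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]  # accurate except one jump

print(mpjpe(offset, gt), accel_error(offset, gt))  # 5.0 0.0
print(mpjpe(jitter, gt), accel_error(jitter, gt))  # 0.75 4.5
```

The two toy predictions show why both kinds of metric are reported: a constant offset hurts MPJPE but leaves Accel at zero, while a single-frame jump barely moves MPJPE yet produces a large Accel error.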

This review was generated by AI and reviewed by a human editor.