QUICK REVIEW

[Paper Review] Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamic

Yongyi Tang, Lin Ma | arXiv (Cornell University) | May 7, 2018

Human Pose and Action Recognition | 4 references | 41 citations

TL;DR

The paper introduces a motion-context-aware prediction framework with a modified highway unit (MHU) and a gram-matrix loss to improve long-term 3D human motion prediction and to enable motion transfer conditioned on activity labels.

ABSTRACT

Human motion prediction aims at generating future frames of human motion based on an observed sequence of skeletons. Recent methods employ the latest hidden states of a recurrent neural network (RNN) to encode the historical skeletons, which can only address short-term prediction. In this work, we propose a motion context modeling by summarizing the historical human motion with respect to the current prediction. A modified highway unit (MHU) is proposed for efficiently eliminating motionless joints and estimating next pose given the motion context. Furthermore, we enhance the motion dynamic by minimizing the gram matrix loss for long-term motion prediction. Experimental results show that the proposed model can promisingly forecast the human future movements, which yields superior performances over related state-of-the-art approaches. Moreover, specifying the motion context with the activity labels enables our model to perform human motion transfer.

Motivation & Objective

Extend accurate human motion prediction from the short term to the long term, which methods that rely only on the latest RNN hidden state do not handle well.
Model motion context to summarize historical motion with respect to the current prediction.
Develop a Modified Highway Unit to selectively update motion-bearing joints.
Introduce a gram-matrix loss to encourage temporal-spatial motion dynamics.
Demonstrate motion transfer by conditioning on activity labels.

Proposed method

Embed historical skeletons into a semantic space via a skeleton embedding layer.
Compute a motion context by temporally attending over past embeddings with respect to the last predicted frame.
Predict future skeletons using a Modified Highway Unit that gates updates to motion-bearing joints based on motion context and current input.
Optimize predictions with a gram-matrix loss to capture temporal dynamics and inter-joint correlations.
Optionally perform motion transfer by supplying activity labels to modulate the motion context.
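
The context and gating steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the scaled dot-product scoring, the function names `motion_context` and `gated_pose_update`, and the randomly initialized weights `W_gate` and `W_cand` are all hypothetical. It shows a context vector formed as an attention-weighted sum of past embeddings, followed by a highway-style gate that leaves a pose dimension unchanged when its gate is close to zero.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_context(history, query):
    """Attend over past embeddings with respect to a query embedding.

    history: (T, D) embeddings of observed skeletons.
    query:   (D,) embedding of the last predicted frame.
    Scaled dot-product scoring is an assumption made for this sketch.
    """
    scores = history @ query / np.sqrt(history.shape[1])  # (T,)
    weights = softmax(scores)                             # sums to 1
    return weights @ history, weights                     # (D,), (T,)

def gated_pose_update(pose, context, W_gate, b_gate, W_cand):
    """Highway-style update: gate * candidate + (1 - gate) * pose.

    A gate near 0 copies the current value, so a motionless joint
    passes through unchanged; a gate near 1 takes the new candidate.
    """
    z = np.concatenate([pose, context])    # (J + D,)
    gate = sigmoid(W_gate @ z + b_gate)    # (J,), each value in (0, 1)
    candidate = W_cand @ z                 # (J,)
    return gate * candidate + (1.0 - gate) * pose, gate

# Toy usage with random data (all sizes are illustrative).
rng = np.random.default_rng(0)
T, D, J = 8, 16, 6
history = rng.normal(size=(T, D))
query = rng.normal(size=D)
context, weights = motion_context(history, query)

pose = rng.normal(size=J)
W_gate = rng.normal(size=(J, J + D)) * 0.1
W_cand = rng.normal(size=(J, J + D)) * 0.1
next_pose, gate = gated_pose_update(pose, context, W_gate, np.zeros(J), W_cand)
```

In a trained model the gate parameters would be learned; here a strongly negative `b_gate` can be used to confirm that the update reduces to copying the current pose.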

Experimental results

Research questions

RQ1: Can motion context modeled from the entire historical sequence improve long-term prediction compared to using only the last hidden state?
RQ2: Does the Modified Highway Unit effectively filter motionless joints to focus on informative joints for prediction?
RQ3: Does a gram-matrix based objective better capture temporal dynamics and spatial correlations than standard MSE loss?
RQ4: Can the model support motion transfer by conditioning on activity labels during prediction?

Key findings

The proposed approach outperforms state-of-the-art methods in long-term prediction on the Human3.6M (H3.6m) dataset.
Motion context modeling provides more robust long-term predictions than using only the last hidden state.
The gram-matrix loss enhances temporal dynamics and reduces convergence to the mean pose, improving long-term results.
The Modified Highway Unit effectively gates updates, emphasizing dynamically moving joints.
The model supports motion transfer by conditioning on activity labels, enabling smooth transitions between activities.
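
To make the gram-matrix finding concrete, the following NumPy sketch compares a moving target sequence with a prediction that has collapsed to the mean pose. It is an illustration under assumptions, not the paper's exact loss: the choice to concatenate two consecutive frames and take their outer product, and the names `pairwise_gram` and `gram_loss`, are hypothetical.

```python
import numpy as np

def pairwise_gram(seq):
    """Gram matrices over consecutive frame pairs.

    seq: (T, D) sequence of pose vectors.
    Each pair (x_t, x_{t+1}) is concatenated into a 2D-vector and
    multiplied with its own transpose, so the result holds both
    within-frame and cross-frame joint correlations.
    Returns an array of shape (T - 1, 2D, 2D).
    """
    stacked = np.concatenate([seq[:-1], seq[1:]], axis=1)  # (T-1, 2D)
    return np.einsum("ti,tj->tij", stacked, stacked)

def gram_loss(pred, target):
    # Mean squared Frobenius distance between predicted and target grams.
    diff = pairwise_gram(pred) - pairwise_gram(target)
    return float(np.mean(np.sum(diff ** 2, axis=(1, 2))))

# Toy target with periodic motion, and a static "mean pose" prediction.
t = np.linspace(0.0, 2.0 * np.pi, 20)
target = np.stack([np.sin(t), np.cos(t), 0.5 * np.sin(2 * t)], axis=1)  # (20, 3)
static = np.repeat(target.mean(axis=0, keepdims=True), len(t), axis=0)

loss_same = gram_loss(target, target)    # identical sequences
loss_static = gram_loss(static, target)  # collapsed prediction is penalized
```

Because an outer product is unchanged when every pose is negated, a term like this would complement a pose-level error in practice, not replace it.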

This review was created by AI and reviewed by human editors.