QUICK REVIEW

[Paper Review] Neural probabilistic motor primitives for humanoid control

Josh Merel, Leonard Hasenclever | arXiv (Cornell University) | Nov 28, 2018

Motor Control and Adaptation | 89 citations

TL;DR

The paper introduces neural probabilistic motor primitives, an offline-trained motor module that compresses thousands of expert humanoid skills into a latent space, enabling one-shot imitation and reuse by higher-level controllers. It compares behavioral cloning and linear-feedback policy cloning (LFPC) for offline transfer.

ABSTRACT

We focus on the problem of learning a single motor module that can flexibly express a range of behaviors for the control of high-dimensional physically simulated humanoids. To do this, we propose a motor architecture that has the general structure of an inverse model with a latent-variable bottleneck. We show that it is possible to train this model entirely offline to compress thousands of expert policies and learn a motor primitive embedding space. The trained neural probabilistic motor primitive system can perform one-shot imitation of whole-body humanoid behaviors, robustly mimicking unseen trajectories. Additionally, we demonstrate that it is also straightforward to train controllers to reuse the learned motor primitive space to solve tasks, and the resulting movements are relatively naturalistic. To support the training of our model, we compare two approaches for offline policy cloning, including an experience efficient method which we call linear feedback policy cloning. We encourage readers to view a supplementary video ( https://youtu.be/CaDEf-QcKwA ) summarizing our results.

Motivation & Objective

Develop a motor primitive module that can represent and generate a large set of humanoid motor skills.
Enable one-shot imitation and flexible reuse of skills within a compact embedding space.
Avoid extensive online RL by leveraging offline policy transfer from expert demonstrations.
Compare two offline transfer methods: behavioral cloning and linear-feedback policy cloning (LFPC).
Demonstrate robustness, naturalness, and transferability of learned primitives across tasks and unseen trajectories.

Proposed method

Propose an autoregressive latent-variable model with a latent z_t at each time step that conditions the action distribution p(a_t|s_t,z_t).
Train the encoder q(z_t|z_{t-1},x_t) on short look-ahead trajectory snippets x_t, jointly with the decoder π(a_t|s_t,z_t).
Use an AR(1) prior on z_t to encourage temporal coherence and compress information via a beta-weighted ELBO objective.
Train offline from expert trajectories (2707 clips) via supervised learning, enabling one-shot imitation without online RL.
Introduce two offline transfer schemes: (a) behavioral cloning from noisy expert rollouts, and (b) linear-feedback policy cloning (LFPC), which uses action-state Jacobians to keep the cloned policy robust in states near the expert trajectory.
Adapt the objective for LFPC by incorporating perturbations and Jacobian-based corrections in the likelihood and KL terms.
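
The pieces listed above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation. It assumes diagonal Gaussian distributions throughout, an AR(1) prior scaled to have unit stationary variance, and a first-order (Taylor-style) linear-feedback target for LFPC. The function names and the values of ALPHA and BETA are hypothetical and chosen for illustration.

```python
import math

ALPHA = 0.95  # assumed AR(1) coefficient (illustrative value)
BETA = 0.1    # assumed weight on the KL term (illustrative value)


def ar1_prior(z_prev, alpha=ALPHA):
    """Mean and std of an AR(1) prior p(z_t | z_{t-1}).

    Assumption: the noise scale sqrt(1 - alpha^2) is chosen so that
    the stationary variance of each latent dimension is 1.
    """
    mean = [alpha * z for z in z_prev]
    std = math.sqrt(1.0 - alpha ** 2)
    return mean, [std] * len(z_prev)


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        total += (math.log(sp / sq)
                  + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
                  - 0.5)
    return total


def gaussian_log_prob(x, mu, std):
    """Log density of a diagonal Gaussian, summed over dimensions."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s ** 2) - (xi - m) ** 2 / (2.0 * s ** 2)
        for xi, m, s in zip(x, mu, std)
    )


def step_loss(action, act_mu, act_std, q_mu, q_std, z_prev, beta=BETA):
    """Negative beta-weighted ELBO for a single time step.

    act_mu/act_std stand in for the decoder output pi(a_t | s_t, z_t);
    q_mu/q_std stand in for the encoder output q(z_t | z_{t-1}, x_t).
    """
    prior_mu, prior_std = ar1_prior(z_prev)
    nll = -gaussian_log_prob(action, act_mu, act_std)
    kl = gaussian_kl(q_mu, q_std, prior_mu, prior_std)
    return nll + beta * kl


def lfpc_target(a_ref, s_ref, jacobian, s_perturbed):
    """Linear-feedback action target: a_ref + J (s_perturbed - s_ref).

    jacobian is a list of rows, one per action dimension, holding the
    derivative of the expert action with respect to the state.
    """
    delta = [s - r for s, r in zip(s_perturbed, s_ref)]
    return [a + sum(k * d for k, d in zip(row, delta))
            for a, row in zip(a_ref, jacobian)]
```

In this sketch, a larger beta pulls the encoder toward the AR(1) prior, trading reconstruction accuracy for a more compressed and temporally smooth latent sequence. The LFPC target supplies supervised action labels for perturbed states around a single expert trajectory, without collecting additional rollouts.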

Experimental results

Research questions

RQ1: Can a single neural probabilistic motor primitive module compress thousands of expert humanoid skills into a usable embedding space?
RQ2: Is it possible to achieve one-shot imitation and robust reproduction of unseen trajectories using offline-trained primitives?
RQ3: How do behavioral cloning and LFPC compare for offline transfer in terms of data efficiency and performance?
RQ4: Can learned primitives be reused by higher-level controllers to solve new tasks with naturalistic movement?
RQ5: How does latent space structure affect robustness to perturbations and generalization to unseen behaviors?

Key findings

The motor primitive module can compress thousands of expert policies into a learned embedding space.
One-shot imitation using LFPC with a single trajectory can match behavioral cloning that uses hundreds of trajectories, under certain regularization settings.
Regularization and a larger latent space improve imitation performance and robustness.
The learned primitive space enables reuse by high-level policies to solve sparse-reward tasks with human-like motion.
Optimizing the latent sequence directly can improve one-shot imitation for borderline trajectories, indicating that the latent representation is meaningful.

This review was created by AI and reviewed by human editors.