QUICK REVIEW

[Paper Review] Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning

Anusha Nagabandi, Ignasi Clavera | arXiv (Cornell University) | Mar 30, 2018
Robotic Locomotion and Control · 300 citations
TL;DR

The paper presents GrBAL and ReBAL, two meta-learning-based methods for online adaptation in model-based RL that enable fast, sample-efficient adaptation to dynamic, real-world environments, including on a real legged millirobot.

ABSTRACT

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations or unseen situations cause proficient but specialized policies to fail at test time. Given that it is impractical to train separate policies to accommodate all situations the agent may see in the real world, this work proposes to learn how to quickly and effectively adapt online to new tasks. To enable sample-efficient learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach uses meta-learning to train a dynamics model prior such that, when combined with recent data, this prior can be rapidly adapted to the local context. Our experiments demonstrate online adaptation for continuous control tasks on both simulated and real-world agents. We first show simulated agents adapting their behavior online to novel terrains, crippled body parts, and highly-dynamic environments. We also illustrate the importance of incorporating online adaptation into autonomous agents that operate in the real world by applying our method to a real dynamic legged millirobot. We demonstrate the agent's learned ability to quickly adapt online to a missing leg, adjust to novel terrains and slopes, account for miscalibration or errors in pose estimation, and compensate for pulling payloads.

Motivation & Objective

  • Motivate the need for rapid online adaptation in real-world RL where dynamics change due to perturbations or new terrains.
  • Develop a sample-efficient meta-learning framework that adapts a dynamics model online using recent experience.
  • Propose two instantiations, GrBAL (gradient-based) and ReBAL (recurrence-based), for online adaptation of neural dynamics models.
  • Evaluate on simulated continuous control tasks with dynamic perturbations and on a real legged millirobot to demonstrate practical applicability.

Proposed method

  • Model-based RL with a neural dynamics model that is rapidly adaptable using meta-learning.
  • Meta-training optimizes both the prior dynamics-model parameters and an update rule, so that a small amount of recent experience is enough for fast adaptation.
  • Two update mechanisms: GrBAL uses gradient-based updates akin to MAML; ReBAL uses a recurrent network to learn its own update rule.
  • Adaptation uses the past M time steps to update the model parameters; during meta-training, the adapted model is scored by the negative log-likelihood of the next K steps.
  • Planning with MPPI (model predictive path integral control) using the adapted model, with re-planning at each timestep.
  • During meta-training, data are also collected with online adaptation in the loop, so the training data are on-policy for the adaptive controller, matching how it is used at test time.
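To make the adapt-then-plan loop described above concrete, here is a minimal pure-Python sketch under loudly stated assumptions: a hypothetical 1-D system whose actuator gain differs from the model prior, a hypothetical two-parameter linear model standing in for the neural dynamics network, a hand-picked step size `psi` standing in for GrBAL's meta-learned update, and a simplified MPPI that omits the control-cost term of the full algorithm. The names `predict`, `adapt`, `mppi_action`, and `run` are illustrative, not taken from the paper's code. Meta-training itself and the ReBAL (recurrent) variant are not shown.

```python
import math
import random


def predict(theta, s, a):
    # Hypothetical linear stand-in for the neural dynamics model:
    # s_next = s + theta[0] * a + theta[1]
    return s + theta[0] * a + theta[1]


def adapt(theta, past, psi):
    # GrBAL-style update: one gradient step, with step size psi, on the mean
    # squared one-step error (a Gaussian NLL up to constants) over the
    # recent transitions (s, a, s_next).
    g0 = g1 = 0.0
    for s, a, s_next in past:
        err = predict(theta, s, a) - s_next
        g0 += 2.0 * err * a
        g1 += 2.0 * err
    m = len(past)
    return [theta[0] - psi * g0 / m, theta[1] - psi * g1 / m]


def mppi_action(theta, s0, goal, nominal, rng, n_samples=64, sigma=0.5, lam=1.0):
    # Simplified MPPI: perturb the nominal action sequence, roll each sample
    # through the model, and average the perturbations with weights
    # exp(-(cost - min_cost) / lam).
    horizon = len(nominal)
    costs, noises = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        s, cost = s0, 0.0
        for t in range(horizon):
            a = nominal[t] + eps[t]
            s = predict(theta, s, a)
            cost += (s - goal) ** 2 + 0.01 * a ** 2
        costs.append(cost)
        noises.append(eps)
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    z = sum(weights)
    plan = [nominal[t] + sum(w * e[t] for w, e in zip(weights, noises)) / z
            for t in range(horizon)]
    # Execute only the first action; shift the rest to warm-start the next step.
    return plan[0], plan[1:] + [0.0]


def run(true_gain=0.5, steps=30, M=8, psi=0.05, seed=0):
    rng = random.Random(seed)
    prior = [1.0, 0.0]  # hypothetical "meta-trained" prior; assumes gain 1.0
    s, goal = 0.0, 2.0
    history, nominal = [], [0.0] * 10
    for _ in range(steps):
        # Re-adapt from the prior at every step using only the last M transitions.
        theta = adapt(prior, history[-M:], psi) if history else prior
        a, nominal = mppi_action(theta, s, goal, nominal, rng)
        # Hypothetical "real" system: the actuator is weaker than the prior assumes.
        s_next = s + true_gain * a + rng.gauss(0.0, 0.01)
        history.append((s, a, s_next))
        s = s_next
    return s
```

Calling `run()` should drive the toy state toward the goal of 2.0 even though the prior assumes the wrong gain, because the model is re-adapted from the prior and the plan is recomputed at every time step. Adapting from the fixed prior each step, rather than accumulating updates, mirrors the idea that only the most recent M transitions define the local context.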

Experimental results

Research questions

  • RQ1: Does online adaptation actually change the dynamics model in a way that improves prediction of near-future dynamics?
  • RQ2: Do GrBAL and ReBAL enable fast online adaptation to drastic dynamics changes and unseen environments?
  • RQ3: How does model-based meta-RL compare to model-free meta-RL and baseline model-based methods in sample efficiency and performance?
  • RQ4: Which of GrBAL and ReBAL adapts faster and generalizes better across varied tasks?
  • RQ5: Is online adaptation feasible and beneficial on a real robot?

Key findings

  • Adaptation reduces model prediction error from pre-update to post-update, demonstrating effective online adaptation.
  • Meta-training GrBAL/ReBAL on 1.5-3 hours of real-world data yields performance equal to or better than that of model-free agents trained with ≈1000× more data.
  • GrBAL outperforms the MB+DE and MB oracle baselines on several tasks that require fast adaptation.
  • In real-robot experiments, GrBAL demonstrates online adaptation to terrain changes, miscalibration, and payloads on a legged millirobot.
  • GrBAL generally adapts faster and generalizes better than ReBAL across the tested environments.
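The pre-update versus post-update comparison in the first finding can be illustrated with a toy evaluation loop. This is a sketch, not the paper's evaluation protocol: it assumes a hypothetical scalar model s_next = s + gain * a, a hypothetical rollout whose true gain drops partway through (loosely mimicking a crippled actuator), a prior gain of 1.0 standing in for meta-trained parameters, and a hand-picked step size `psi`. For each window it adapts on the past M transitions and scores the next K transitions with both the prior and the adapted model.

```python
import random


def one_step_error(gain, transitions):
    # Mean squared one-step error of the hypothetical model s_next = s + gain * a.
    return sum((s + gain * a - s_next) ** 2 for s, a, s_next in transitions) / len(transitions)


def adapt_gain(gain, past, psi):
    # GrBAL-style update: one gradient step on the error over the past M transitions.
    grad = sum(2.0 * (s + gain * a - s_next) * a for s, a, s_next in past) / len(past)
    return gain - psi * grad


def pre_post_errors(transitions, prior_gain, M=5, K=5, psi=0.5):
    # For each window, adapt on the past M transitions, then score the next K
    # transitions with the prior (pre-update) and adapted (post-update) model.
    pre, post = [], []
    for t in range(M, len(transitions) - K + 1):
        past, future = transitions[t - M:t], transitions[t:t + K]
        adapted = adapt_gain(prior_gain, past, psi)
        pre.append(one_step_error(prior_gain, future))
        post.append(one_step_error(adapted, future))
    return sum(pre) / len(pre), sum(post) / len(post)


# Hypothetical rollout: the true actuator gain drops from 1.0 to 0.4 at t = 20.
rng = random.Random(0)
s, transitions = 0.0, []
for t in range(40):
    true_gain = 1.0 if t < 20 else 0.4
    a = rng.uniform(-1.0, 1.0)
    s_next = s + true_gain * a + rng.gauss(0.0, 0.01)
    transitions.append((s, a, s_next))
    s = s_next

pre_err, post_err = pre_post_errors(transitions, prior_gain=1.0)
print(f"pre-update error: {pre_err:.4f}  post-update error: {post_err:.4f}")
```

With actions in [-1, 1] and `psi=0.5`, a single step moves the estimated gain part of the way toward the new value without overshooting it (ignoring noise), so windows after the change score lower post-update error while windows before it are essentially unchanged.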


This review was created by AI and reviewed by human editors.