QUICK REVIEW

[Paper Review] Wasserstein Robust Reinforcement Learning

Mohammed Amin Abdullah, Hang Ren | arXiv (Cornell University) | Jul 30, 2019
Reinforcement Learning in Robotics | 46 references | 37 citations
TL;DR

WR2L formulates robust reinforcement learning as a min–max game with an epsilon-Wasserstein constraint around a reference dynamics, and provides a scalable zero-order solver for high-dimensional, continuous tasks.

ABSTRACT

Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world. This paper proposes $\text{WR}^{2}\text{L}$, a robust reinforcement learning algorithm with significant robust performance on low and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver following a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We empirically demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.

Motivation & Objective

  • Motivate robustness in RL to improve generalisation when transition dynamics vary.
  • Introduce WR2L as a generic min–max framework with Wasserstein constraints.
  • Enable robustness in continuous state-action spaces without hand-crafted disturbance models.
  • Provide a scalable solver that alternates between updating dynamics and policy.

Proposed method

  • Define a robust RL objective as $\max_{\theta} \min_{\phi} \mathbb{E}_{\tau \sim p_{\theta}^{\phi}}[R_{\text{total}}(\tau)]$.
  • Constrain allowed transition perturbations to an epsilon-Wasserstein ball around a reference dynamics $P_0$.
  • Parametrize the policy as $\pi_{\theta}$ and the perturbed dynamics by $\phi$; solve via alternating optimization.
  • Use average (rather than pointwise) Wasserstein constraints to make the constraint tractable.
  • Develop a second-order Taylor-based Hessian approximation to efficiently update phi within the constraint.
  • Present a zero-order (gradient-free) method for updating dynamics when gradients are unavailable.
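
Written out, the formulation in the bullets above might look like the following. This is a sketch under stated assumptions, not the paper's exact notation: $\rho$ (a sampling distribution over state-action pairs), $\phi_0$ (the parameters of the reference dynamics $P_0$) and $H_0$ (the Hessian of the averaged constraint at $\phi_0$) are symbols introduced here for illustration, and the squared 2-Wasserstein distance $W_2^2$ is assumed so that the constraint is smooth at $\phi_0$.

```latex
% Robust objective with an average (rather than pointwise) Wasserstein constraint
\max_{\theta} \; \min_{\phi} \;
  \mathbb{E}_{\tau \sim p_{\theta}^{\phi}}\left[ R_{\text{total}}(\tau) \right]
\quad \text{s.t.} \quad
  \mathbb{E}_{(s,a) \sim \rho}\left[
    W_2^2\big( P_{\phi}(\cdot \mid s,a),\, P_{0}(\cdot \mid s,a) \big)
  \right] \le \epsilon

% Second-order Taylor expansion of the constraint around \phi_0.
% The constraint is zero and minimal at \phi_0, so the constant and
% first-order terms vanish and only the Hessian term remains:
\mathbb{E}_{(s,a) \sim \rho}\left[
  W_2^2\big( P_{\phi}(\cdot \mid s,a),\, P_{0}(\cdot \mid s,a) \big)
\right]
\approx \tfrac{1}{2} (\phi - \phi_0)^{\top} H_0 \, (\phi - \phi_0) \le \epsilon
```

Under this approximation the feasible set for $\phi$ is an ellipsoid centred at $\phi_0$, which is what makes the inner minimisation tractable.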

Experimental results

Research questions

  • RQ1: How can we formulate robust RL to handle model perturbations in continuous state-action spaces?
  • RQ2: Can Wasserstein distance provide a principled, geometry-aware robustness constraint for RL transitions?
  • RQ3: Is it possible to efficiently solve the resulting min–max problem without explicit dynamics models?
  • RQ4: Does the proposed WR2L framework improve robustness and performance on high-dimensional control tasks?

Key findings

  • WR2L achieves significant robust performance improvements on high-dimensional MuJoCo environments compared to standard and some robust baselines.
  • The algorithm accommodates both discrete and continuous state-action spaces within a unified Wasserstein-based framework.
  • A novel zero-order optimization method enables scalable updates to transition dynamics without requiring gradient information.
  • The Hessian-based constraint approximation allows tractable optimization under an epsilon-Wasserstein ball around reference dynamics.
  • The approach does not require learning a full dynamics model, leveraging a differentiable simulator or solver with parameterisable dynamics.
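
To make the zero-order update and the Hessian-based constraint in the findings above more concrete, here is a minimal Python sketch of one adversarial dynamics step. It is an illustration under assumptions, not the authors' implementation: `toy_return` is a hypothetical stand-in for the return of a fixed policy under dynamics parameters `phi`, the values of `phi0`, `H` and `eps` are made up, and the gradient estimator is a generic antithetic Gaussian-smoothing estimator rather than the paper's specific zero-order method. The step itself uses the standard closed-form minimiser of a linearised objective under a quadratic constraint.

```python
import numpy as np


def zero_order_grad(f, x, sigma=0.05, n_samples=2000, seed=0):
    """Estimate grad f(x) from function values only (antithetic Gaussian smoothing)."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # Central difference along the random direction u, weighted by u.
        grad += (f(x + sigma * u) - f(x - sigma * u)) / (2.0 * sigma) * u
    return grad / n_samples


def constrained_step(g, H, phi0, eps):
    """Minimise g^T (phi - phi0) s.t. 0.5 (phi - phi0)^T H (phi - phi0) <= eps.

    H must be symmetric positive definite and g must be non-zero.
    """
    h_inv_g = np.linalg.solve(H, g)
    scale = np.sqrt(2.0 * eps / (g @ h_inv_g))
    return phi0 - scale * h_inv_g


def toy_return(phi):
    """Hypothetical stand-in for the return of a fixed policy under dynamics phi."""
    return 10.0 - (phi[0] - 2.0) ** 2 - 0.5 * (phi[1] - 1.5) ** 2


phi0 = np.array([1.0, 1.0])  # reference dynamics parameters (assumed values)
H = np.diag([4.0, 1.0])      # assumed positive-definite constraint Hessian
eps = 0.02                   # radius of the approximate Wasserstein ball (assumed)

g = zero_order_grad(toy_return, phi0)        # no analytic gradients are used
phi_adv = constrained_step(g, H, phi0, eps)  # worst-case dynamics on the boundary
d = phi_adv - phi0
constraint_value = 0.5 * d @ H @ d           # equals eps up to rounding
```

In an alternating scheme, a step like this would be interleaved with a policy update that maximises the return under `phi_adv`; that policy update is omitted here.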

This review was created by AI and reviewed by human editors.