[Paper Review] Wasserstein Robust Reinforcement Learning
WR2L formulates robust reinforcement learning as a min–max game with an epsilon-Wasserstein constraint around a reference dynamics, and provides a scalable zero-order solver for high-dimensional, continuous tasks.
Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world. This paper proposes $\text{WR}^{2}\text{L}$, a robust reinforcement learning algorithm with significant robust performance on low- and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver following a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We empirically demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.
Motivation & Objective
- Motivate robustness in RL to improve generalisation when transition dynamics vary.
- Introduce WR2L as a generic min–max framework with Wasserstein constraints.
- Enable robustness in continuous state-action spaces without hand-crafted disturbance models.
- Provide a scalable solver that alternates between updating dynamics and policy.
Proposed method
- Define a robust RL objective as $\max_{\theta} \min_{\phi} \mathbb{E}_{\tau \sim p_{\theta}^{\phi}}[R_{\text{total}}(\tau)]$.
- Constrain allowed transition perturbations to an $\epsilon$-Wasserstein ball around a reference dynamics $P_0$.
- Parametrize the policy $\pi_{\theta}$ and the perturbed dynamics by $\phi$; solve via alternating optimization.
- Use average (rather than pointwise) Wasserstein constraints to make the constraint tractable.
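- Approximate the Wasserstein constraint with a second-order Taylor expansion around the reference dynamics, so that the feasible set becomes a Hessian-defined quadratic region in which $\phi$ can be updated efficiently.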
- Present a zero-order (gradient-free) method for updating dynamics when gradients are unavailable.
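The steps listed above can be written out as a worked formulation. This is a sketch under stated assumptions rather than the paper's exact notation: $\rho$ (the distribution over state-action pairs used for the average constraint), $\phi_0$ (parameters with $P_{\phi_0} = P_0$), $C$, $g$ and $H_0$ are symbols introduced here for illustration, and $\mathcal{W}$ stands for whichever Wasserstein cost is chosen.

```latex
% Robust objective with an average Wasserstein constraint
\begin{align}
\max_{\theta}\ \min_{\phi}\quad
  & \mathbb{E}_{\tau \sim p_{\theta}^{\phi}}\big[ R_{\text{total}}(\tau) \big] \\
\text{s.t.}\quad
  & C(\phi) := \mathbb{E}_{(s,a) \sim \rho}\Big[
      \mathcal{W}\big( P_{\phi}(\cdot \mid s,a),\, P_{0}(\cdot \mid s,a) \big)
    \Big] \le \epsilon
\end{align}

% C is non-negative and C(\phi_0) = 0, so \phi_0 is a minimiser and, if C is
% smooth there, \nabla C(\phi_0) = 0. The second-order Taylor expansion is then
\begin{equation}
C(\phi) \approx \tfrac{1}{2} (\phi - \phi_0)^{\top} H_0 \, (\phi - \phi_0),
\qquad H_0 := \nabla^{2} C(\phi_0).
\end{equation}

% With the inner objective linearised around \phi_0 (gradient g) and H_0
% positive definite, the minimiser on the boundary of the quadratic region is
\begin{equation}
\phi^{\star} = \phi_0 - \sqrt{\frac{2\epsilon}{g^{\top} H_0^{-1} g}}\; H_0^{-1} g .
\end{equation}
```

The last line follows from the Lagrangian of a linear objective under a quadratic constraint: the constraint is active at the optimum, which fixes the step length.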
Experimental results
Research questions
- RQ1: How can we formulate robust RL to handle model perturbations in continuous state-action spaces?
- RQ2: Can Wasserstein distance provide a principled, geometry-aware robustness constraint for RL transitions?
- RQ3: Is it possible to efficiently solve the resulting min–max problem without explicit dynamics models?
- RQ4: Does the proposed WR2L framework improve robustness and performance on high-dimensional control tasks?
Key findings
- WR2L achieves significant robust performance improvements on high-dimensional MuJoCo environments compared to standard and some robust baselines.
- The algorithm accommodates both discrete and continuous state-action spaces within a unified Wasserstein-based framework.
- A novel zero-order optimization method enables scalable updates to transition dynamics without requiring gradient information.
- The Hessian-based constraint approximation allows tractable optimization under an epsilon-Wasserstein ball around reference dynamics.
- The approach does not require learning a full dynamics model, leveraging a differentiable simulator or solver with parameterisable dynamics.
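To make the zero-order and Hessian-constrained findings above more concrete, here is a minimal Python sketch of one adversarial dynamics update. It is an illustration under assumptions, not the paper's implementation: `zero_order_grad` uses a generic antithetic Gaussian-smoothing estimator (which may differ from the estimator in the paper), `constrained_step` applies the closed-form minimiser of a linearised objective inside a quadratic region, and `toy_return`, the identity Hessian `H` and the value `eps=0.05` are hypothetical stand-ins for a simulator rollout return and the real constraint.

```python
import numpy as np


def zero_order_grad(f, x, sigma=0.1, n_samples=20000, seed=0):
    """Estimate the gradient of f at x from function evaluations only.

    Each random direction u contributes
    (f(x + sigma*u) - f(x - sigma*u)) / (2*sigma) * u,
    and the contributions are averaged over n_samples directions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        diff = f(x + sigma * u) - f(x - sigma * u)
        grad += diff / (2.0 * sigma) * u
    return grad / n_samples


def constrained_step(phi0, grad, hessian, eps):
    """Minimise grad^T (phi - phi0) subject to
    0.5 * (phi - phi0)^T H (phi - phi0) <= eps, assuming H is positive definite."""
    direction = np.linalg.solve(hessian, grad)        # H^{-1} g
    scale = np.sqrt(2.0 * eps / (grad @ direction))   # places the step on the boundary
    return phi0 - scale * direction


def toy_return(phi):
    """Hypothetical stand-in for the expected return as a function of dynamics parameters."""
    target = np.array([1.0, -2.0, 0.5])
    return float(np.sum((phi - target) ** 2))


phi0 = np.zeros(3)   # reference dynamics parameters (assumed)
H = np.eye(3)        # assumed Hessian of the constraint at phi0
g = zero_order_grad(toy_return, phi0)
phi_adv = constrained_step(phi0, g, H, eps=0.05)

print("estimated gradient:", g)
print("adversarial parameters:", phi_adv)
```

In a full alternating scheme, a step like this for the dynamics would be interleaved with a policy-improvement step on the perturbed environment; only the dynamics side is sketched here.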
This review was created by AI and reviewed by human editors.