QUICK REVIEW

[Paper Review] Differentiable MPC for End-to-end Planning and Control

Brandon Amos, Ivan Dario Jimenez Rodriguez | arXiv (Cornell University) | Oct 31, 2018

Reinforcement Learning in Robotics | 54 references | 145 citations

TL;DR

The paper introduces differentiable MPC. The controller is solved with a box-constrained iLQR procedure, and gradients are obtained by differentiating through its fixed point. This enables end-to-end learning of the cost and dynamics for imitation in continuous control domains. The paper demonstrates data-efficient imitation and an advantage over standard system identification when the expert is unrealizable.

ABSTRACT

We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning in continuous state and action spaces. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
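The KKT step mentioned in the abstract can be illustrated on a generic equality-constrained QP. This is the form the LQR approximation takes once states and controls are stacked into one trajectory vector τ and the linearized dynamics become a constraint Fτ = f. The following is a minimal sketch under stated assumptions, not the paper's implementation. It solves the KKT system densely with NumPy, whereas the paper exploits the LQR time structure with Riccati-style recursions. The names solve_kkt and kkt_backward are hypothetical and are not part of the mpc.pytorch API. The backward pass needs only one extra linear solve with the same KKT matrix K.

```python
import numpy as np


def solve_kkt(C, c, F, f):
    """Solve min_tau 0.5 tau^T C tau + c^T tau  s.t.  F tau = f via its KKT system.

    Hypothetical helper: a dense solve used only for illustration.
    """
    n, m = C.shape[0], F.shape[0]
    # KKT matrix [[C, F^T], [F, 0]] acting on z = (tau, lambda), right-hand side (-c, f).
    K = np.block([[C, F.T], [F, np.zeros((m, m))]])
    z = np.linalg.solve(K, np.concatenate([-c, f]))
    return z[:n], z[n:], K


def kkt_backward(K, tau, grad_tau):
    """Gradients of a loss l(tau*) w.r.t. (C, c, f) by implicit differentiation.

    Hypothetical helper. K is symmetric, so one extra solve with K gives
    K^{-T} [dl/dtau; 0]. A real implementation would factor K once and
    reuse that factorization here instead of calling solve again.
    """
    n = tau.shape[0]
    rhs = np.concatenate([grad_tau, np.zeros(K.shape[0] - n)])
    d = np.linalg.solve(K, rhs)
    d_tau, d_lam = d[:n], d[n:]
    grad_c = -d_tau                  # the right-hand side holds -c, hence the sign
    grad_f = d_lam                   # the right-hand side holds +f
    grad_C = -np.outer(d_tau, tau)   # dl/dK = -d z^T, restricted to the C block
    return grad_C, grad_c, grad_f
```

grad_C is the entrywise gradient. If C is parameterized as a symmetric matrix, symmetrize it as 0.5 * (grad_C + grad_C.T).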

Motivation & Objective

Motivate combining model-based MPC with end-to-end learning for continuous control.
Propose an analytical method to differentiate through a box-constrained MPC solved via an iLQR-like procedure.
Show that learning the MPC cost and dynamics from expert demonstrations can be more data-efficient than training a generic neural network policy.
Demonstrate imitation learning results in pendulum and cart-pole domains and compare with system identification.

Proposed method

Model MPC as a differentiable module parameterized by a cost C and dynamics f, with box constraints on the controls.
Differentiate through the fixed point of the non-convex MPC solver by differentiating the KKT conditions of its convex (LQR) approximation at that point, which requires one additional backward pass.
Extend differentiation from unconstrained LQR to box-constrained QPs by differentiating their KKT conditions.
Differentiate at the fixed point instead of unrolling the solver, reusing forward-pass factorizations so that the cost of the backward pass does not grow with the number of solver iterations.
Provide an implementation and experiments showing end-to-end learning via gradient-based optimization (imitation losses).
Release open-source solver and experiments (mpc.pytorch).
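Box constraints change the backward pass in a simple way. At the solution, coordinates pinned to a bound do not move under small parameter changes, so their gradients are zero. The free coordinates are differentiated through the KKT system restricted to the free set. The following is a minimal sketch of this on a single small box-constrained QP in NumPy, under several assumptions:

- Strict complementarity holds, so no coordinate sits exactly on a bound with a zero multiplier.
- Projected gradient descent is used purely for brevity, not the solver used in the paper.
- The function names are hypothetical.

```python
import numpy as np


def box_qp(Q, p, lo, hi, iters=5000):
    """Projected gradient for min 0.5 x^T Q x + p^T x  s.t.  lo <= x <= hi.

    Hypothetical helper; assumes Q is symmetric positive definite.
    """
    step = 1.0 / np.linalg.eigvalsh(Q).max()   # 1/L step size for the quadratic
    x = np.clip(np.zeros_like(p), lo, hi)
    for _ in range(iters):
        x = np.clip(x - step * (Q @ x + p), lo, hi)
    return x


def box_qp_grad_p(Q, x, lo, hi, grad_x, tol=1e-8):
    """dl/dp at the solution x, given dl/dx (hypothetical helper).

    Clamped coordinates get zero gradient; free coordinates solve the
    KKT system restricted to the free set (assumes strict complementarity).
    """
    free = (x > lo + tol) & (x < hi - tol)
    grad_p = np.zeros_like(x)
    if free.any():
        d_free = np.linalg.solve(Q[np.ix_(free, free)], grad_x[free])
        grad_p[free] = -d_free   # x_free = -Q_FF^{-1} (p_free + ...), hence the sign
    return grad_p
```

Consider Q = [[2, 0.5], [0.5, 1]], p = (-5, -0.3) and bounds [-1, 1]. The first coordinate is clamped at 1 and the second settles at -0.2. As a result, only the second entry of p receives a gradient.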

Experimental results

Research questions

RQ1: Can MPC be used as a differentiable policy class for end-to-end learning in continuous control?
RQ2: Is it possible to differentiate through a box-constrained MPC efficiently by fixed-point methods rather than unrolling?
RQ3: Does end-to-end imitation using differentiable MPC recover cost and dynamics from an expert better than system identification?
RQ4: How does differentiable MPC compare to neural networks in data efficiency for continuous control imitation?
RQ5: Can the framework handle non-realizable experts and still provide useful gradients for learning?

Key findings

Differentiable MPC yields more data-efficient imitation than a generic neural network policy.
The method can recover the cost and dynamics of an MPC expert from its actions alone. When the expert is non-realizable, it outperforms system identification.
Fixed-point differentiation of the MPC solver is more memory- and compute-efficient than unrolled differentiation. Its backward pass is cheap because it reuses the factorizations computed in the forward pass instead of storing and replaying every solver iteration.
The approach supports learning both cost and dynamics end-to-end, enabling task-loss-driven optimization beyond simple state prediction.
The authors provide open-source implementations, demonstrating practical applicability and reproducibility.
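The efficiency argument for fixed-point differentiation can be seen on a toy scalar iteration x ← g(x, θ). Here g(x, θ) = 0.5 cos(x) + θ is a hypothetical contraction chosen only for illustration; it is not the iLQR update. Unrolled reverse-mode differentiation must store every iterate. The implicit function theorem at the fixed point x* needs only x* itself:

dx*/dθ = (∂g/∂θ) / (1 - ∂g/∂x)

Once the iteration has converged, both give the same gradient.

```python
import math


def g(x, theta):
    """Hypothetical contraction map standing in for one solver iteration."""
    return 0.5 * math.cos(x) + theta


def solve_fixed_point(theta, iters=100):
    """Iterate x <- g(x, theta), keeping every iterate (needed only for unrolling)."""
    x = 0.0
    xs = [x]
    for _ in range(iters):
        x = g(x, theta)
        xs.append(x)
    return x, xs


def unrolled_grad(xs):
    """Reverse-mode through all stored iterates: memory grows with iters."""
    adj = 1.0    # d x_K / d x_{k+1}, starting at k + 1 = K
    grad = 0.0
    for x_k in reversed(xs[:-1]):
        grad += adj * 1.0                  # dg/dtheta = 1 at every step
        adj *= -0.5 * math.sin(x_k)        # dg/dx evaluated at x_k
    return grad


def implicit_grad(x_star):
    """Implicit function theorem at the fixed point: constant memory."""
    dg_dx = -0.5 * math.sin(x_star)
    dg_dtheta = 1.0
    return dg_dtheta / (1.0 - dg_dx)
```

For example, x_star, xs = solve_fixed_point(0.3) followed by unrolled_grad(xs) and implicit_grad(x_star) gives two gradients that agree to floating-point precision. However, only the first needs the full list xs.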

This review was created by AI and reviewed by human editors.