QUICK REVIEW

[Paper Review] Distributionally Robust Stochastic Optimization with Wasserstein Distance

Rui Gao, Anton J. Kleywegt | arXiv (Cornell University) | Apr 8, 2016

Risk and Portfolio Optimization | 97 citations

TL;DR

This paper proposes a distributionally robust stochastic optimization (DRSO) framework that uses the Wasserstein distance to define ambiguity sets, which yields more reasonable worst-case distributions and a tractable worst-case expectation problem. It establishes strong duality for the resulting min-max problem, constructs worst-case distributions explicitly from the first-order optimality conditions of the dual under conditions on the objective function's growth rate, and shows that data-driven DRSO can be approximated to any accuracy by robust optimization.

ABSTRACT

Distributionally robust stochastic optimization (DRSO) is an approach to optimization under uncertainty in which, instead of assuming that there is a known true underlying probability distribution, one hedges against a chosen set of distributions. In this paper we first point out that the set of distributions should be chosen to be appropriate for the application at hand, and that some of the choices that have been popular until recently are, for many applications, not good choices. We next consider sets of distributions that are within a chosen Wasserstein distance from a nominal distribution. Such a choice of sets has two advantages: (1) The resulting distributions hedged against are more reasonable than those resulting from other popular choices of sets. (2) The problem of determining the worst-case expectation over the resulting set of distributions has desirable tractability properties. We derive a strong duality reformulation of the corresponding DRSO problem and construct approximate worst-case distributions explicitly via the first-order optimality conditions of the dual problem. Our contributions are four-fold. (i) We identify necessary and sufficient conditions for the existence of a worst-case distribution, which are naturally related to the growth rate of the objective function. (ii) We show that the worst-case distributions resulting from an appropriate Wasserstein distance have a concise structure and a clear interpretation. (iii) Using this structure, we show that data-driven DRSO problems can be approximated to any accuracy by robust optimization problems, and thereby many DRSO problems become tractable by using tools from robust optimization. (iv) Our strong duality result holds in a very general setting. As examples, we show that it can be applied to infinite-dimensional process control and intensity estimation for point processes.

Motivation & Objective

To address the limitations of traditional ambiguity sets in distributionally robust stochastic optimization, particularly those based on moment constraints that can yield overly conservative or unrealistic worst-case distributions.
To propose a Wasserstein distance-based ambiguity set that better reflects data-driven uncertainty and yields more reasonable worst-case distributions.
To establish strong duality for the resulting DRSO problem in a general setting, enabling tractable reformulation.
To derive explicit constructions of worst-case distributions using first-order optimality conditions, ensuring interpretability and computational feasibility.
To demonstrate the applicability of the framework to infinite-dimensional problems such as process control and point process intensity estimation.

Proposed method

Uses a Wasserstein ball of radius θ centered at a nominal distribution ν as the ambiguity set, so that every distribution hedged against lies within Wasserstein distance θ of ν.
Derives a strong duality reformulation of the DRSO problem, replacing the inner worst-case expectation of the min-max problem with a dual minimization that is amenable to optimization.
Constructs worst-case distributions explicitly by solving the dual problem’s first-order optimality conditions, yielding a clear structural form.
Applies the framework to infinite-dimensional settings by leveraging the dual formulation and properties of the Wasserstein metric.
Uses concentration inequalities from Bolley et al. [13] to select the Wasserstein radius θ based on empirical data, ensuring high-probability coverage of the true distribution.
Demonstrates that data-driven DRSO problems can be approximated to any accuracy by robust optimization problems, enabling use of existing robust optimization tools.
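
The dual reformulation described above can be illustrated with a minimal sketch. It assumes the dual form commonly stated for Wasserstein DRSO with an empirical nominal distribution on samples ξ_1, ..., ξ_N: the worst-case expectation equals the minimum over λ ≥ 0 of λθ^p + (1/N) Σ_i sup_ξ [Ψ(ξ) − λ d(ξ, ξ_i)^p]. The one-dimensional support, the finite grid standing in for the support, the distance |x − y|, and the function names `dual_objective` and `worst_case_expectation` are all illustrative assumptions, not taken from the paper.

```python
# Sketch only: 1-D support approximated by a finite grid, distance |x - y|.
# Function names are hypothetical and chosen for illustration.

def dual_objective(lam, samples, grid, psi, theta, p=1):
    """Return lam * theta**p + mean_i max_x [psi(x) - lam * |x - xi|**p]."""
    inner = [max(psi(x) - lam * abs(x - xi) ** p for x in grid) for xi in samples]
    return lam * theta ** p + sum(inner) / len(samples)


def worst_case_expectation(samples, grid, psi, theta, p=1, lam_max=10.0, iters=200):
    """Minimize the dual objective over lam in [0, lam_max].

    The objective is convex in lam (a linear term plus maxima of functions
    that are affine in lam), so a ternary search is sufficient here.
    """
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = dual_objective(m1, samples, grid, psi, theta, p)
        f2 = dual_objective(m2, samples, grid, psi, theta, p)
        if f1 <= f2:
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    lam = 0.5 * (lo + hi)
    return lam, dual_objective(lam, samples, grid, psi, theta, p)


# Toy data: three samples, objective psi(x) = |x|, radius theta = 0.5.
samples = [-1.0, 0.0, 2.0]
grid = [-10.0 + 0.5 * k for k in range(41)]  # contains every sample point
lam_star, value = worst_case_expectation(samples, grid, abs, theta=0.5)
print(lam_star, value)
```

For this toy instance the objective |x| is 1-Lipschitz, so with p = 1 the worst-case expectation should be the sample mean of |ξ_i| plus θ, that is 1 + 0.5 = 1.5, with a dual multiplier close to 1. A library routine such as a bounded scalar minimizer could replace the ternary search; it is written out here only to keep the sketch dependency-free.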

Experimental results

Research questions

RQ1: What are the necessary and sufficient conditions for the existence of a worst-case distribution in a Wasserstein-based DRSO framework?
RQ2: How can worst-case distributions be explicitly constructed when the ambiguity set is defined via a Wasserstein ball?
RQ3: What is the structural form and interpretability of the worst-case distributions obtained under the Wasserstein ambiguity set?
RQ4: Can data-driven DRSO problems be approximated by robust optimization problems, and if so, to what accuracy?
RQ5: How can the radius θ of the Wasserstein ball be selected in a statistically principled way using empirical data?

Key findings

The paper gives necessary and sufficient conditions for the existence of a worst-case distribution; these conditions are tied to the growth rate of the objective function at infinity.
The worst-case distributions resulting from a Wasserstein ambiguity set have a concise, interpretable structure: they shift mass from the nominal distribution to the worst-case tail, with explicit formulas derived via optimality conditions.
For the case of a linear objective function, the worst-case distribution is explicitly constructed as μ_t^q with t = VaR_α^ν[-w^Tξ], and the worst-case value-at-risk is the unique solution to an integral equation involving the Wasserstein distance.
The DRSO problem with a Wasserstein ambiguity set can be approximated to arbitrary accuracy by robust optimization problems, making many DRSO problems tractable using existing robust optimization tools.
A concentration inequality for the empirical Wasserstein distance (taken from Bolley et al. [13]) supports data-driven selection of the radius θ so that the true distribution lies within the ambiguity set with high probability, e.g., at a 95% confidence level.
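
The robust-optimization approximation in the findings above can be sketched in its simplest form. The sketch assumes the approximating problem perturbs each of the N samples to one new point, keeps the weights equal at 1/N, and limits the average transport cost to θ^p; finer approximations would presumably split each sample into several equally weighted copies. The brute-force search over a small grid, the one-dimensional support, and the name `robust_approximation` are illustrative assumptions, not the paper's algorithm.

```python
from itertools import product

# Sketch only: exhaustive search over perturbed sample locations on a grid.
# The function name and the grid are hypothetical, chosen for illustration.

def robust_approximation(samples, grid, psi, theta, p=1):
    """Maximize mean psi(x_i) over perturbed points x_1..x_N on the grid,
    subject to the budget (1/N) * sum_i |x_i - xi_i|**p <= theta**p."""
    n = len(samples)
    budget = n * theta ** p
    best_value, best_points = None, None
    for candidate in product(grid, repeat=n):
        cost = sum(abs(x - xi) ** p for x, xi in zip(candidate, samples))
        if cost > budget + 1e-12:
            continue  # this perturbation leaves the Wasserstein ball
        value = sum(psi(x) for x in candidate) / n
        if best_value is None or value > best_value:
            best_value, best_points = value, candidate
    return best_value, best_points


# Toy data: three samples, objective psi(x) = |x|, radius theta = 0.5.
samples = [-1.0, 0.0, 2.0]
grid = [-5.0 + 0.5 * k for k in range(21)]  # 21**3 = 9261 candidates
value, points = robust_approximation(samples, grid, abs, theta=0.5)
print(value, points)
```

On this toy instance the search spends the whole transport budget moving mass away from the origin, and the best value is the sample mean of |ξ_i| plus θ, that is 1.5. Exhaustive enumeration only works for tiny examples; a real application would hand the same constraint set to a robust optimization solver.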

This review was created by AI and reviewed by human editors.