QUICK REVIEW

[Paper Review] On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning

Bolin Gao, Lacra Pavel | arXiv (Cornell University) | Apr 3, 2017

Mathematical and Theoretical Epidemiology and Ecology Models | 38 references | 212 citations

TL;DR

The paper shows that softmax is the gradient of the log-sum-exp function. It derives Lipschitz and co-coercivity properties controlled by the inverse temperature, and it demonstrates an application to a stateless, game-theoretic reinforcement learning scheme.

ABSTRACT

In this paper, we utilize results from convex analysis and monotone operator theory to derive additional properties of the softmax function that have not yet been covered in the existing literature. In particular, we show that the softmax function is the monotone gradient map of the log-sum-exp function. By exploiting this connection, we show that the inverse temperature parameter determines the Lipschitz and co-coercivity properties of the softmax function. We then demonstrate the usefulness of these properties through an application in game-theoretic reinforcement learning.
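The gradient relationship stated in the abstract can be checked numerically. The following is a minimal plain-Python sketch. It assumes the scaled log-sum-exp lse(z) = (1/lambda) log(sum_j exp(lambda z_j)) and the softmax sigma_i(z) = exp(lambda z_i) / sum_j exp(lambda z_j). The helper names softmax, lse and numerical_gradient are illustrative and do not come from the paper. The sketch compares a central finite-difference gradient of lse with softmax at a random point.

```python
import math
import random


def softmax(z, lam=1.0):
    """sigma_i(z) = exp(lam*z_i) / sum_j exp(lam*z_j), shifted by max(z) for stability."""
    m = max(z)
    e = [math.exp(lam * (zi - m)) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def lse(z, lam=1.0):
    """Scaled log-sum-exp (1/lam) * log(sum_j exp(lam*z_j)), computed stably."""
    m = max(z)
    return m + math.log(sum(math.exp(lam * (zi - m)) for zi in z)) / lam


def numerical_gradient(f, z, h=1e-6):
    """Central finite-difference gradient of a scalar function f at the point z."""
    grad = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        grad.append((f(zp) - f(zm)) / (2 * h))
    return grad


rng = random.Random(0)
lam = 2.5
z = [rng.uniform(-3.0, 3.0) for _ in range(5)]
grad_lse = numerical_gradient(lambda v: lse(v, lam), z)
gap = max(abs(g - s) for g, s in zip(grad_lse, softmax(z, lam)))
print(gap)  # only finite-difference error separates the two vectors
```

Subtracting max(z) before exponentiating changes neither function. Softmax is shift-invariant, and lse adds the shift back. The subtraction only avoids overflow when lambda * z is large.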

Motivation & Objective

Expand the mathematical understanding of the softmax function using convex analysis and monotone operator theory.
Establish that softmax is the gradient of the log-sum-exp potential and study how the inverse temperature lambda affects its properties.
Demonstrate how these properties can be used to establish convergence of learning dynamics in a simple game-theoretic reinforcement learning setup.

Proposed method

Show that the softmax is the gradient of the log-sum-exp function (Proposition 1).
Compute the Hessian of the log-sum-exp function, which is the Jacobian of softmax (Proposition 2).
Establish Lipschitz continuity of softmax with constant L = lambda (Proposition 4).
Derive 1/L-co-coercivity of softmax via the Baillon–Haddad theorem (Corollary 2).
Discuss monotonicity properties and maximal monotonicity of softmax (Proposition 3 and Corollary 1).
Apply these properties to a stateless continuous-time reinforcement learning scheme (EXP-D-RL) in a single-player game to illustrate how they support convergence analysis (Section VI).
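The review does not reproduce the EXP-D-RL equations. The sketch below is therefore a simplified stand-in, not the paper's scheme. It makes two assumptions:

- Score dynamics with exponential discounting, dz/dt = u(sigma(z)) - z, with x = sigma(z).
- A hypothetical three-action single-player game in which action i pays R[i] minus the probability of playing it. The names R, payoff and discounted_score_dynamics are illustrative.

For this particular payoff, 1/lambda-co-coercivity of softmax implies that a forward-Euler step of length h <= 2/(lambda + 2) is a (1 - h)-contraction. That is the kind of convergence argument the listed properties enable.

```python
import math


def softmax(z, lam=1.0):
    """sigma_i(z) = exp(lam*z_i) / sum_j exp(lam*z_j), shifted by max(z) for stability."""
    m = max(z)
    e = [math.exp(lam * (zi - m)) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


# Hypothetical rewards: action i pays R[i] minus the probability of playing it.
R = [1.0, 0.0, -0.5]


def payoff(x):
    """Payoff vector u(x) = R - x for the hypothetical single-player game."""
    return [ri - xi for ri, xi in zip(R, x)]


def discounted_score_dynamics(z0, lam=2.0, h=0.1, steps=500):
    """Forward-Euler simulation of the assumed dynamics dz/dt = u(sigma(z)) - z."""
    # For u(x) = R - x, co-coercivity makes each step a (1 - h)-contraction
    # whenever 0 < h <= 2 / (lam + 2).
    if not 0.0 < h <= 2.0 / (lam + 2.0):
        raise ValueError("step size outside the contraction range")
    z = list(z0)
    for _ in range(steps):
        u = payoff(softmax(z, lam))
        z = [zi + h * (ui - zi) for zi, ui in zip(z, u)]
    return z, softmax(z, lam)


z_star, x_star = discounted_score_dynamics([3.0, -1.0, 0.5])
# At a rest point z = u(sigma(z)); the residual measures the distance from that condition.
residual = max(abs(zi - ui) for zi, ui in zip(z_star, payoff(x_star)))
print(x_star, residual)
```

The contraction claim follows in three steps:

1. Let d = z - w. Expand ||(1 - h)d - h(sigma(z) - sigma(w))||^2.
2. Bound the cross term with <d, sigma(z) - sigma(w)> >= (1/lambda) ||sigma(z) - sigma(w)||^2.
3. The leftover term is then non-positive whenever h <= 2/(lambda + 2).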

Experimental results

Research questions

RQ1: What additional properties of the softmax function can be derived from convex analysis and monotone operator theory?
RQ2: How does the inverse temperature lambda influence the Lipschitz and co-coercivity properties of softmax?
RQ3: Can the derived properties ensure convergence of learning dynamics in game-theoretic reinforcement learning?
RQ4: How is softmax related to the log-sum-exp potential and its duality with negative entropy in this context?
RQ5: What is the role of softmax in replicator-type dynamics, and how does it connect to evolutionary game theory?
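For RQ4, the duality can be written compactly. This is a standard convex-analysis fact, stated for reference rather than quoted from the paper. It assumes the lambda-scaled log-sum-exp and the convention 0 log 0 = 0. Under those assumptions, the convex conjugate of lse is the scaled negative entropy restricted to the probability simplex Δ. Softmax is then the maximizer of the entropy-regularized linear score.

```latex
\mathrm{lse}(z) = \frac{1}{\lambda}\log\sum_{i=1}^{n} e^{\lambda z_i},
\qquad
\mathrm{lse}^{*}(x) =
\begin{cases}
\dfrac{1}{\lambda}\displaystyle\sum_{i=1}^{n} x_i \log x_i, & x \in \Delta^{n-1},\\[4pt]
+\infty, & \text{otherwise},
\end{cases}
\qquad
\sigma(z) = \nabla\,\mathrm{lse}(z)
= \operatorname*{arg\,max}_{x \in \Delta^{n-1}}
\Big\{ \langle x, z \rangle - \frac{1}{\lambda}\sum_{i=1}^{n} x_i \log x_i \Big\}.
```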

Key findings

Softmax is the gradient of the log-sum-exp function lse(z) = (1/lambda) log(sum_i exp(lambda z_i)), i.e., sigma(z) = grad lse(z).
The Jacobian of softmax is lambda times (diag(sigma(z)) − sigma(z)sigma(z)^T).
Softmax is lambda-Lipschitz and 1/lambda-co-coercive with respect to the Euclidean norm.
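By the Baillon–Haddad theorem, because log-sum-exp is convex with a lambda-Lipschitz gradient, that gradient (softmax) is 1/lambda-co-coercive.
Softmax is monotone and maximal monotone on R^n, but not strictly monotone: sigma(z + c1) = sigma(z) for any scalar c, where 1 is the all-ones vector, so distinct inputs can map to the same output.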
These properties can be used to analyze convergence in a stateless, continuous-time reinforcement learning scheme (EXP-D-RL).
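The Jacobian formula and the Lipschitz and co-coercivity inequalities above can be spot-checked numerically. This minimal plain-Python sketch uses illustrative helper names and runs two checks:

- It compares the closed-form Jacobian with central finite differences at one point.
- It samples random pairs (z, w) and records two quantities: the worst observed Lipschitz ratio ||sigma(z) - sigma(w)|| / ||z - w||, and the smallest co-coercivity slack lambda <sigma(z) - sigma(w), z - w> - ||sigma(z) - sigma(w)||^2.

Random sampling can only fail to find a counterexample. It does not prove the bounds.

```python
import math
import random


def softmax(z, lam=1.0):
    """sigma_i(z) = exp(lam*z_i) / sum_j exp(lam*z_j), shifted by max(z) for stability."""
    m = max(z)
    e = [math.exp(lam * (zi - m)) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def softmax_jacobian(z, lam=1.0):
    """Closed form J(z) = lam * (diag(sigma(z)) - sigma(z) sigma(z)^T)."""
    s = softmax(z, lam)
    n = len(s)
    return [[lam * ((s[i] if i == j else 0.0) - s[i] * s[j]) for j in range(n)]
            for i in range(n)]


def fd_jacobian(z, lam=1.0, h=1e-6):
    """Central finite differences: entry [i][j] approximates d sigma_i / d z_j."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        sp, sm = softmax(zp, lam), softmax(zm, lam)
        for i in range(n):
            J[i][j] = (sp[i] - sm[i]) / (2 * h)
    return J


def worst_case_ratios(lam, n=4, trials=2000, seed=0):
    """Return (max Lipschitz ratio, min co-coercivity slack) over random pairs.

    Lipschitz ratio:   ||sigma(z) - sigma(w)|| / ||z - w||, expected <= lam.
    Co-coercivity slack: lam * <sigma(z) - sigma(w), z - w> - ||sigma(z) - sigma(w)||^2,
    expected >= 0.
    """
    rng = random.Random(seed)
    max_lip, min_slack = 0.0, math.inf
    for _ in range(trials):
        z = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        w = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        dz = [a - b for a, b in zip(z, w)]
        ds = [a - b for a, b in zip(softmax(z, lam), softmax(w, lam))]
        norm_dz = math.sqrt(sum(d * d for d in dz))
        norm_ds_sq = sum(d * d for d in ds)
        inner = sum(a * b for a, b in zip(ds, dz))
        max_lip = max(max_lip, math.sqrt(norm_ds_sq) / norm_dz)
        min_slack = min(min_slack, lam * inner - norm_ds_sq)
    return max_lip, min_slack


lam = 2.5
z = [0.3, -1.2, 2.0, 0.5]
n = len(z)
J, J_fd = softmax_jacobian(z, lam), fd_jacobian(z, lam)
jac_err = max(abs(J[i][j] - J_fd[i][j]) for i in range(n) for j in range(n))
max_lip, min_slack = worst_case_ratios(lam)
print(jac_err, max_lip, min_slack)
```

The closed-form Jacobian also shows two structural facts. It is symmetric, as expected of the Hessian of log-sum-exp. Each of its rows sums to zero, which reflects the shift invariance sigma(z + c1) = sigma(z).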

This review was created by AI and reviewed by human editors.