QUICK REVIEW

[Paper Review] Hamiltonian Descent Methods

Chris J. Maddison, Daniel Paulin | arXiv (Cornell University) | Sep 13, 2018
Stochastic Gradient Optimization Techniques | 33 references | 40 citations
TL;DR

The authors introduce a family of first-order optimization methods based on discretized conformal Hamiltonian dynamics that achieve linear convergence on a broad class of convex functions by using a kinetic energy linked to the convex conjugate of the objective.

ABSTRACT

We propose a family of optimization methods that achieve linear convergence using first-order gradient information and constant step sizes on a class of convex functions much larger than the smooth and strongly convex ones. This larger class includes functions whose second derivatives may be singular or unbounded at their minima. Our methods are discretizations of conformal Hamiltonian dynamics, which generalize the classical momentum method to model the motion of a particle with non-standard kinetic energy exposed to a dissipative force and the gradient field of the function of interest. They are first-order in the sense that they require only gradient computation. Yet, crucially the kinetic gradient map can be designed to incorporate information about the convex conjugate in a fashion that allows for linear convergence on convex functions that may be non-smooth or non-strongly convex. We study in detail one implicit and two explicit methods. For one explicit method, we provide conditions under which it converges to stationary points of non-convex functions. For all, we provide conditions on the convex function and kinetic energy pair that guarantee linear convergence, and show that these conditions can be satisfied by functions with power growth. In sum, these methods expand the class of convex functions on which linear convergence is possible with first-order computation.

Motivation & Objective

  • Extend the class of convex functions for which linear convergence is attainable with first-order methods.
  • Develop first-order discretizations of conformal Hamiltonian dynamics that are robust to non-smooth or non-strongly convex functions.
  • Design the kinetic energy from the convex conjugate of the objective so that the resulting dynamics are well conditioned.
  • Provide theoretical guarantees and conditions under which discretizations converge linearly to minimizers.

Proposed method

  • Model optimization as a conformal Hamiltonian system with state (x, p) and dynamics x' = ∇k(p), p' = -∇f(x) - γp.
  • Choose the kinetic energy k so that k(p) upper bounds the convex conjugate f_c^*(p) of the centered objective f_c(x) = f(x + x*) - f(x*); this is the condition that yields linear convergence.
  • Analyze three discretizations (one implicit, two explicit) and establish conditions for linear convergence on convex f.
  • Prove that one of the explicit discretizations converges to stationary points of non-convex functions under stated conditions.
  • Introduce a Lyapunov-like function V(x, p) = H(x, p) + β⟨x - x*, p⟩ and derive bounds that yield linear rates.
  • Present a family of kinetic energies with power growth whose behavior near the origin (body) and far from it (tail) is matched to that of f, so that a fixed step size suffices.
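
The dynamics in the first bullet can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the paper's reference implementation: the test objective f(x) = Σ|x_i|^4 / 4, the kinetic energy k(p) = Σ|p_i|^(4/3) / (4/3), the step size `eps`, the friction `gamma`, and all function names are hypothetical choices. The update treats the friction term -γp implicitly and the gradient term explicitly, which is one plausible explicit discretization of x' = ∇k(p), p' = -∇f(x) - γp.

```python
import numpy as np

def f(x, b=4.0):
    """Illustrative objective f(x) = sum(|x_i|^b) / b (an assumption, not from the review)."""
    return np.sum(np.abs(x) ** b) / b

def grad_f(x, b=4.0):
    """Gradient of f, computed coordinate-wise."""
    return np.sign(x) * np.abs(x) ** (b - 1.0)

def grad_k(p, a=4.0 / 3.0):
    """Gradient of the kinetic energy k(p) = sum(|p_i|^a) / a, with 1/a + 1/b = 1."""
    return np.sign(p) * np.abs(p) ** (a - 1.0)

def hamiltonian_descent(x0, eps=0.1, gamma=1.0, steps=2000):
    """Iterate a fixed-step discretization of the conformal Hamiltonian dynamics.

    Momentum step: p_new = (p - eps * grad_f(x)) / (1 + gamma * eps),
    i.e. friction handled implicitly, gradient handled explicitly.
    Position step: x_new = x + eps * grad_k(p_new).
    """
    x = np.asarray(x0, dtype=float).copy()
    p = np.zeros_like(x)
    delta = 1.0 / (1.0 + gamma * eps)  # contraction factor from the implicit friction step
    values = [f(x)]
    for _ in range(steps):
        p = delta * (p - eps * grad_f(x))
        x = x + eps * grad_k(p)
        values.append(f(x))
    return x, np.array(values)

x_final, values = hamiltonian_descent(np.array([1.0, -2.0]))
print("initial f:", values[0], "final f:", values[-1])
```

The objective here has a singular Hessian at its minimizer, which is the kind of function the review says falls outside the smooth and strongly convex class. The step size stays constant throughout the loop.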

Research questions

  • RQ1: Under what conditions on the pairing of f and the kinetic energy k can conformal Hamiltonian dynamics achieve linear convergence for convex functions?
  • RQ2: How can discretizations of the continuous dynamics guarantee linear rates, and what are the precise assumptions on f and k to ensure this?
  • RQ3: Can fixed step-size first-order methods derived from Hamiltonian descent converge when f is non-smooth or non-strongly convex?
  • RQ4: What role does the convex conjugate play in shaping the kinetic map to improve conditioning of optimization?
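
The role of the convex conjugate in RQ4 can be illustrated with a one-dimensional power function. The choice of f and the exponents a, b below are illustrative assumptions, not taken from the review. Because this f is minimized at x* = 0 with f(x*) = 0, centering leaves it unchanged.

```latex
f(x) = \frac{|x|^{b}}{b}, \quad b > 1
\quad\Longrightarrow\quad
f^{*}(p) = \sup_{x}\,\bigl\{ p x - f(x) \bigr\} = \frac{|p|^{a}}{a},
\qquad \frac{1}{a} + \frac{1}{b} = 1 .

\text{With } k = f^{*}: \qquad
\nabla k(p) = \operatorname{sign}(p)\,|p|^{a-1}
            = \operatorname{sign}(p)\,|p|^{\frac{1}{b-1}}
            = (\nabla f)^{-1}(p) .
```

In this example the kinetic gradient map undoes the nonlinearity of ∇f, which gives some intuition for how pairing k with the conjugate can improve conditioning. For b = 2 the pairing gives k(p) = p^2 / 2 and ∇k(p) = p, which is the kinetic energy of the classical momentum method.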

Key findings

  • Continuous-time Hamiltonian descent achieves linear convergence when k(p) upper bounds the centered convex conjugate of f.
  • Three discretizations (one implicit, two explicit) attain linear convergence under corresponding assumptions on f and k.
  • There exist power-growth kinetic energies that enable linear rates for functions with different tail/body behaviors.
  • Fixed step sizes can be used without adaptation while maintaining linear convergence under the right k–f pairing.
  • A Lyapunov-based analysis using V(x, p) provides a tractable route to quantify and guarantee contraction rates.
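
A short calculation shows why the Lyapunov-based analysis is natural. It assumes the total energy is H(x, p) = f(x) - f(x*) + k(p), a definition the review does not spell out, and that k is convex, differentiable, and minimized at p = 0. Differentiating H along the dynamics x' = ∇k(p), p' = -∇f(x) - γp gives:

```latex
\begin{aligned}
\frac{d}{dt} H(x, p)
  &= \langle \nabla f(x),\, x' \rangle + \langle \nabla k(p),\, p' \rangle \\
  &= \langle \nabla f(x),\, \nabla k(p) \rangle
     - \langle \nabla k(p),\, \nabla f(x) + \gamma p \rangle \\
  &= -\gamma\, \langle \nabla k(p),\, p \rangle \;\le\; 0 .
\end{aligned}
```

The energy never increases, but the dissipation vanishes whenever p = 0, even if x is far from x*. H alone therefore cannot certify a linear rate, which is presumably why the analysis adds the cross term β⟨x - x*, p⟩ to form V(x, p).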

This review was created by AI and reviewed by human editors.