QUICK REVIEW

[Paper Review] A Stein variational Newton method

Gianluca Detommaso, Tiangang Cui|arXiv (Cornell University)|Jun 8, 2018
Markov Chains and Monte Carlo Methods | 28 references | 42 citations
TL;DR

The paper extends Stein variational gradient descent (SVGD) with second-order (Newton-like) information, introducing a Stein variational Newton (SVN) method and geometry-aware kernels that accelerate convergence.

ABSTRACT

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.
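As a point of reference for the baseline being accelerated, here is a minimal NumPy sketch of one SVGD update. It assumes an isotropic Gaussian (RBF) kernel with a fixed bandwidth `h` and a user-supplied score function `grad_log_p`; the names `rbf_kernel` and `svgd_step`, the bandwidth, and the step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, h):
    """Return K[j, i] = k(x_j, x_i) and gradK[j, i] = grad of k(x_j, x_i) w.r.t. x_j."""
    diff = X[:, None, :] - X[None, :, :]          # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * h))
    gradK = -diff / h * K[:, :, None]
    return K, gradK

def svgd_step(X, grad_log_p, step, h):
    """One SVGD update for particles X of shape (n, d).

    Each particle moves along
        phi(x_i) = mean_j [ k(x_j, x_i) * grad_log_p(x_j) + grad_{x_j} k(x_j, x_i) ],
    i.e. a kernel-weighted drift toward high density plus a repulsion term.
    """
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    G = grad_log_p(X)                             # scores, shape (n, d)
    phi = (K.T @ G + gradK.sum(axis=0)) / n
    return X + step * phi
```

For example, with a one-dimensional Gaussian target centered at 3, one could call `svgd_step(X, lambda X: -(X - 3.0), step=0.1, h=1.0)` repeatedly on particles drawn from a standard normal; the particle mean should drift toward 3 while the repulsion term keeps the particles spread out.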

Motivation & Objective

  • Accelerate nonparametric variational inference for challenging target distributions
  • Introduce a Newton-like iteration in function space for transport maps
  • Leverage second-order information to improve kernel choice and particle movement
  • Propose scalable approximations to compute Newton directions in RKHS
  • Demonstrate computational gains and kernel design benefits via experiments

Proposed method

  • Define a Newton-like direction in the space of transport maps to minimize a local quadratic approximation of the KL objective
  • Derive a Galerkin (kernel) representation to compute the Newton direction via a finite-dimensional linear system
  • Introduce inexact Newton–CG and block-diagonal Hessian approximations for scalability
  • Develop geometry-aware anisotropic kernels using an average Hessian M_p to adapt distances in the RKHS
  • Provide Algorithms 1 and 2 detailing SVGD and SVN iterations respectively
  • Discuss scaling and kernel choices with a Hessian-based kernel normalization (g(d) factor) for high-dimensional problems
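The steps above can be illustrated with a simplified NumPy sketch. It is not the paper's Algorithm 2: it assumes a block-diagonal approximation in which each particle solves its own small d-by-d system and moves along that local solution directly, instead of solving the coupled Galerkin system. The kernel is a Gaussian in the metric `M` (here the average of the negative log-density Hessians over particles), and the division by the dimension `d` stands in for the g(d) scaling as an assumption. All function names are hypothetical.

```python
import numpy as np

def average_hessian_metric(X, neg_hess_log_p):
    """Metric M: average over particles of -Hessian(log p) (assumed positive definite)."""
    return neg_hess_log_p(X).mean(axis=0)

def svn_block_diag_step(X, grad_log_p, neg_hess_log_p, step):
    """One simplified block-diagonal Newton-like update for particles X of shape (n, d)."""
    n, d = X.shape
    G = grad_log_p(X)                        # scores, shape (n, d)
    N = neg_hess_log_p(X)                    # curvature, shape (n, d, d)
    M = N.mean(axis=0)                       # geometry-aware kernel metric
    X_new = np.empty_like(X)
    for i in range(n):
        diff = X[i] - X                      # rows are x_i - x_j
        Md = diff @ M                        # rows are M (x_i - x_j); M is symmetric
        k = np.exp(-0.5 * np.sum(diff * Md, axis=1) / d)   # k(x_j, x_i)
        gk = Md * k[:, None] / d             # gradient of k(x_j, x_i) w.r.t. x_j
        # SVGD-type direction evaluated at x_i
        g = (k @ G + gk.sum(axis=0)) / n
        # Local curvature block: kernel-weighted Hessians plus a kernel-gradient term
        H = (np.einsum("j,jab->ab", k ** 2, N) + gk.T @ gk) / n
        X_new[i] = X[i] + step * np.linalg.solve(H, g)
    return X_new
```

In this sketch the only cost beyond SVGD is one d-by-d solve per particle, which is the appeal of a block-diagonal scheme; an inexact Newton–CG variant would replace `np.linalg.solve` with a few conjugate-gradient iterations. For a non-Gaussian target, a Gauss-Newton approximation could be passed as `neg_hess_log_p` to keep `H` positive definite.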

Experimental results

Research questions

  • RQ1: Can second-order information accelerate convergence of Stein variational methods for sampling?
  • RQ2: How can one design kernels that incorporate curvature information to improve transport in high-probability regions?
  • RQ3: What scalable approximations (block-diagonal, inexact Newton) preserve descent while reducing computation?
  • RQ4: Does a geometry-aware kernel outperform isotropic kernels in challenging Bayesian inference tasks?

Key findings

  • SVN with second-order information converges faster than standard SVGD in test cases
  • A geometry-aware Hessian kernel significantly improves convergence speed and particle distribution
  • Block-diagonal and inexact Newton–CG approximations provide scalable alternatives with similar progress per iteration
  • Scaled Hessian kernels help maintain robust performance in high-dimensional settings
  • SVN-H (Newton with Hessian kernel) achieves accurate posterior means and credible intervals in high-dimensional diffusion tests
  • The method shows good agreement with reference MCMC in a Langevin SDE example


This review was created by AI and reviewed by human editors.