QUICK REVIEW

[Paper Review] A Stein variational Newton method

Gianluca Detommaso, Tiangang Cui|arXiv (Cornell University)|Jun 8, 2018

Markov Chains and Monte Carlo Methods|28 references|42 citations

TL;DR

The paper extends Stein variational gradient descent (SVGD) with second-order (Newton-like) information, introducing a Stein variational Newton (SVN) method and geometry-aware kernels that accelerate convergence.

ABSTRACT

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.
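
To make the functional gradient descent step concrete, the following is a minimal sketch of one SVGD update in Python with NumPy. It is an illustration, not the authors' code: the function names `rbf_kernel` and `svgd_step`, the fixed Gaussian (RBF) kernel bandwidth `h`, and the step size `eps` are all assumptions of this sketch.

```python
import numpy as np


def rbf_kernel(X, h):
    """Gaussian kernel matrix and its gradient w.r.t. the first argument.

    K[j, i]     = k(x_j, x_i) = exp(-|x_j - x_i|^2 / (2 h^2))
    gradK[j, i] = grad_{x_j} k(x_j, x_i)
    """
    diff = X[:, None, :] - X[None, :, :]        # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff**2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * h**2))
    gradK = -diff / h**2 * K[:, :, None]
    return K, gradK


def svgd_step(X, grad_log_p, h=1.0, eps=0.1):
    """One SVGD update for particles X of shape (n, d).

    grad_log_p maps an (n, d) array to the (n, d) array of
    gradients of the log target density at each particle.
    """
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    G = grad_log_p(X)
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    # First term: kernel-weighted drift toward high density.
    # Second term: repulsion that keeps particles from collapsing.
    phi = (K.T @ G + gradK.sum(axis=0)) / n
    return X + eps * phi
```

Calling `svgd_step` repeatedly with, for example, `grad_log_p = lambda X: -X` (a standard Gaussian target) moves an initial particle cloud toward the target. In practice the bandwidth and step size would need tuning.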

Motivation & Objective

Accelerate nonparametric variational inference for challenging target distributions
Introduce a Newton-like iteration in function space for transport maps
Leverage second-order information to improve kernel choice and particle movement
Propose scalable approximations to compute Newton directions in RKHS
Demonstrate computational gains and kernel design benefits via experiments

Proposed method

Define a Newton-like direction in the space of transport maps to minimize a local quadratic approximation of the KL objective
Derive a Galerkin (kernel) representation to compute the Newton direction via a finite-dimensional linear system
Introduce inexact Newton–CG and block-diagonal Hessian approximations for scalability
Develop geometry-aware anisotropic kernels using an average Hessian M_p to adapt distances in the RKHS
Provide Algorithms 1 and 2 detailing SVGD and SVN iterations respectively
Discuss scaling and kernel choices with a Hessian-based kernel normalization (g(d) factor) for high-dimensional problems
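
The block-diagonal variant and the Hessian-based kernel listed above can be sketched as follows in Python with NumPy. This is a hedged illustration based on this review's description, not the paper's Algorithm 2 verbatim. Assumptions of the sketch: `neg_hess_log_p` returns the exact negative Hessian of the log target at each particle (assumed positive definite); the kernel metric is the particle average of those Hessians, scaled by g(d) = d; and each particle is moved by the solution of its own d-by-d system instead of evaluating the full kernel expansion of the Newton direction. All function names are hypothetical.

```python
import numpy as np


def hessian_kernel(X, M):
    """Anisotropic Gaussian kernel with metric M, scaled by g(d) = d (assumed).

    K[j, m]     = exp(-(x_j - x_m)^T M (x_j - x_m) / (2 d))
    gradK[j, m] = grad_{x_j} k(x_j, x_m)
    """
    d = X.shape[1]
    diff = X[:, None, :] - X[None, :, :]        # diff[j, m] = x_j - x_m
    Mdiff = diff @ M                            # M is symmetric
    K = np.exp(-np.sum(diff * Mdiff, axis=-1) / (2.0 * d))
    gradK = -Mdiff / d * K[:, :, None]
    return K, gradK


def svn_block_diag_step(X, grad_log_p, neg_hess_log_p, eps=0.2):
    """One block-diagonal Newton-type update for particles X of shape (n, d)."""
    n, d = X.shape
    G = grad_log_p(X)                           # (n, d) gradients of log target
    N = neg_hess_log_p(X)                       # (n, d, d) curvature matrices
    M = N.mean(axis=0)                          # averaged Hessian used as metric
    K, gradK = hessian_kernel(X, M)

    # Right-hand side: the SVGD direction evaluated at each particle.
    g = (K.T @ G + gradK.sum(axis=0)) / n

    step = np.empty_like(X)
    for m in range(n):
        k_m = K[:, m]                           # k(x_j, x_m) for all j
        gk_m = gradK[:, m, :]                   # grad_{x_j} k(x_j, x_m) for all j
        # Diagonal block: curvature weighted by k^2 plus kernel-gradient outer products.
        H_m = (np.einsum("j,jab->ab", k_m**2, N) + gk_m.T @ gk_m) / n
        step[m] = np.linalg.solve(H_m, g[m])

    # Simplification assumed here: move each particle by its own solution.
    return X + eps * step
```

The design trade-off is cost: this sketch solves n small d-by-d systems per iteration, whereas the fully coupled Galerkin system has size nd-by-nd. The default `eps=0.2` is a conservative choice for the sketch, not a value taken from the paper.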

Experimental results

Research questions

RQ1: Can second-order information accelerate convergence of Stein variational methods for sampling?
RQ2: How can one design kernels that incorporate curvature information to improve transport in high probability regions?
RQ3: What scalable approximations (block-diagonal, inexact Newton) preserve descent while reducing computation?
RQ4: Does a geometry-aware kernel outperform isotropic kernels in challenging Bayesian inference tasks?

Key findings

SVN converges faster than standard SVGD in the paper's test cases
A geometry-aware Hessian kernel significantly improves convergence speed and particle distribution
Block-diagonal and inexact Newton–CG approximations provide scalable alternatives with similar progress per iteration
Scaled Hessian kernels help maintain robust performance in high-dimensional settings
SVN-H (Newton with Hessian kernel) achieves accurate posterior means and credible intervals in high-dimensional diffusion tests
The method shows good agreement with reference MCMC in a Langevin SDE example

This review was created by AI and reviewed by human editors.