[Paper Review] A Stein variational Newton method
The paper extends Stein variational gradient descent (SVGD) with second-order (Newton-like) information, introducing a Stein variational Newton (SVN) method and geometry-aware kernels that accelerate convergence.
Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.
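To make the functional gradient descent idea concrete, here is a minimal sketch of an SVGD particle update in NumPy. It is an illustration rather than the authors' code: the fixed-bandwidth Gaussian kernel, the bandwidth h, the step size, the toy Gaussian target, and the names rbf_kernel and svgd_step are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    """Return K[i, j] = exp(-||x_i - x_j||^2 / (2h)) and its gradient in x_i."""
    diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h))
    gradK = -diff / h * K[:, :, None]              # gradK[i, j] = d k(x_i, x_j) / d x_i
    return K, gradK

def svgd_step(X, grad_logp, step=0.1, h=1.0):
    """One SVGD update: kernel-averaged score (attraction) plus kernel gradient (repulsion)."""
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    S = grad_logp(X)                               # scores, shape (n, d)
    # phi[k] = mean_j [ k(x_j, x_k) * score(x_j) + grad_{x_j} k(x_j, x_k) ]
    phi = (K.T @ S + gradK.sum(axis=0)) / n
    return X + step * phi

# Toy target (assumed for illustration): N(mu, I) in two dimensions.
mu = np.array([1.0, -2.0])

def grad_logp(X):
    return -(X - mu)

rng = np.random.default_rng(0)
X = rng.normal(loc=4.0, scale=1.0, size=(50, 2))   # particles start away from the target
for _ in range(500):
    X = svgd_step(X, grad_logp)
```

The first term pulls particles toward high-probability regions, while the kernel-gradient term pushes them apart so they do not collapse onto the mode. Only first-order information (the score) is used here, which is the limitation the paper sets out to address.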
Motivation & Objective
- Motivate acceleration of nonparametric variational inference for challenging target distributions
- Introduce a Newton-like iteration in function space for transport maps
- Leverage second-order information to improve kernel choice and particle movement
- Propose scalable approximations to compute Newton directions in RKHS
- Demonstrate computational gains and kernel design benefits via experiments
Proposed method
- Define a Newton-like direction in the space of transport maps to minimize a local quadratic approximation of the KL objective
- Derive a Galerkin (kernel) representation to compute the Newton direction via a finite-dimensional linear system
- Introduce inexact Newton–CG and block-diagonal Hessian approximations for scalability
- Develop geometry-aware anisotropic kernels using an average Hessian M_p to adapt distances in the RKHS
- Provide Algorithms 1 and 2 detailing SVGD and SVN iterations respectively
- Discuss scaling and kernel choices with a Hessian-based kernel normalization (g(d) factor) for high-dimensional problems
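The block-diagonal idea and the average-Hessian kernel listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's Algorithm 2: each particle is moved directly along its own solved direction (the kernel-weighted sum of coefficients from the Galerkin representation is skipped for brevity), the kernel exponent is divided by the dimension d as an assumed choice for the g(d) factor, and the Gaussian toy target, the step size, and the names hessian_kernel and svn_block_step are hypothetical.

```python
import numpy as np

def hessian_kernel(X, M):
    """Anisotropic kernel k(x, x') = exp(-(x - x')^T M (x - x') / (2d)) and its gradient in x."""
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = x_i - x_j
    Mdiff = diff @ M                               # M is symmetric, so row i,j is M (x_i - x_j)
    K = np.exp(-np.einsum("ija,ija->ij", diff, Mdiff) / (2.0 * d))
    gradK = -Mdiff / d * K[:, :, None]             # gradK[i, j] = d k(x_i, x_j) / d x_i
    return K, gradK

def svn_block_step(X, grad_logp, neg_hess_logp, step=0.5):
    """One simplified block-diagonal Newton-like update (assumption: direct per-particle move)."""
    n, d = X.shape
    S = grad_logp(X)                               # scores, shape (n, d)
    N = neg_hess_logp(X)                           # -Hessian of log pi per particle, (n, d, d)
    M = N.mean(axis=0)                             # average Hessian used as kernel metric
    K, gradK = hessian_kernel(X, M)
    # SVGD-style direction evaluated at each particle x_k
    g = (K.T @ S + gradK.sum(axis=0)) / n
    # Diagonal block for particle k: mean_j [ N_j k_jk^2 + grad k_jk grad k_jk^T ]
    H = (np.einsum("jk,jab->kab", K ** 2, N)
         + np.einsum("jka,jkb->kab", gradK, gradK)) / n
    alpha = np.linalg.solve(H, g[:, :, None])[:, :, 0]   # one small d x d solve per particle
    return X + step * alpha

# Toy target (assumed): correlated Gaussian with mean mu and precision matrix P.
mu = np.array([1.0, -2.0])
P = np.array([[2.0, 0.6], [0.6, 1.0]])

def grad_logp(X):
    return -(X - mu) @ P

def neg_hess_logp(X):
    return np.broadcast_to(P, (X.shape[0], 2, 2))

rng = np.random.default_rng(0)
X = rng.normal(loc=4.0, scale=1.0, size=(30, 2))
for _ in range(100):
    X = svn_block_step(X, grad_logp, neg_hess_logp)
```

Solving n independent d x d systems avoids assembling the coupled system over all particles, which is the scalability argument behind the block-diagonal approximation. For a Gaussian target the curvature is constant, so this toy case mainly shows the mechanics; for non-Gaussian targets the Hessian (or a positive-definite surrogate for it) would vary across particles.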
Experimental results
Research questions
- RQ1: Can second-order information accelerate convergence of Stein variational methods for sampling?
- RQ2: How can one design kernels that incorporate curvature information to improve transport in high-probability regions?
- RQ3: What scalable approximations (block-diagonal, inexact Newton) preserve descent while reducing computation?
- RQ4: Does a geometry-aware kernel outperform isotropic kernels in challenging Bayesian inference tasks?
Key findings
- SVN with second-order information converges faster than standard SVGD in test cases
- A geometry-aware Hessian kernel significantly improves convergence speed and particle distribution
- Block-diagonal and inexact Newton–CG approximations provide scalable alternatives with similar progress per iteration
- Scaled Hessian kernels help maintain robust performance in high-dimensional settings
- SVN-H (Newton with Hessian kernel) achieves accurate posterior means and credible intervals in high-dimensional diffusion tests
- The method shows good agreement with reference MCMC in a Langevin SDE example
This review was created by AI and reviewed by human editors.