[Paper Review] Automatic Differentiation Variational Inference
ADVI automatically derives scalable variational inference algorithms for differentiable probabilistic models, enabling fast posterior approximation without model-specific derivations, and is integrated into Stan.
Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models; no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use.
Motivation & Objective
- Reduce the inference bottleneck that slows the cycle of fitting and refining probabilistic models.
- Develop an automated method to derive variational inference algorithms for a broad class of differentiable models without needing conjugacy.
- Integrate automatic differentiation and transformations to enable scalable VI on large datasets.
- Demonstrate applicability across multiple models and compare performance with MCMC.
Proposed method
- Transform latent variables to an unconstrained real coordinate space to enable a universal variational family.
- Use a Gaussian variational family in the transformed space (mean-field or full-rank); mapping back through the change of variables implicitly yields a non-Gaussian approximation in the original space.
- Reparameterize the ELBO gradient (the reparameterization trick, which the paper calls elliptical standardization) so that it becomes an expectation over a standard Gaussian.
- Compute the ELBO and its gradients via Monte Carlo integration and automatic differentiation, enabling automatic optimization.
- Employ adaptive stochastic gradient ascent with a newly proposed step-size sequence, intended to improve convergence and efficiency in practice.
- Implement the approach inside Stan, leveraging its library of variable transformations and automatic differentiation.
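The steps listed above can be sketched on a toy problem. The following is a minimal illustration, not the Stan implementation: it assumes a hypothetical Poisson model with a Gamma prior on the rate (chosen because the exact posterior mean is known, so the result can be checked), uses the log transform to move the positive rate to the real line, and fits a one-dimensional Gaussian in that space. Two simplifications are assumptions of this sketch: the gradient of the log joint is derived by hand where ADVI would use automatic differentiation, and the step-size rule is a reconstruction of the adaptive sequence described in the paper, with constants picked for illustration. The data, the prior parameters, and all function names are hypothetical.

```python
import math
import random

# Hypothetical count data and prior (assumptions for illustration only).
COUNTS = [3, 2, 4, 1, 5, 3, 2, 4, 3, 3, 2, 6, 3, 1, 4, 2, 3, 5, 2, 3]
PRIOR_SHAPE, PRIOR_RATE = 2.0, 1.0  # theta ~ Gamma(shape, rate)


def grad_log_joint_unconstrained(zeta):
    """Gradient w.r.t. zeta of log p(x, theta) + log|d theta / d zeta|,
    with theta = exp(zeta). Hand-derived for this toy model; ADVI would
    obtain this gradient by automatic differentiation instead."""
    a = sum(COUNTS) + PRIOR_SHAPE
    b = len(COUNTS) + PRIOR_RATE
    return a - b * math.exp(zeta)


def advi_meanfield(n_iters=3000, n_draws=10, eta=0.1, alpha=0.1, tau=1.0, seed=0):
    """Stochastic gradient ascent on the ELBO for q(zeta) = N(mu, exp(omega)^2)."""
    rng = random.Random(seed)
    mu, omega = 0.0, 0.0
    s = [0.0, 0.0]  # running averages of squared gradients
    for i in range(1, n_iters + 1):
        sigma = math.exp(omega)
        g_mu = 0.0
        g_omega = 0.0
        for _ in range(n_draws):
            z = rng.gauss(0.0, 1.0)   # draw from the standard Gaussian
            zeta = mu + sigma * z     # reparameterized sample
            g = grad_log_joint_unconstrained(zeta)
            g_mu += g
            g_omega += g * z * sigma
        g_mu /= n_draws
        g_omega = g_omega / n_draws + 1.0  # +1 is the entropy gradient
        grads = (g_mu, g_omega)
        if i == 1:
            s = [g * g for g in grads]
        else:
            s = [alpha * g * g + (1.0 - alpha) * sk for g, sk in zip(grads, s)]
        # Adaptive step size: decays roughly as i^(-1/2), scaled per parameter.
        decay = i ** -0.5
        mu += eta * decay / (tau + math.sqrt(s[0])) * g_mu
        omega += eta * decay / (tau + math.sqrt(s[1])) * g_omega
    return mu, omega


mu, omega = advi_meanfield()
sigma2 = math.exp(2.0 * omega)
# q is Gaussian in zeta, so theta = exp(zeta) is log-normal under q.
approx_mean = math.exp(mu + 0.5 * sigma2)
# Exact posterior mean from Gamma-Poisson conjugacy, used only as a check.
exact_mean = (sum(COUNTS) + PRIOR_SHAPE) / (len(COUNTS) + PRIOR_RATE)
print(f"approximate mean: {approx_mean:.3f}, exact mean: {exact_mean:.3f}")
```

Conjugacy is used here only to verify the answer; nothing in the optimization loop depends on it, which is the point of the approach. Although q is Gaussian in the transformed variable, the implied approximation for the rate itself is log-normal, a small instance of the implicit non-Gaussianity mentioned above.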
Experimental results
Research questions
- RQ1: Can automatic differentiation variational inference (ADVI) produce accurate posterior approximations for a wide class of differentiable models without conjugacy assumptions?
- RQ2: How does ADVI perform in terms of speed and scalability compared to traditional MCMC on large datasets?
- RQ3: What is the impact of latent variable transformations and variational family choices on the quality of the posterior approximation?
- RQ4: Can ADVI handle nonconjugate, complex models (e.g., mixtures, non-linear models) effectively in a probabilistic programming framework?
Key findings
- ADVI automates the process of deriving variational inference algorithms for a large class of differentiable models.
- The method supports non-conjugate models and is integrated into Stan for immediate use.
- ADVI is studied across ten probabilistic models and is applied to a dataset with millions of observations.
- Transforming constrained latent variables to real space enables a universal variational approximation strategy.
- Gradient estimates are obtained via Monte Carlo with automatic differentiation, enabling stochastic optimization.
- An adaptive step-size sequence improves convergence and practical performance.
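For reference, the objective behind the transformation and Monte Carlo gradient findings can be written out for the mean-field case. This is a sketch reconstructed from the description in this review rather than a quotation from the paper. The notation is assumed: T maps the constrained latent variables θ to ζ in real coordinate space, J is the Jacobian of the inverse map, μ and ω are the variational mean and log standard deviation, and η is a standard Gaussian draw.

```latex
\begin{align}
\zeta &= T(\theta), \qquad \zeta = \mu + \exp(\omega) \odot \eta, \qquad \eta \sim \mathcal{N}(0, I) \\
\mathcal{L}(\mu, \omega) &= \mathbb{E}_{\mathcal{N}(\eta;\, 0, I)}\!\left[ \log p\big(x, T^{-1}(\zeta)\big) + \log \big|\det J_{T^{-1}}(\zeta)\big| \right] + \sum_{k} \omega_k + \text{const} \\
\nabla_{\mu} \mathcal{L} &= \mathbb{E}_{\mathcal{N}(\eta;\, 0, I)}\!\left[ \nabla_{\theta} \log p(x, \theta)\, \nabla_{\zeta} T^{-1}(\zeta) + \nabla_{\zeta} \log \big|\det J_{T^{-1}}(\zeta)\big| \right] \\
\nabla_{\omega} \mathcal{L} &= \mathbb{E}_{\mathcal{N}(\eta;\, 0, I)}\!\left[ \Big( \nabla_{\theta} \log p(x, \theta)\, \nabla_{\zeta} T^{-1}(\zeta) + \nabla_{\zeta} \log \big|\det J_{T^{-1}}(\zeta)\big| \Big) \odot \eta \odot \exp(\omega) \right] + 1
\end{align}
```

The sum over ω is the entropy of the mean-field Gaussian up to an additive constant. Because both expectations are taken over a fixed standard Gaussian, they can be estimated by averaging over a handful of draws of η, with the inner gradients supplied by automatic differentiation.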
This review was created by AI and reviewed by human editors.