QUICK REVIEW

[Paper Review] Asymptotically Exact, Embarrassingly Parallel MCMC

Willie Neiswanger, Chong Wang|arXiv (Cornell University)|Nov 19, 2013

Markov Chains and Monte Carlo Methods|23 references|194 citations

TL;DR

This paper proposes an embarrassingly parallel Markov chain Monte Carlo (MCMC) method that partitions data across multiple machines and runs independent MCMC sampling on each subset, with no inter-machine communication during sampling. The subposterior samples are then combined by parametric, nonparametric, or semiparametric density product estimation. The nonparametric and semiparametric estimators yield asymptotically exact samples from the full-data posterior, while the parametric (Gaussian) estimator is a faster approximation. The approach parallelizes both burn-in and sampling in big data settings.

ABSTRACT

Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be used to draw samples from a posterior distribution given the data subset. Finally, the samples from each machine are combined to form samples from the full posterior. This embarrassingly parallel algorithm allows each machine to act independently on a subset of the data (without communication) until the final combination stage. We prove that our algorithm generates asymptotically exact samples and empirically demonstrate its ability to parallelize burn-in and sampling in several models.

Motivation & Objective

To address the high communication and computational cost of traditional parallel MCMC in distributed data settings.
To enable parallelization of both burn-in and sampling phases in MCMC without sacrificing asymptotic exactness.
To develop a post-processing combination procedure that transforms subposterior samples into full-data posterior samples.
To ensure the method is compatible with existing MCMC software and frameworks like MapReduce.
To prove theoretical guarantees on asymptotic exactness under various combination strategies.

Proposed method

Partition the full dataset into M disjoint subsets and perform independent MCMC sampling on each subset to generate subposterior samples.
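Define subposterior densities as p_m(θ) ∝ p(θ)^(1/M) * p(x^{n_m}|θ), where x^{n_m} is the m-th data subset; the prior is raised to the power 1/M so that the product of the M subposteriors is proportional to the full-data posterior.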
Use parametric, nonparametric, or semiparametric estimation to combine subposterior samples into an estimate of the full posterior density product.
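For parametric combination, fit a multivariate normal distribution to each set of subposterior samples and take the product of the fitted Gaussians: the combined precision is the sum of the subposterior precisions, and the combined mean is the precision-weighted average of the subposterior means.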
For nonparametric combination, use kernel density estimation to approximate the product of subposterior densities.
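For semiparametric combination, multiply a Gaussian fit to each subposterior by a kernel-based nonparametric correction, so the estimate benefits from the parametric fit where the subposteriors are close to Gaussian while remaining asymptotically exact.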
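The parametric combination step can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a conjugate Gaussian model (known noise variance, zero-mean Gaussian prior) so that each subposterior is available in closed form, and it draws samples directly from those closed forms as a stand-in for per-machine MCMC output. The names `combine_parametric` and `conjugate_posterior`, and the values of `noise_var`, `prior_var`, `M`, and `T`, are hypothetical choices made for this example.

```python
import numpy as np


def combine_parametric(subposterior_samples):
    """Combine M sets of subposterior samples under a Gaussian approximation.

    subposterior_samples: list of M arrays, each of shape (T, d).
    Returns (mean, cov) of the product of the M fitted Gaussians.
    """
    precision_sum = 0.0
    weighted_mean_sum = 0.0
    for samples in subposterior_samples:
        mu = samples.mean(axis=0)
        cov = np.atleast_2d(np.cov(samples, rowvar=False))
        precision = np.linalg.inv(cov)
        # Product of Gaussians: precisions add, means are precision-weighted.
        precision_sum = precision_sum + precision
        weighted_mean_sum = weighted_mean_sum + precision @ mu
    cov_full = np.linalg.inv(precision_sum)
    return cov_full @ weighted_mean_sum, cov_full


def conjugate_posterior(data, noise_var, prior_var):
    """Posterior of theta for x_i ~ N(theta, noise_var*I), theta ~ N(0, prior_var*I).

    Returns the posterior mean (d,) and the scalar per-coordinate variance.
    """
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = data.sum(axis=0) / noise_var / precision
    return mean, 1.0 / precision


# Toy setup (all values are assumptions for illustration).
rng = np.random.default_rng(0)
N, M, d, T = 10_000, 10, 2, 5_000
noise_var, prior_var = 4.0, 100.0
theta_true = np.array([1.0, -2.0])
x = theta_true + np.sqrt(noise_var) * rng.standard_normal((N, d))

# Step 1: partition the data into M disjoint subsets.
shards = np.array_split(x, M)

# Step 2: sample each subposterior. Raising the N(0, prior_var) prior to the
# power 1/M gives an (unnormalized) N(0, M * prior_var) prior per machine.
sub_samples = []
for shard in shards:
    m_mean, m_var = conjugate_posterior(shard, noise_var, M * prior_var)
    sub_samples.append(m_mean + np.sqrt(m_var) * rng.standard_normal((T, d)))

# Step 3: combine, then compare with the exact full-data posterior.
mean_hat, cov_hat = combine_parametric(sub_samples)
full_mean, full_var = conjugate_posterior(x, noise_var, prior_var)

print("combined mean:", mean_hat)
print("full-data posterior mean:", full_mean)
print("combined variances:", np.diag(cov_hat))
print("full-data posterior variance:", full_var)
```

In this toy model every subposterior is exactly Gaussian, so the combined estimate should match the full-data posterior up to Monte Carlo error. For non-Gaussian or multimodal subposteriors the same procedure is only an approximation.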

Experimental results

Research questions

RQ1: Can MCMC sampling be effectively parallelized across data partitions with minimal communication while maintaining asymptotic exactness?
RQ2: How do different density product estimation strategies (parametric, nonparametric, semiparametric) affect the accuracy and convergence of the combined posterior samples?
RQ3: Does the proposed method reduce burn-in time and accelerate sampling compared to single-chain MCMC in large-scale settings?
RQ4: How does the method scale with increasing dimensionality and multimodality in the posterior distribution?
RQ5: Can the method be efficiently implemented in a MapReduce-style distributed computing framework?

Key findings

The parametric combination method achieved the fastest convergence and best scalability with dimensionality, outperforming nonparametric and semiparametric methods in high-dimensional synthetic data.
In Bayesian logistic regression, the parallel method achieved higher classification accuracy up to 10x faster than single-chain MCMC with M=50 splits.
For multimodal posteriors (e.g., Gaussian mixture models), the parametric and subpostAvg methods produced biased samples that failed to capture multimodality, while nonparametric and semiparametric methods correctly recovered the true posterior.
In hierarchical Poisson-gamma models, the proposed combination methods completed burn-in and converged to low posterior error significantly faster than subpostAvg, subpostPool, and full-chain methods.
The nonparametric and semiparametric combination procedures produced asymptotically exact samples, with error converging to zero as the number of subposterior samples increased.
The method demonstrated practical utility in real-world applications, including forest cover type prediction, with measurable speedups and maintained accuracy.

This review was created by AI and reviewed by human editors.