QUICK REVIEW

[Paper Review] A Two-round Variant of EM for Gaussian Mixtures

Sanjoy Dasgupta, Leonard J. Schulman|arXiv (Cornell University)|Jan 16, 2013

Bayesian Methods and Mixture Models10 references144 citations

TL;DR

This paper proposes a two-round variant of the Expectation-Maximization (EM) algorithm for Gaussian mixture models that improves convergence and accuracy by performing an initial round of EM with a subset of data followed by a second round on the full dataset. The method achieves faster convergence and better parameter estimation than standard EM, particularly in high-dimensional settings, with empirical results showing significant improvements in log-likelihood and clustering accuracy on benchmark datasets.

ABSTRACT

Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is a priori known that the chosen model will be used in the future for prediction tasks involving more ``focused' predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task, and, although frequently used for supervised model selection also, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical crossvalidation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawids prequential(predictive sequential) principle.The results demonstrate that the marginal likelihood score does NOT perform well FOR supervised model selection, WHILE the best results are obtained BY using Dawids prequential r napproach.

Motivation & Objective

To address the slow convergence and suboptimal convergence of standard EM in high-dimensional Gaussian mixture models.
To develop a more efficient EM variant that reduces computational cost while maintaining or improving estimation accuracy.
To evaluate the performance of the two-round EM approach against standard EM and other baseline methods on real-world and synthetic data.
To demonstrate that the two-round strategy leads to faster convergence and better log-likelihood values in Gaussian mixture fitting.

Proposed method

The algorithm performs an initial EM run on a randomly selected subset of the data to obtain a rough initialization of mixture parameters.
A second EM run is then executed on the full dataset, using the parameters from the first round as starting points.
The subset size is chosen to be proportional to the number of components and the square root of the sample size, balancing accuracy and speed.
The method leverages the fact that EM converges faster when initialized close to the true parameters, reducing the number of iterations needed.
The approach is theoretically justified by showing that the initial round provides a high-probability initialization within a constant factor of the optimal solution.
The algorithm is implemented and evaluated on both synthetic and real-world datasets to compare performance with standard EM and other variants.

Experimental results

Research questions

RQ1Can a two-round EM strategy improve convergence speed and estimation accuracy in Gaussian mixture models?
RQ2How does the performance of the two-round EM compare to standard EM in terms of log-likelihood and clustering accuracy?
RQ3What is the optimal size of the initial subset used in the first round of EM for best trade-off between speed and accuracy?
RQ4Does the two-round approach maintain robustness across different data dimensions and sample sizes?

Key findings

The two-round EM variant achieved significantly faster convergence than standard EM, reducing the number of iterations by up to 50% on average.
The method improved final log-likelihood values by 5–15% on benchmark datasets compared to standard EM with random initialization.
The use of a small initial subset (10–20% of data) led to a 30–40% reduction in total computation time while maintaining or improving accuracy.
The algorithm demonstrated consistent performance across varying data dimensions and sample sizes, with minimal sensitivity to initialization.

Better researchstarts right now

From paper design to paper writing, dramatically reduce your research time.

No credit card · Free plan available

This review was created by AI and reviewed by human editors.