QUICK REVIEW

[Paper Review] Bayesian Agglomerative Clustering with Coalescents

Yee Whye Teh, Hal Daumé | ArXiv.org | Jul 4, 2009

Bayesian Methods and Mixture Models | 11 references | 73 citations

TL;DR

This paper proposes a novel Bayesian agglomerative clustering model using Kingman’s coalescent as a prior over hierarchical trees, enabling efficient greedy and sequential Monte Carlo inference. The method achieves superior clustering performance on document and phylolinguistic data by combining the predictive coherence of exchangeable priors with the computational efficiency of agglomerative algorithms.

ABSTRACT

We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.

Motivation & Objective

To develop a Bayesian hierarchical clustering model that combines the predictive coherence of exchangeable priors with the efficiency of agglomerative inference.
To address limitations in existing probabilistic clustering models, such as lack of predictive semantics and poor handling of missing data.
To enable efficient inference via greedy and sequential Monte Carlo algorithms that build trees bottom-up in an agglomerative fashion.
To ensure the induced distribution over data points is exchangeable, supporting coherent extension to new data.
To demonstrate strong empirical performance on real-world datasets, including NIPS abstracts and phylolinguistic data.

Proposed method

Uses Kingman’s coalescent as a nonparametric prior over tree structures, modeling the genealogical merging of data points backward in time.
Employs a continuous-time, partition-valued Markov process in which each pair of lineages coalesces independently at rate 1, so the total coalescence rate is $\binom{m}{2}$ when $m$ lineages remain.
Develops a greedy inference algorithm (Greedy-Rate1) that scores candidate merges with the prior coalescence rate fixed at 1, achieving $O(n^2)$ runtime.
Applies sequential Monte Carlo (SMC) inference to sample from the posterior over trees, maintaining a set of weighted particle trees.
Uses log-likelihood ratios at each branch to determine optimal flat cluster cuts from the coalescent tree.
Preprocesses data by retaining only words appearing in at least 100 NIPS abstracts and converting counts to binary for clustering.
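
The coalescent prior described above can be illustrated by sampling a tree from it. The following is a minimal sketch of the prior only, not the paper's inference code; the function names `sample_kingman_coalescent` and `expected_height` are hypothetical, and the tuple layout of the returned merges is an assumption made for illustration.

```python
import random
from math import comb


def sample_kingman_coalescent(n, seed=None):
    """Draw one tree from Kingman's coalescent over n leaves.

    Returns a list of (left, right, parent, time) tuples. Time runs
    backward from the leaves at 0, so merge times are negative and
    decrease toward the root. (Illustrative layout, not from the paper.)
    """
    rng = random.Random(seed)
    lineages = list(range(n))   # leaves are nodes 0 .. n-1
    next_node = n               # internal nodes get ids n, n+1, ...
    t = 0.0
    merges = []
    while len(lineages) > 1:
        m = len(lineages)
        # Each of the C(m, 2) pairs coalesces at rate 1, so the waiting
        # time to the next coalescence is Exponential(C(m, 2)).
        t -= rng.expovariate(comb(m, 2))
        # The coalescing pair is uniform over all pairs of lineages.
        left, right = rng.sample(lineages, 2)
        lineages.remove(left)
        lineages.remove(right)
        lineages.append(next_node)
        merges.append((left, right, next_node, t))
        next_node += 1
    return merges


def expected_height(n):
    """Expected time to the root: sum of 1 / C(m, 2), which is 2 * (1 - 1/n)."""
    return sum(1.0 / comb(m, 2) for m in range(2, n + 1))


if __name__ == "__main__":
    for left, right, parent, time in sample_kingman_coalescent(5, seed=1):
        print(f"merge {left} + {right} -> {parent} at t = {time:.3f}")
```

Because the total rate shrinks as lineages merge, early coalescences near the leaves happen quickly and the last few take longest; the expected height stays below 2 for any number of leaves.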

Experimental results

Research questions

RQ1: Can Kingman’s coalescent serve as an effective, exchangeable prior over clustering trees in a Bayesian hierarchical clustering framework?
RQ2: Can efficient greedy and SMC inference algorithms be designed for this model that operate in an agglomerative, bottom-up manner?
RQ3: Does the proposed model outperform existing agglomerative clustering methods in terms of predictive performance and clustering quality?
RQ4: How well does the model generalize to real-world data, such as document collections and linguistic phylogenies?
RQ5: What is the underlying random distribution induced by the model, and does the posterior converge to the true distribution as data increases?

Key findings

The Greedy-Rate1 algorithm runs in $O(n^2)$ time and delivers clustering quality comparable to the other greedy methods, making it the recommended choice.
The model discovers nine meaningful clusters in NIPS abstracts, successfully separating Bayesian learning (cluster 5) from non-Bayesian learning (cluster 7), despite shared authors like Mike Jordan.
The log-likelihood ratio at the split between clusters 2 and 3 was only 0.105, indicating they are highly similar and would merge under a slightly higher threshold.
Empirical results show the model outperforms other agglomerative clustering algorithms in both document clustering and phylolinguistic applications.
The model’s exchangeable prior enables coherent prediction on new data and integrates naturally within larger probabilistic models.
Theoretical analysis confirms the model’s consistency and connects it to known processes: when mutations occur at rate $\alpha/2$ along the tree and each new state is drawn i.i.d. from $H$, the induced distribution is a Dirichlet process $DP(\alpha, H)$.
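
The threshold behavior noted for clusters 2 and 3 can be sketched in code. The review does not spell out the exact cutting rule, so this assumes a split is kept only when its log-likelihood ratio exceeds a threshold. The `Node` class, the `cut_tree` function, the leaf names, and every score except 0.105 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    llr: float = 0.0  # assumed log-likelihood ratio of this node's split
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Collect the leaf names under a node, left to right."""
    if not node.children:
        return [node.name]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def cut_tree(node, threshold):
    """Return flat clusters: keep a split only if its score exceeds the threshold."""
    if node.children and node.llr > threshold:
        clusters = []
        for child in node.children:
            clusters.extend(cut_tree(child, threshold))
        return clusters
    # Weak split (or a leaf): everything below stays in one cluster.
    return [leaves(node)]


# Toy tree: a strong split at the root (made-up score 5.0) and a weak
# split with the reported score 0.105 between two similar groups.
weak = Node("c2+c3", llr=0.105, children=[Node("c2_doc"), Node("c3_doc")])
root = Node("root", llr=5.0, children=[Node("c1_doc"), weak])

print(cut_tree(root, threshold=0.1))  # [['c1_doc'], ['c2_doc'], ['c3_doc']]
print(cut_tree(root, threshold=0.2))  # [['c1_doc'], ['c2_doc', 'c3_doc']]
```

Raising the threshold from 0.1 to 0.2 is enough to merge the two groups on either side of the 0.105 split, which mirrors the observation that clusters 2 and 3 would merge under a slightly higher threshold.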

This review was created by AI and reviewed by human editors.