[Paper Review] Autoencoding Variational Inference For Topic Models
This paper introduces AVITM, an effective autoencoding variational Bayes method for latent Dirichlet allocation (LDA) that uses a neural inference network to approximate the posterior and addresses the difficulties that the Dirichlet prior and component collapsing pose for AEVB; it also presents ProdLDA, a product-of-experts topic model with improved topic coherence.
Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is autoencoding variational Bayes (AEVB), but it has proven difficult to apply to topic models in practice. We present what is to our knowledge the first effective AEVB based inference method for latent Dirichlet allocation (LDA), which we call Autoencoded Variational Inference For Topic Model (AVITM). This model tackles the problems caused for AEVB by the Dirichlet prior and by component collapsing. We find that AVITM matches traditional methods in accuracy with much better inference time. Indeed, because of the inference network, we find that it is unnecessary to pay the computational cost of running variational optimization on test data. Because AVITM is black box, it is readily applied to new topic models. As a dramatic illustration of this, we present a new topic model called ProdLDA, that replaces the mixture model in LDA with a product of experts. By changing only one line of code from LDA, we find that ProdLDA yields much more interpretable topics, even if LDA is trained via collapsed Gibbs sampling.
Motivation & Objective
- Enable fast, black-box inference for topic models without hand-deriving model-specific updates.
- Overcome challenges of Dirichlet priors and component collapsing in AEVB for LDA.
- Demonstrate that an inference network can match traditional inference quality with substantially faster test-time performance.
- Introduce ProdLDA, a product-of-experts topic model showing improved topic coherence over LDA.
Proposed method
- Develop AVITM: use an inference network to parameterize q(θ,z|γ,φ) and optimize the ELBO via the reparameterization trick.
- Employ a Laplace approximation of the Dirichlet prior in the softmax basis to enable a Gaussian-like reparameterization for θ.
- Use a collapsed representation that sums out z in LDA, so that only θ needs to be sampled.
- Address component collapsing with high-momentum Adam optimization, batch normalization, dropout, and KL term annealing.
- Train ProdLDA by replacing the mixture word model with a product of experts, i.e., p(w_n|θ,β) ∝ ∏_k p(w_n|z_n=k,β)^{θ_k}.
- Provide train-time and test-time efficiency benefits by using a neural inference network to map documents directly to topic proportions.
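The prior approximation and the two word models listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`laplace_dirichlet_prior`, `sample_theta`, `lda_word_probs`, `prodlda_word_probs`) are hypothetical, the inference network that would produce the posterior mean and variance is omitted, and `beta` is assumed to be an unnormalized K x V topic-word matrix. The mean and diagonal variance follow the standard softmax-basis Laplace approximation of a Dirichlet with parameter `alpha`.

```python
import numpy as np

def laplace_dirichlet_prior(alpha):
    """Gaussian (mean, diagonal variance) approximating Dirichlet(alpha) in the softmax basis."""
    alpha = np.asarray(alpha, dtype=float)
    K = alpha.shape[0]
    log_alpha = np.log(alpha)
    mu = log_alpha - log_alpha.mean()
    var = (1.0 / alpha) * (1.0 - 2.0 / K) + (1.0 / alpha).sum() / K**2
    return mu, var

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sample_theta(mu, var, rng):
    """Reparameterized draw: theta = softmax(mu + sqrt(var) * eps), eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return softmax(mu + np.sqrt(var) * eps)

def lda_word_probs(theta, beta):
    """LDA with z summed out: a mixture of per-topic word distributions."""
    return theta @ softmax(beta, axis=1)

def prodlda_word_probs(theta, beta):
    """Product of experts: mix in the unnormalized (log) space, then normalize once."""
    return softmax(theta @ beta)
```

Under these assumptions, the two decoders differ only in where the softmax is applied, which is one way to read the "one line of code" remark in the abstract.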
Experimental results
Research questions
- RQ1: Can AEVB methods be effectively applied to LDA by addressing Dirichlet priors and component collapsing?
- RQ2: Does an inference network enable fast, accurate posterior inference for new documents without test-time optimization?
- RQ3: Does ProdLDA yield improved topic coherence compared to standard LDA, and under what training conditions?
- RQ4: How does AVITM compare with online mean-field inference and collapsed Gibbs sampling in terms of topic quality and speed?
- RQ5: Is AVITM readily applicable to new topic models as a black-box inference method?
Key findings
- AVITM yields topics of equivalent quality to standard mean-field inference with substantially faster training and test-time performance.
- The inference network can estimate topic proportions for new documents without running variational optimization, with perplexity comparable to optimization-based approaches.
- ProdLDA consistently produces better topic coherence than standard LDA, including when LDA is trained with Gibbs sampling.
- AVITM enables training on large corpora (e.g., ~1 million documents) in under 80 minutes on a single GPU, with a one-line code change to switch from LDA to ProdLDA.
- Laplace-approximated Dirichlet priors and high-momentum training with batch normalization help mitigate component collapsing and improve topic sparsity and coherence.
- Higher topic sparsity in ProdLDA correlates with improved topic coherence, supporting the benefit of Dirichlet-like priors in neural topic models.
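For readers unfamiliar with the perplexity metric mentioned in the findings, the generic per-word definition can be sketched as follows. This is an assumption-laden illustration rather than the paper's evaluation protocol, which this review does not describe: the helper name `perplexity` is hypothetical, `counts` is a bag-of-words count vector (or matrix), and `word_probs` holds the model's predicted word probabilities in the same shape.

```python
import numpy as np

def perplexity(counts, word_probs, eps=1e-12):
    """exp of the negative average log-likelihood per word token."""
    counts = np.asarray(counts, dtype=float)
    word_probs = np.asarray(word_probs, dtype=float)
    log_lik = (counts * np.log(word_probs + eps)).sum()  # eps guards against log(0)
    return float(np.exp(-log_lik / counts.sum()))
```

Lower values are better; a model that assigns uniform probability over a vocabulary of V words has perplexity V.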
This review was created by AI and reviewed by human editors.