QUICK REVIEW

[Paper Review] High-Dimensional Probability Estimation with Deep Density Models

Oren Rippel, Ryan P. Adams | arXiv (Cornell University) | Feb 20, 2013
Generative Adversarial Networks and Image Synthesis | 59 citations
TL;DR

This paper introduces the Deep Density Model (DDM), a normalizing flow-style approach that uses a deep neural network to learn a bijective transformation from high-dimensional data to a latent space with approximately factorized, known marginal distributions. Because the Jacobian determinant of the transformation is tractable, DDM yields normalized density estimates without partition functions, allowing efficient likelihood computation, direct sampling, and application to semi-supervised learning and calibrated Bayesian classification.

ABSTRACT

One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations lie on a lower-dimensional manifold of high probability. It has been more difficult, however, to exploit this insight to build explicit, tractable density models for high-dimensional data. In this paper, we introduce the deep density model (DDM), a new approach to density estimation. We exploit insights from deep learning to construct a bijective map to a representation space, under which the transformation of the distribution of the data is approximately factorized and has identical and known marginal densities. The simplicity of the latent distribution under the model allows us to feasibly explore it, and the invertibility of the map to characterize contraction of measure across it. This enables us to compute normalized densities for out-of-sample data. This combination of tractability and flexibility allows us to tackle a variety of probabilistic tasks on high-dimensional datasets, including: rapid computation of normalized densities at test-time without evaluating a partition function; generation of samples without MCMC; and characterization of the joint entropy of the data.

Motivation & Objective

  • Address the challenge of tractable, normalized density estimation for high-dimensional data, where traditional methods like MCMC or partition function computation are infeasible.
  • Overcome limitations of existing models—such as undirected models (lacking normalization) and directed models (requiring costly inference)—by enabling fully normalized, tractable likelihoods.
  • Leverage insights from deep learning and differential geometry to construct flexible, invertible transformations that map complex data distributions to simple, factorized latent distributions.
  • Enable new applications in generative modeling, semi-supervised learning, and Bayesian classification by providing well-calibrated, normalized probability estimates.
  • Characterize the entropy and information-theoretic structure of high-dimensional data distributions through the learned latent representation and its transformation properties.

Proposed method

  • Define a bijective (invertible) transformation, parameterized by a deep neural network, from the observed data space to a latent representation space of the same dimensionality.
  • Optimize the transformation such that the induced distribution in the latent space becomes approximately factorized with known, tractable marginal densities (e.g., Beta distributions).
  • Use the change-of-variables formula to compute normalized densities: $ p_{\mathbf{Y}}(\mathbf{y}) = p_{\mathbf{Z}}(\mathbf{z}) \cdot \left| \det \mathbf{J}_{\mathbf{y} \to \mathbf{z}} \right| $, where $ \mathbf{z} = f(\mathbf{y}) $ and $ \mathbf{J} $ is the Jacobian of the transformation.
  • Enforce approximate independence in the latent space through a diversification process that encourages sparse, uncorrelated representations.
  • Utilize the invertibility of the transformation to generate samples by sampling from the simple latent distribution and passing through the inverse network.
  • Apply the model to supervised and semi-supervised learning by training class-conditional DDMs and using expectation-maximization with weighted data to improve generalization.
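
The change-of-variables computation and inverse-based sampling listed above can be sketched on a toy bijection. This is a minimal illustration, not the paper's architecture: it assumes a single hypothetical layer $ \mathbf{z} = \sigma(\mathbf{A}\mathbf{y} + \mathbf{b}) $ with an invertible square matrix `A`, and independent Beta(`alpha`, `beta`) latent marginals. The function names and parameters are invented for this example.

```python
import math
import numpy as np

def forward(y, A, b):
    """Map data y of shape (n, d) to latents z in (0, 1)^d (toy single-layer bijection)."""
    a = y @ A.T + b
    return 1.0 / (1.0 + np.exp(-a))

def inverse(z, A, b):
    """Invert the toy map: apply the logit, then solve the linear system."""
    z = np.clip(z, 1e-12, 1.0 - 1e-12)        # guard against log(0)
    a = np.log(z) - np.log1p(-z)              # logit(z)
    return np.linalg.solve(A, (a - b).T).T

def log_density(y, A, b, alpha=1.0, beta=1.0):
    """log p_Y(y) = log p_Z(z) + log|det J_{y->z}| with factorized Beta latents."""
    a = y @ A.T + b
    log_z = -np.logaddexp(0.0, -a)            # log sigmoid(a), numerically stable
    log_1mz = -np.logaddexp(0.0, a)           # log(1 - sigmoid(a))
    # Log of the Beta normalizing constant B(alpha, beta).
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pz = np.sum((alpha - 1) * log_z + (beta - 1) * log_1mz - log_norm, axis=1)
    # Jacobian is diag(z * (1 - z)) @ A, so its log|det| splits into two terms.
    _, logabsdet_A = np.linalg.slogdet(A)
    log_det = np.sum(log_z + log_1mz, axis=1) + logabsdet_A
    return log_pz + log_det

def sample(n, A, b, rng, alpha=1.0, beta=1.0):
    """Draw latents from the factorized Beta prior and push them through the inverse map."""
    z = rng.beta(alpha, beta, size=(n, A.shape[0]))
    return inverse(z, A, b)
```

Because the density is obtained from a normalized latent density and an exact Jacobian term, it integrates to one by construction, and sampling needs no MCMC: a draw from the latent prior followed by one pass through `inverse` suffices.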

Experimental results

Research questions

  • RQ1: Can we construct a flexible, invertible deep network that transforms high-dimensional data into a latent space with tractable, factorized marginal densities?
  • RQ2: How can we ensure that the resulting density estimate is fully normalized without requiring partition function computation?
  • RQ3: To what extent can the learned latent representation capture the intrinsic structure of high-dimensional data, such as manifolds or low-dimensional subspaces?
  • RQ4: Can the DDM support efficient, exact likelihood inference and direct sampling without MCMC, enabling practical use in probabilistic modeling?
  • RQ5: How can normalized densities from DDMs be leveraged to build well-calibrated Bayesian classifiers and improve semi-supervised learning through density-based regularization?

Key findings

  • The DDM achieves exact, normalized density estimation for high-dimensional data by leveraging invertible deep networks and tractable Jacobian determinants, eliminating the need for partition function computation.
  • On MNIST, the model's marginal entropy was estimated at 20.72, close to the expected value of 21.02 under a Bernoulli model with parameter $ p \approx 0.0465 $, validating the accuracy of the latent distribution approximation.
  • The model enables direct, MCMC-free sampling from the data distribution by sampling in the latent space and applying the inverse transformation, as demonstrated in visualizations of generated samples.
  • A Bayesian classifier built using class-conditional DDMs achieved a test error rate of 1.614% when penalizing low density on foreign-class examples, significantly outperforming a raw mixture model (9.5% error).
  • Among confident predictions (approximately 95% of test data), the DDM-based classifier achieved a low error rate of 0.45%, demonstrating well-calibrated uncertainty estimates.
  • The approach supports semi-supervised learning by enabling training of mixture models with weighted data using expectation-maximization, leveraging unlabeled data through density estimation in the latent space.
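
The Bayesian classifier in the findings above rests on Bayes' rule applied to normalized class-conditional densities. The following sketch shows only that step, under stated assumptions: `log_dens[i, c]` is taken to hold $ \log p(\mathbf{y}_i \mid c) $ from some already-trained class-conditional density model, and the function names and the `threshold` value are hypothetical choices for illustration. The paper's foreign-class density penalty is a training detail and is not reproduced here.

```python
import numpy as np

def class_posteriors(log_dens, log_prior):
    """Posterior p(c | y) from log p(y | c) of shape (n, C) and log p(c) of shape (C,)."""
    joint = log_dens + log_prior
    joint = joint - joint.max(axis=1, keepdims=True)   # stabilize before exponentiating
    post = np.exp(joint)
    return post / post.sum(axis=1, keepdims=True)

def predict_with_confidence(log_dens, log_prior, threshold=0.9):
    """Return the MAP label and a flag marking predictions whose posterior meets the threshold."""
    post = class_posteriors(log_dens, log_prior)
    labels = post.argmax(axis=1)
    confident = post.max(axis=1) >= threshold
    return labels, confident
```

Splitting predictions by the `confident` flag is one way to report a separate error rate on the confident subset, in the spirit of the figure quoted above. Posteriors of this kind are only meaningful because each class-conditional density is normalized.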

This review was created by AI and reviewed by human editors.