QUICK REVIEW

[Paper Review] Random Feature Expansions for Deep Gaussian Processes

Kurt Cutajar, Edwin V. Bonilla|Graduate School and Research Center in Digital Science (EURECOM)|Oct 14, 2016

Gaussian Processes and Bayesian Inference|31 references|83 citations

TL;DR

This paper proposes a scalable deep Gaussian process (DGP) framework using random feature expansions to approximate covariance functions, enabling stochastic variational inference for efficient, probabilistic learning. The method achieves state-of-the-art performance on large-scale datasets—such as MNIST8M (8M samples) and AIRLINE (5M flights)—with up to 30 layers, outperforming existing DGP and DNN baselines in accuracy and uncertainty quantification while running efficiently on a single machine without GPUs.

ABSTRACT

The composition of multiple Gaussian Processes as a Deep Gaussian Process (DGP) enables a deep probabilistic nonparametric approach to flexibly tackle complex machine learning problems with sound quantification of uncertainty. Existing inference approaches for DGP models have limited scalability and are notoriously cumbersome to construct. In this work, we introduce a novel formulation of DGPs based on random feature expansions that we train using stochastic variational inference. This yields a practical learning framework which significantly advances the state-of-the-art in inference for DGPs, and enables accurate quantification of uncertainty. We extensively showcase the scalability and performance of our proposal on several datasets with up to 8 million observations, and various DGP architectures with up to 30 hidden layers.

Motivation & Objective

To address the scalability and computational intractability of deep Gaussian processes (DGPs) in large-scale and deep architectures.
To develop a practical, probabilistic inference framework for DGPs that enables uncertainty quantification and efficient training.
To overcome limitations of existing DGP inference methods, which are often restricted to shallow networks and lack mini-batch scalability.
To demonstrate that random feature expansions can yield Bayesian deep neural networks with interpretable priors and low-rank weight matrices.
To enable training of deep probabilistic models on datasets with millions of observations, previously considered infeasible for DGPs.

Proposed method

Approximates all GP layers in the DGP using random feature expansions (Rahimi & Recht, 2008), transforming covariance functions into explicit feature maps.
Employs stochastic variational inference (SVI) with mini-batch gradient optimization to scale training to large datasets.
Uses a probabilistic formulation where the random features are treated as latent variables with structured priors, enabling Bayesian learning.
Leverages automatic differentiation in TensorFlow to compute gradients for SVI, avoiding manual derivation.
Applies low-rank weight matrices via random features, resulting in DNN-like architectures with interpretable priors.
Supports both RBF (trigonometric activation) and ARC-COSINE (ReLU-like) kernels through different feature expansions.
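The random feature idea behind the bullets above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' TensorFlow implementation: the function names, the unit kernel variance, and the single shared lengthscale are assumptions made for the example. It builds trigonometric features for the RBF kernel and ReLU features for the order-1 arc-cosine kernel, and shows that one approximate GP layer reduces to a feature map followed by a linear map, much like a DNN layer.

```python
import numpy as np


def rbf_random_features(X, Omega):
    """Trigonometric features; Phi @ Phi.T approximates an RBF kernel.

    Omega is assumed to be drawn as N(0, 1 / lengthscale**2) per entry.
    """
    n_rf = Omega.shape[1]
    proj = X @ Omega
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_rf)


def arccos_random_features(X, Omega):
    """ReLU features; Phi @ Phi.T approximates the order-1 arc-cosine kernel.

    Omega is assumed to be drawn as N(0, 1) per entry.
    """
    n_rf = Omega.shape[1]
    return np.sqrt(2.0 / n_rf) * np.maximum(X @ Omega, 0.0)


def rbf_kernel(X, lengthscale):
    """Exact RBF kernel matrix with unit variance, for comparison."""
    sq = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-sq_dists / (2.0 * lengthscale**2))


def arccos1_kernel(X):
    """Exact order-1 arc-cosine kernel matrix, for comparison."""
    norms = np.linalg.norm(X, axis=1)
    outer = np.outer(norms, norms)
    cos_t = np.clip((X @ X.T) / outer, -1.0, 1.0)
    theta = np.arccos(cos_t)
    return outer * (np.sin(theta) + (np.pi - theta) * cos_t) / np.pi


def layer_forward(F, Omega, W, features=rbf_random_features):
    """One approximate GP layer: random feature map, then a linear map W."""
    return features(F, Omega) @ W


# Small demonstration with hypothetical sizes.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
lengthscale, n_rf = 1.5, 2000
Omega = rng.normal(size=(3, n_rf)) / lengthscale
Phi = rbf_random_features(X, Omega)
print("max RBF approximation error:",
      np.abs(Phi @ Phi.T - rbf_kernel(X, lengthscale)).max())

# The RBF map yields 2 * n_rf features; W maps them to 2 hidden GP outputs.
W = rng.normal(size=(2 * n_rf, 2))
print("hidden layer shape:", layer_forward(X, Omega, W).shape)
```

In the paper's framework the quantities playing the role of `Omega` and `W` are given priors and handled by variational inference; here they are simply sampled once to show the kernel approximation.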

Experimental results

Research questions

RQ1: Can random feature expansions enable scalable and tractable inference in deep Gaussian processes for large-scale datasets?
RQ2: How does the proposed DGP with random features compare to standard DNNs and other DGP inference methods in terms of accuracy and uncertainty quantification?
RQ3: Can the framework scale to deep architectures (e.g., 30 layers) on datasets with millions of observations?
RQ4: Does the use of stochastic variational inference with random features preserve the probabilistic nature of DGPs while enabling efficient training?
RQ5: How does the model perform on real-world large-scale regression and classification tasks compared to state-of-the-art GP and DNN baselines?

Key findings

The proposed DGP with random features achieved 99.14% test accuracy on MNIST8M (8M samples), comparable to AutoGP (99.11%) and significantly outperforming standard DNNs in uncertainty quantification.
On the AIRLINE dataset (5M flights), the model achieved 78.1% accuracy and a mean negative log-likelihood (MNLL) of 0.457, matching the performance reported by Wilson et al. (2016) for state-of-the-art GP methods.
Training converged in under two hours for models with up to 30 layers on the AIRLINE dataset, demonstrating scalability and efficiency.
The negative lower bound was shown to be a reliable objective for model selection, as confirmed by box plots over 100 mini-batches.
The framework outperformed DNNs trained with dropout in uncertainty metrics, indicating superior uncertainty quantification.
The method achieved competitive results without GPUs, and is designed to scale further using GPU and distributed computing.
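For readers unfamiliar with the MNLL figure quoted above for AIRLINE, the sketch below shows how a mean negative log-likelihood is typically computed for binary classification. It is an illustration only: the function name and the clipping constant `eps` are assumptions, and for a DGP the predicted probabilities would usually be averaged over samples from the approximate posterior before being passed in.

```python
import numpy as np


def mean_negative_log_likelihood(y_true, p_pred, eps=1e-12):
    """Average negative Bernoulli log-likelihood of binary labels.

    y_true: labels in {0, 1}; p_pred: predicted probability of class 1.
    Lower is better; confident wrong predictions are penalised heavily.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) for predictions that are exactly 0 or 1.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


labels = [1, 0, 1, 1]
print(mean_negative_log_likelihood(labels, [0.9, 0.2, 0.8, 0.6]))
print(mean_negative_log_likelihood(labels, [0.5, 0.5, 0.5, 0.5]))
```

Unlike accuracy, this metric rewards well-calibrated probabilities, which is why reviews of probabilistic models report it alongside the error rate.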

This review was created by AI and reviewed by human editors.