[Paper Review] Deep learning with differential Gaussian process flows
This paper introduces Deep Learning with Differential Gaussian Process Flows, a continuous-time deep learning framework that models data transformations via stochastic differential equations (SDEs) in the input space. By warping inputs through infinitely deep, infinitesimal flows, the method achieves state-of-the-art performance in regression and classification, outperforming deep Gaussian processes and neural networks while using fewer inducing parameters.
We propose a novel deep learning paradigm of differential flows that learn a stochastic differential equation transformation of inputs prior to a standard classification or regression function. The key property of differential Gaussian processes is the warping of inputs through infinitely deep, but infinitesimal, differential fields that generalise discrete layers into a dynamical system. We demonstrate state-of-the-art results that exceed the performance of deep Gaussian processes and neural networks.
Motivation & Objective
- To address the limitations of discrete-layer deep networks and degeneracy in deep Gaussian processes by modeling transformations as continuous flows.
- To enable flexible, non-linear input warping in the original feature space without learning intermediate latent representations.
- To improve model capacity and generalization through stochastic differential equations with principled regularization via diffusion.
- To reduce the number of parameters required compared to deep Gaussian processes while maintaining or exceeding performance.
- To provide a more interpretable deep learning framework by enabling explicit analysis of transformation paths through the flow.
Proposed method
- The method models input transformations using stochastic differential equations (SDEs) that define continuous, smooth, and differentiable flows in the input space.
- Each data point is transformed along a continuous path governed by an SDE with drift and diffusion components, enabling infinitely deep, infinitesimal transformations.
- The SDEs are approximated using a sparse Gaussian process with inducing points in both space and time, enabling efficient inference.
- The model uses a continuous-time flow with temporal and spatial inducing points to parameterize the drift and diffusion fields of the SDE.
- The posterior is approximated via variational inference, minimizing a lower bound on the marginal likelihood with a structured variational distribution.
- The framework supports both non-temporal and temporal extensions, allowing for increased model capacity through longer integration times.
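As a rough illustration of the flow described above, the sketch below warps inputs by integrating an SDE of the form dx = mu(x) dt + sqrt(Sigma(x)) dW with fixed-step Euler-Maruyama updates. The drift and diffusion come from a sparse GP posterior conditioned on inducing locations `z` and inducing values `u`. This is a minimal sketch under stated assumptions, not the paper's implementation: the RBF kernel, the fixed hyperparameters, the per-point scalar variance, and all function names (`rbf_kernel`, `gp_field`, `euler_maruyama_flow`) are hypothetical choices made for the example, and the inducing values are drawn at random rather than learned by variational inference.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between rows of a (N, D) and b (M, D).
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_field(x, z, u, jitter=1e-6):
    # Sparse GP posterior at states x, conditioned on inducing values u (M, D)
    # at inducing locations z (M, D). Returns drift (N, D) and variance (N, 1).
    kzz = rbf_kernel(z, z) + jitter * np.eye(len(z))
    kxz = rbf_kernel(x, z)
    a = np.linalg.solve(kzz, kxz.T).T           # rows are k(x, Z) Kzz^{-1}
    mean = a @ u                                # drift mu(x)
    var = 1.0 - np.sum(a * kxz, axis=1)         # prior variance 1.0 minus explained part
    return mean, np.maximum(var, 0.0)[:, None]  # clip tiny negatives from round-off

def euler_maruyama_flow(x0, z, u, t_end=1.0, n_steps=20, rng=None):
    # Integrate the SDE from t=0 to t=t_end; returns the path (n_steps+1, N, D).
    rng = np.random.default_rng() if rng is None else rng
    dt = t_end / n_steps
    x = x0.copy()
    path = [x.copy()]
    for _ in range(n_steps):
        mean, var = gp_field(x, z, u)
        noise = rng.standard_normal(x.shape)
        x = x + mean * dt + np.sqrt(var * dt) * noise  # drift step plus diffusion step
        path.append(x.copy())
    return np.stack(path)

# Toy usage: 5 two-dimensional inputs, 4 inducing points (all values made up).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 2))
z = rng.standard_normal((4, 2))
u = 0.5 * rng.standard_normal((4, 2))
path = euler_maruyama_flow(x0, z, u, t_end=1.0, n_steps=20, rng=rng)
print(path.shape)  # (21, 5, 2)
```

The final slice `path[-1]` holds the warped inputs that would be passed to the downstream regression or classification function, and the full `path` array is what allows an individual point's transformation to be traced through the flow. Increasing `t_end` lengthens the integration, which corresponds to the longer flow times mentioned above.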
Experimental results
Research questions
- RQ1: Can continuous-time flows modeled via SDEs outperform discrete-layer deep networks in predictive performance?
- RQ2: Does modeling transformations directly in the input space via SDEs improve generalization and reduce overfitting compared to latent-space models?
- RQ3: Can a principled Bayesian approach using SDEs achieve state-of-the-art results with fewer parameters than deep Gaussian processes?
- RQ4: How does increasing the flow time (integration time) affect model capacity and performance?
- RQ5: Can the continuous flow framework support interpretable decision paths by tracing individual data point transformations?
Key findings
- On the HIGGS and SUSY UCI classification benchmarks, the proposed DiffGP model achieves AUC scores of 0.878 and 0.842, respectively, matching or exceeding the best reported results of DGP and DNNs.
- On the Protein regression dataset, the model achieves state-of-the-art performance, with improved results over DGP, suggesting strong modeling of long-range correlations.
- Increasing the flow time from 1 to 10 significantly improves test error and likelihood, with performance saturating near T=10, indicating controlled capacity expansion.
- The model outperforms deep Gaussian processes on multiple regression benchmarks, including Concrete and Energy, with fewer inducing parameters.
- The temporal extension of the model achieves AUC of 0.878 on HIGGS and 0.846 on SUSY, matching the best DGP results with a more efficient parameterization.
- The model maintains strong performance on small datasets like Wine and Energy, where shallow GPs are optimal, indicating no overfitting despite increased capacity.
This review was created by AI and reviewed by human editors.