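[Paper Explained] Hybrid Models with Deep and Invertible Features
This paper proposes a neural hybrid model (DIGLM) that combines a deep invertible feature transformation with a generalized linear model, so that the exact joint density p(x, y) and the exact predictive distribution p(y|x) can both be computed in a single forward pass, with practical benefits for OOD detection and semi-supervised learning.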
We propose a neural hybrid model consisting of a linear model defined on a set of features computed by a deep, invertible transformation (i.e. a normalizing flow). An attractive property of our model is that both p(features), the density of the features, and p(targets | features), the predictive distribution, can be computed exactly in a single feed-forward pass. We show that our hybrid model, despite the invertibility constraints, achieves similar accuracy to purely predictive models. Moreover the generative component remains a good model of the input features despite the hybrid optimization objective. This offers additional capabilities such as detection of out-of-distribution inputs and enabling semi-supervised learning. The availability of the exact joint density p(targets, features) also allows us to compute many quantities readily, making our hybrid model a useful building block for downstream applications of probabilistic deep learning.
Motivation and Objectives
- Motivate the use of joint modeling of targets and inputs to improve robustness and enable out-of-distribution detection.
- Develop a neural hybrid that jointly learns p(x) and p(y|x) in a single forward pass via invertible transformations.
- Demonstrate exact inference for p(y|x) and p(x) and show benefits for semi-supervised learning and selective classification.
- Evaluate predictive accuracy, uncertainty, and OOD detection across classification and regression benchmarks.
Proposed Method
- Define a joint model p(y, x) = p(y|x; β, φ) p(x; φ) where x is transformed by an invertible f with parameters φ and z = f(x).
- Use a generalized linear model (GLM) on the latent representation z as p(y|x; β, φ) with priors to enable exact or closed-form predictive inference in many cases.
- Share φ between the invertible generative model and the predictive GLM to couple generative and discriminative objectives.
- Train by maximizing the exact joint log-likelihood J(θ) = Σ log p(y, x; θ), with θ = {β, φ} and the sum taken over training pairs; p(x) is exact via the change-of-variables formula, and p(y|x) is exact in several settings.
- Introduce a weighted objective Jλ(θ) = Σ [ log p(y|x; β, φ) + λ log p(x; φ) ] to balance discriminative and generative components.
- Discuss semi-supervised learning by integrating out y for unlabeled x via p(x; φ), and selective classification by rejecting inputs with low p(x; φ) using a threshold τ.
- Provide a Bayesian treatment (B-DIGLM) by placing a prior on β and deriving a marginal likelihood that connects to Gaussian processes via a kernel k(xi, xj) = λ^{-1} f(xi; φ)^T f(xj; φ).
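The weighted objective in the list above can be sketched for a single labeled example. This is a minimal illustration, not the paper's architecture: the flow is a toy elementwise affine map, the base density is assumed to be a standard normal, the GLM is a softmax regression, and all function and parameter names (`flow_forward`, `shift`, `log_scale`, `weights`, `bias`, `lam`) are hypothetical.

```python
import math

def flow_forward(x, shift, log_scale):
    """Toy invertible map z_i = (x_i - shift_i) * exp(log_scale_i).

    Returns the features z and log|det dz/dx|. A real DIGLM would use a
    deep normalizing flow here; this stand-in is an assumption."""
    z = [(xi - s) * math.exp(ls) for xi, s, ls in zip(x, shift, log_scale)]
    log_det = sum(log_scale)
    return z, log_det

def log_px(z, log_det):
    """Exact log p(x) by change of variables, assuming a standard normal base."""
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    return log_base + log_det

def log_py_given_z(z, y, weights, bias):
    """Log-probability of class y under a softmax GLM on the features z."""
    logits = [sum(w * zi for w, zi in zip(row, z)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[y] - log_norm

def hybrid_objective(x, y, shift, log_scale, weights, bias, lam):
    """Per-example term: log p(y|x; beta, phi) + lam * log p(x; phi)."""
    z, log_det = flow_forward(x, shift, log_scale)  # one pass yields both terms
    return log_py_given_z(z, y, weights, bias) + lam * log_px(z, log_det)

# Usage: a 2-dimensional input, 3 classes, and lam scaled by the input size.
x = [0.5, -1.0]
value = hybrid_objective(
    x, 0,
    shift=[0.0, 0.0], log_scale=[0.1, -0.2],
    weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], bias=[0.0, 0.0, 0.0],
    lam=0.01 / len(x),
)
```

Because the same `z` feeds both `log_px` and `log_py_given_z`, the parameters of the flow receive gradient signal from both terms, which is the coupling the shared φ is meant to provide. Setting `lam` to 0 recovers a purely discriminative objective.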
Experimental Results
Research Questions
- RQ1: Can a deep invertible transformation plus a GLM yield exact inference for both p(x) and p(y|x) in a single forward pass?
- RQ2: Does sharing the invertible feature extractor φ between p(x) and p(y|x) maintain predictive performance while enabling reliable OOD detection?
- RQ3: How does the DIGLM perform in semi-supervised settings where unlabeled x data are available?
- RQ4: Can the model effectively reject out-of-distribution inputs using the generative density p(x; φ)?
- RQ5: What are the advantages of a Bayesian DIGLM (B-DIGLM) and its relation to Gaussian processes?
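The rejection rule behind RQ4 can be illustrated with a short sketch: predict only when log p(x; φ) clears a threshold τ, otherwise abstain. The function names are hypothetical, and choosing τ as a low quantile of training log-densities is an assumption made for illustration, not a procedure taken from the paper.

```python
def selective_predict(log_px_value, class_probs, tau):
    """Return the argmax class if log p(x) >= tau, otherwise None (reject)."""
    if log_px_value < tau:
        return None
    return max(range(len(class_probs)), key=lambda k: class_probs[k])

def threshold_from_quantile(train_log_px, q):
    """Hypothetical heuristic: set tau to roughly the q-quantile of the
    log-densities that the model assigns to its own training inputs."""
    ordered = sorted(train_log_px)
    idx = int(q * (len(ordered) - 1))
    return ordered[idx]

# Usage with made-up log-densities (illustrative numbers, not paper results).
train_scores = [-10.0, -12.0, -11.0, -50.0, -9.0]
tau = threshold_from_quantile(train_scores, 0.25)
accepted = selective_predict(-11.0, [0.1, 0.7, 0.2], tau)
rejected = selective_predict(-40.0, [0.1, 0.7, 0.2], tau)
```

Raising `q` makes the classifier abstain more often; the right trade-off between coverage and error depends on the application.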
Key Findings
| Model | MNIST BPD (bits/dim) | MNIST error (%) | MNIST NLL | NotMNIST BPD (bits/dim) | NotMNIST NLL | NotMNIST Entropy |
|---|---|---|---|---|---|---|
| Discriminative (λ=0) | 81.80* | 0.67 | 0.082 | 87.74* | 29.27 | 0.130 |
| Hybrid (λ=0.01/D) | 1.83 | 0.73 | 0.035 | 5.84 | 2.36 | 2.300 |
| Hybrid (λ=1.0/D) | 1.26 | 2.22 | 0.081 | 6.13 | 2.30 | 2.300 |
| Hybrid (λ=10.0/D) | 1.25 | 4.01 | 0.145 | 6.17 | 2.30 | 2.300 |
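For readers unfamiliar with the BPD columns, the standard conversion between a log-density and bits per dimension is given below. It assumes D in the table denotes the input dimensionality, which would be 784 for 28×28 MNIST images; the summary itself does not define D, so treat that value as an assumption.

```latex
\mathrm{BPD}(x) \;=\; -\frac{\log_2 p(x;\phi)}{D} \;=\; -\frac{\ln p(x;\phi)}{D \,\ln 2}
% Worked example under the D = 784 assumption:
% BPD = 1.26  =>  \ln p(x;\phi) \approx -1.26 \times 784 \times 0.6931 \approx -684.7 \text{ nats}
```

Lower BPD means the generative component assigns higher density to the data.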
- DIGLM achieves competitive predictive accuracy with the discriminative model while providing explicit p(x) and p(y|x) in one pass.
- The model can detect out-of-distribution inputs via p(x; φ) and improves uncertainty estimates, shown in MNIST and SVHN experiments.
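- On MNIST, a small generative weight (λ=0.01/D) lowers in-distribution NLL from 0.082 to 0.035 at a small cost in error (0.73% vs. 0.67%), while larger weights trade accuracy for density quality. On NotMNIST, every hybrid setting raises predictive entropy from 0.130 to 2.300 (close to the ln 10 ≈ 2.303 maximum for ten classes) and cuts NLL from 29.27 to about 2.3, so predictions on OOD inputs are appropriately uncertain.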
- On flight delay regression, the DIGLM achieves a notably better NLL than the state-of-the-art, indicating effective modeling of non-stationarity.
- Semi-supervised experiments show that unlabeled data improve decision boundaries and classification accuracy.
- A Bayesian interpretation links the marginal likelihood to a GP-like kernel, enabling connections to kernel methods and exact posterior computations in some settings.
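The GP-like kernel mentioned in the last finding, k(xi, xj) = λ^{-1} f(xi; φ)^T f(xj; φ), is a scaled inner product of flow features. A minimal sketch of the resulting Gram matrix follows. It assumes the features have already been computed, and it reads λ as the precision of an isotropic Gaussian prior β ~ N(0, λ^{-1} I), the standard setting in which a linear model induces this kernel; the function name is hypothetical.

```python
def linear_feature_kernel(features, lam):
    """Gram matrix K[i][j] = (1 / lam) * <f(x_i), f(x_j)>.

    `features` is a list of feature vectors f(x_i; phi); `lam` is assumed
    to be the prior precision on the GLM weights beta."""
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) / lam
         for j in range(n)]
        for i in range(n)
    ]

# Usage with two made-up 2-dimensional feature vectors.
K = linear_feature_kernel([[1.0, 2.0], [3.0, -1.0]], lam=2.0)
```

Since the kernel is linear in the features, all of its expressiveness comes from the learned invertible map f, which is why training φ matters even in the Bayesian treatment.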
This explainer was generated by AI and reviewed by human editors.