QUICK REVIEW

[Paper Review] Hybrid Models with Deep and Invertible Features

Eric Nalisnick, Akihiro Matsukawa|arXiv (Cornell University)|Feb 7, 2019

Generative Adversarial Networks and Image Synthesis|Cited by 34

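One-Sentence Summary

This paper proposes a neural hybrid model (DIGLM) that combines a deep invertible feature transformation with a generalized linear model, so that the exact joint density p(x, y) and the exact predictive distribution p(y|x) are both available in a single forward pass, which in turn supports OOD detection and semi-supervised learning.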

ABSTRACT

We propose a neural hybrid model consisting of a linear model defined on a set of features computed by a deep, invertible transformation (i.e. a normalizing flow). An attractive property of our model is that both p(features), the density of the features, and p(targets | features), the predictive distribution, can be computed exactly in a single feed-forward pass. We show that our hybrid model, despite the invertibility constraints, achieves similar accuracy to purely predictive models. Moreover the generative component remains a good model of the input features despite the hybrid optimization objective. This offers additional capabilities such as detection of out-of-distribution inputs and enabling semi-supervised learning. The availability of the exact joint density p(targets, features) also allows us to compute many quantities readily, making our hybrid model a useful building block for downstream applications of probabilistic deep learning.

Motivation and Objectives

Motivate the use of joint modeling of targets and inputs to improve robustness and enable out-of-distribution detection.
Develop a neural hybrid that jointly learns p(x) and p(y|x) in a single forward pass via invertible transformations.
Demonstrate exact inference for p(y|x) and p(x) and show benefits for semi-supervised learning and selective classification.
Evaluate predictive accuracy, uncertainty, and OOD detection across classification and regression benchmarks.

Proposed Method

Define a joint model p(y, x) = p(y|x; β, φ) p(x; φ) where x is transformed by an invertible f with parameters φ and z = f(x).
Use a generalized linear model (GLM) on the latent representation z as p(y|x; β, φ) with priors to enable exact or closed-form predictive inference in many cases.
Share φ between the invertible generative model and the predictive GLM to couple generative and discriminative objectives.
Train by maximizing the exact joint log-likelihood J(θ) = Σ log p(y, x; θ), enabling exact p(x) via the change-of-variables formula and exact p(y|x) in several settings.
Introduce a weighted objective Jλ(θ) = Σ [ log p(y|x; β, φ) + λ log p(x; φ) ] to balance discriminative and generative components.
Discuss semi-supervised learning by integrating out y for unlabeled x via p(x; φ), and selective classification by rejecting inputs with low p(x; φ) using a threshold τ.
Provide a Bayesian treatment (B-DIGLM) by placing a prior on β and deriving a marginal likelihood that connects to Gaussian processes via a kernel k(x_i, x_j) = λ^{-1} f(x_i; φ)^T f(x_j; φ).
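
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the invertible map f is a single affine coupling layer with a hypothetical one-layer tanh conditioner, the base density on z is assumed to be a standard normal, and the GLM is a softmax regression on z. All names (`coupling_forward`, `diglm_forward`, `weighted_objective`) and the toy sizes are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling_forward(x, W, b):
    """Affine coupling: z1 = x1, z2 = x2 * exp(s(x1)) + t(x1). Assumes even D."""
    d = x.shape[1] // 2
    x1, x2 = x[:, :d], x[:, d:]
    h = np.tanh(x1 @ W + b)                  # hypothetical conditioner network
    s, t = h[:, :x2.shape[1]], h[:, x2.shape[1]:]
    z = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
    log_det = s.sum(axis=1)                  # log |det df/dx| for this layer
    return z, log_det

def coupling_inverse(z, W, b):
    """Exact inverse of coupling_forward."""
    d = z.shape[1] // 2
    z1, z2 = z[:, :d], z[:, d:]
    h = np.tanh(z1 @ W + b)
    s, t = h[:, :z2.shape[1]], h[:, z2.shape[1]:]
    return np.concatenate([z1, (z2 - t) * np.exp(-s)], axis=1)

def diglm_forward(x, y, W, b, beta, beta0):
    """One pass yields both log p(y|x) and log p(x)."""
    z, log_det = coupling_forward(x, W, b)
    D = x.shape[1]
    # Change of variables: log p(x) = log N(z; 0, I) + log |det df/dx|
    log_px = -0.5 * (z ** 2).sum(axis=1) - 0.5 * D * np.log(2 * np.pi) + log_det
    # GLM on z: softmax regression, computed with a stable log-softmax
    logits = z @ beta + beta0
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_py_x = log_probs[np.arange(len(y)), y]
    return log_py_x, log_px

def weighted_objective(log_py_x, log_px, lam):
    """J_lambda = sum_n [ log p(y_n|x_n) + lam * log p(x_n) ]."""
    return np.sum(log_py_x + lam * log_px)

# Toy data and randomly initialised parameters (illustrative only).
N, D, K = 5, 4, 3
x = rng.normal(size=(N, D))
y = rng.integers(0, K, size=N)
W = 0.1 * rng.normal(size=(D // 2, D))       # maps d inputs to 2 * (D - d) outputs
b = np.zeros(D)
beta = 0.1 * rng.normal(size=(D, K))
beta0 = np.zeros(K)

log_py_x, log_px = diglm_forward(x, y, W, b, beta, beta0)
J = weighted_objective(log_py_x, log_px, lam=1.0 / D)
```

Setting `lam=0` recovers a purely discriminative objective, while larger values put more weight on the density term; in practice the objective would be maximised with a gradient-based optimiser over W, b, beta and beta0.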

Experimental Results

Research Questions

RQ1: Can a deep invertible transformation plus a GLM yield exact inference for both p(x) and p(y|x) in a single forward pass?
RQ2: Does sharing the invertible feature extractor φ between p(x) and p(y|x) maintain predictive performance while enabling reliable OOD detection?
RQ3: How does the DIGLM perform in semi-supervised settings where unlabeled x data are available?
RQ4: Can the model effectively reject out-of-distribution inputs using the generative density p(x; φ)?
RQ5: What are the advantages of a Bayesian DIGLM (B-DIGLM) and its relation to Gaussian processes?
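
The rejection rule behind RQ4, thresholding p(x; φ) at τ, can be sketched as follows. The sketch assumes log-densities and class probabilities have already been computed by the model. Choosing τ as a low quantile of in-distribution log-densities is an illustrative assumption, not necessarily the paper's procedure, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def choose_threshold(train_log_px, quantile=0.05):
    """Pick tau as a low quantile of in-distribution log p(x) (an assumption)."""
    return np.quantile(train_log_px, quantile)

def selective_predict(class_probs, log_px, tau, reject_label=-1):
    """Predict argmax class where log p(x) >= tau; otherwise return reject_label."""
    preds = class_probs.argmax(axis=1)
    return np.where(log_px >= tau, preds, reject_label)

# Toy values: the second test input has a very low density and is rejected.
train_log_px = np.array([-10.0, -11.0, -9.5, -10.5, -12.0])
tau = choose_threshold(train_log_px, quantile=0.2)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
log_px = np.array([-10.0, -50.0, -11.0])
preds = selective_predict(probs, log_px, tau)   # array([ 0, -1,  0])
```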

Key Findings

Model	MNIST BPD (bits/dim)	MNIST error (%)	MNIST NLL	NotMNIST BPD (bits/dim)	NotMNIST NLL	NotMNIST Entropy
Discriminative (λ=0)	81.80*	0.67	0.082	87.74*	29.27	0.130
Hybrid (λ=0.01/D)	1.83	0.73	0.035	5.84	2.36	2.300
Hybrid (λ=1.0/D)	1.26	2.22	0.081	6.13	2.30	2.300
Hybrid (λ=10.0/D)	1.25	4.01	0.145	6.17	2.30	2.300
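
One way to read the NotMNIST Entropy column: for a 10-class problem the largest possible predictive entropy is that of the uniform distribution, ln 10. Assuming the table reports entropy in nats (the unit is not stated in the table), the hybrid rows' 2.300 sits essentially at that maximum. A quick check, with a hypothetical helper `entropy_nats`:

```python
import numpy as np

def entropy_nats(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log(p[nz])).sum())

uniform = np.full(10, 0.1)                 # uniform over 10 classes
max_entropy = entropy_nats(uniform)        # equals ln(10), about 2.3026
confident = entropy_nats([1.0] + [0.0] * 9)  # a one-hot prediction has zero entropy
```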

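DIGLM achieves predictive accuracy competitive with the purely discriminative model (0.73% vs. 0.67% MNIST error at λ=0.01/D) while providing explicit p(x) and p(y|x) in one pass; accuracy degrades as λ grows (4.01% error at λ=10.0/D).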
The model can detect out-of-distribution inputs via p(x; φ) and improves uncertainty estimates, shown in MNIST and SVHN experiments.
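On MNIST, the hybrid model with λ=0.01/D lowers test NLL from 0.082 to 0.035, although larger λ gives back that gain (0.145 at λ=10.0/D). On NotMNIST, every nonzero λ reduces NLL from 29.27 to roughly 2.3 and raises predictive entropy from 0.130 to 2.300, i.e. far less confident predictions on out-of-distribution inputs.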
On flight delay regression, the DIGLM achieves a notably better NLL than the state-of-the-art, indicating effective modeling of non-stationarity.
Semi-supervised experiments show that unlabeled data improve decision boundaries and classification accuracy.
A Bayesian interpretation links the marginal likelihood to a GP-like kernel, enabling connections to kernel methods and exact posterior computations in some settings.
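
The GP connection in the last finding can be checked numerically in the regression case. The sketch below assumes a Gaussian prior β ~ N(0, λ^{-1} I) and Gaussian observation noise with variance σ² (both assumptions made for this example), uses random vectors as stand-ins for the features f(x; φ), and compares the GP predictive mean under the kernel λ^{-1} f(x_i; φ)^T f(x_j; φ) with the weight-space Bayesian linear regression mean. Variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 3
lam, sigma2 = 2.0, 0.1            # prior precision and noise variance (illustrative)
Z = rng.normal(size=(N, D))       # stand-in for features f(x_i; phi)
y = rng.normal(size=N)
z_star = rng.normal(size=D)       # stand-in for a test feature f(x*; phi)

def linear_kernel(A, B, lam):
    """k(x_i, x_j) = lam^{-1} f(x_i)^T f(x_j), evaluated on feature matrices."""
    return A @ B.T / lam

# Function-space (GP) predictive mean: k*^T (K + sigma2 I)^{-1} y
K = linear_kernel(Z, Z, lam)
k_star = linear_kernel(Z, z_star[None, :], lam)[:, 0]
gp_mean = k_star @ np.linalg.solve(K + sigma2 * np.eye(N), y)

# Weight-space predictive mean: z*^T (Z^T Z + sigma2 * lam * I)^{-1} Z^T y
beta_mean = np.linalg.solve(Z.T @ Z + sigma2 * lam * np.eye(D), Z.T @ y)
ws_mean = z_star @ beta_mean

print(np.isclose(gp_mean, ws_mean))
```

The two means agree because a linear model with a Gaussian prior on its weights is a GP with a linear kernel on the features; the weight-space form only needs a D×D solve, which is cheaper than the N×N solve when there are far more data points than features.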

This review was generated by AI and reviewed by a human editor.