QUICK REVIEW

[Paper Review] Hybrid Models with Deep and Invertible Features

Eric Nalisnick, Akihiro Matsukawa | arXiv (Cornell University) | Feb 7, 2019
Generative Adversarial Networks and Image Synthesis | Cited by 34
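One-Sentence Summary

This paper proposes a neural hybrid model (DIGLM) that combines a deep invertible feature transformation with a generalized linear model, so that the exact joint density p(x, y) and the exact predictive distribution p(y|x) are both available from a single forward pass, which in turn supports OOD detection and semi-supervised learning.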

ABSTRACT

We propose a neural hybrid model consisting of a linear model defined on a set of features computed by a deep, invertible transformation (i.e. a normalizing flow). An attractive property of our model is that both p(features), the density of the features, and p(targets | features), the predictive distribution, can be computed exactly in a single feed-forward pass. We show that our hybrid model, despite the invertibility constraints, achieves similar accuracy to purely predictive models. Moreover the generative component remains a good model of the input features despite the hybrid optimization objective. This offers additional capabilities such as detection of out-of-distribution inputs and enabling semi-supervised learning. The availability of the exact joint density p(targets, features) also allows us to compute many quantities readily, making our hybrid model a useful building block for downstream applications of probabilistic deep learning.

Research Motivation and Objectives

  • Motivate the use of joint modeling of targets and inputs to improve robustness and enable out-of-distribution detection.
  • Develop a neural hybrid that jointly learns p(x) and p(y|x) in a single forward pass via invertible transformations.
  • Demonstrate exact inference for p(y|x) and p(x) and show benefits for semi-supervised learning and selective classification.
  • Evaluate predictive accuracy, uncertainty, and OOD detection across classification and regression benchmarks.

Proposed Method

  • Define a joint model p(y, x) = p(y|x; β, φ) p(x; φ) where x is transformed by an invertible f with parameters φ and z = f(x).
  • Use a generalized linear model (GLM) on the latent representation z as p(y|x; β, φ) with priors to enable exact or closed-form predictive inference in many cases.
  • Share φ between the invertible generative model and the predictive GLM to couple generative and discriminative objectives.
  • Train by maximizing the exact joint log-likelihood J(θ) = Σ log p(y, x; θ), enabling exact p(x) via the change-of-variables formula and exact p(y|x) in several settings.
  • Introduce a weighted objective Jλ(θ) = Σ [ log p(y|x; β, φ) + λ log p(x; φ) ] to balance discriminative and generative components.
  • Discuss semi-supervised learning by integrating out y for unlabeled x via p(x; φ), and selective classification by rejecting inputs with low p(x; φ) using a threshold τ.
  • Provide a Bayesian treatment (B-DIGLM) by placing a prior on β and deriving a marginal likelihood that connects to Gaussian processes via a kernel k(x_i, x_j) = λ^{-1} f(x_i; φ)^T f(x_j; φ).
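
The factorization p(y, x) = p(y|x; β, φ) p(x; φ) and the weighted objective Jλ listed above can be sketched numerically. What follows is a minimal NumPy sketch, not the authors' implementation: it swaps the deep flow for a toy elementwise affine map, and it assumes a standard normal base density and a softmax GLM. The function names (`affine_flow`, `log_px`, `log_py_given_x`, `weighted_objective`) and all parameter values are hypothetical.

```python
import numpy as np

def affine_flow(x, log_scale, shift):
    """Toy invertible map z = x * exp(log_scale) + shift (elementwise).

    Stand-in for a deep flow f(x; phi). For an elementwise affine map,
    log|det df/dx| is sum(log_scale), identical for every input row.
    """
    z = x * np.exp(log_scale) + shift
    log_det = np.full(x.shape[0], np.sum(log_scale))
    return z, log_det

def log_px(z, log_det):
    """Exact log p(x) via change of variables (standard normal base assumed)."""
    d = z.shape[1]
    log_base = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
    return log_base + log_det

def log_py_given_x(z, y, beta, bias):
    """Softmax GLM on the flow features z: log p(y | x; beta, phi)."""
    logits = z @ beta + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))
    return log_probs[np.arange(len(y)), y]

def weighted_objective(x, y, params, lam):
    """J_lambda = sum_n [ log p(y_n | x_n) + lam * log p(x_n) ]."""
    # One pass through the flow yields both terms of the objective.
    z, log_det = affine_flow(x, params["log_scale"], params["shift"])
    lp_x = log_px(z, log_det)
    lp_y = log_py_given_x(z, y, params["beta"], params["bias"])
    return np.sum(lp_y + lam * lp_x), lp_x, lp_y

# Hypothetical toy data: N inputs of dimension D, K classes.
rng = np.random.default_rng(0)
D, K, N = 4, 3, 5
params = {
    "log_scale": 0.1 * rng.normal(size=D),
    "shift": 0.1 * rng.normal(size=D),
    "beta": rng.normal(size=(D, K)),
    "bias": np.zeros(K),
}
x = rng.normal(size=(N, D))
y = rng.integers(0, K, size=N)

objective, lp_x, lp_y = weighted_objective(x, y, params, lam=1.0 / D)
```

Because the flow parameters enter both `lp_x` and `lp_y`, a gradient step on this objective would update the shared features for both the generative and the predictive term. Setting `lam=0.0` drops the density term and leaves only the discriminative log-likelihood.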

Experimental Results

Research Questions

  • RQ1: Can a deep invertible transformation plus a GLM yield exact inference for both p(x) and p(y|x) in a single forward pass?
  • RQ2: Does sharing the invertible feature extractor φ between p(x) and p(y|x) maintain predictive performance while enabling reliable OOD detection?
  • RQ3: How does the DIGLM perform in semi-supervised settings where unlabeled x data are available?
  • RQ4: Can the model effectively reject out-of-distribution inputs using the generative density p(x; φ)?
  • RQ5: What are the advantages of a Bayesian DIGLM (B-DIGLM) and its relation to Gaussian processes?
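
The rejection rule behind RQ4 can be illustrated with a short sketch: predict only when the log-density of the input clears a threshold τ, otherwise abstain. This is an illustrative sketch under stated assumptions, not the paper's code. The function name `selective_predict`, the use of -1 as the reject label, and the numeric values are all hypothetical.

```python
import numpy as np

REJECT = -1  # hypothetical label meaning "abstain from predicting"

def selective_predict(log_px, class_probs, tau):
    """Return the argmax class where log p(x) >= tau, else REJECT."""
    log_px = np.asarray(log_px)
    preds = np.argmax(class_probs, axis=1)
    return np.where(log_px >= tau, preds, REJECT)

# Made-up values: the third input has a very low density under p(x).
log_px = np.array([-3.1, -2.8, -40.0])
class_probs = np.array([[0.9, 0.1],
                        [0.2, 0.8],
                        [0.6, 0.4]])
tau = -10.0

decisions = selective_predict(log_px, class_probs, tau)
# decisions -> array([ 0,  1, -1])
```

How τ is chosen is left open here; one plausible option, stated as an assumption, is a low quantile of the log-densities observed on training data.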

Key Findings

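| Model | MNIST BPD (bits/dim) | MNIST error (%) | MNIST NLL | NotMNIST BPD (bits/dim) | NotMNIST NLL | NotMNIST entropy |
| --- | --- | --- | --- | --- | --- | --- |
| Discriminative (λ=0) | 81.80* | 0.67 | 0.082 | 87.74* | 29.27 | 0.130 |
| Hybrid (λ=0.01/D) | 1.83 | 0.73 | 0.035 | 5.84 | 2.36 | 2.300 |
| Hybrid (λ=1.0/D) | 1.26 | 2.22 | 0.081 | 6.13 | 2.30 | 2.300 |
| Hybrid (λ=10.0/D) | 1.25 | 4.01 | 0.145 | 6.17 | 2.30 | 2.300 |
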
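
The NotMNIST entropy and NLL values of roughly 2.30 for the hybrid rows are easier to read next to the entropy of a uniform prediction. The check below is a small sketch that assumes the reported entropies are in nats and taken over 10 classes; neither the units nor the class count is stated in the summary itself.

```python
import math

num_classes = 10  # assumption: 10-way classification
max_entropy_nats = math.log(num_classes)

# Entropy of an explicitly uniform predictive distribution, for comparison.
uniform = [1.0 / num_classes] * num_classes
entropy_uniform = -sum(p * math.log(p) for p in uniform)

# NLL of a uniform prediction is -log(1/10), whatever the true label is.
nll_uniform = -math.log(1.0 / num_classes)

print(round(max_entropy_nats, 4))  # 2.3026
```

Under these assumptions, an entropy of 2.300 means the hybrid models predict almost uniformly on NotMNIST, whereas the discriminative model's 0.130 corresponds to confident predictions on inputs it was never trained on.
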
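  • With a small generative weight (λ=0.01/D), DIGLM nearly matches the discriminative model's MNIST error (0.73% vs. 0.67%) while providing explicit p(x) and p(y|x) in one pass; larger weights trade accuracy for density quality (4.01% error at λ=10.0/D).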
  • The model can detect out-of-distribution inputs via p(x; φ) and improves uncertainty estimates, shown in MNIST and SVHN experiments.
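  • On MNIST, the hybrid with λ=0.01/D improves in-distribution NLL (0.035 vs. 0.082 for the discriminative model). On the out-of-distribution NotMNIST data, every hybrid setting gives a far lower NLL (about 2.3 vs. 29.27) and a far higher predictive entropy (2.300 vs. 0.130), so the hybrid is appropriately uncertain on inputs unlike its training data.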
  • On flight delay regression, the DIGLM achieves a notably better NLL than the state-of-the-art, indicating effective modeling of non-stationarity.
  • Semi-supervised experiments show that unlabeled data improve decision boundaries and classification accuracy.
  • A Bayesian interpretation links the marginal likelihood to a GP-like kernel, enabling connections to kernel methods and exact posterior computations in some settings.
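
The GP connection in the last finding can be made concrete with a linear kernel on the flow features. The sketch below is not the authors' code and rests on several assumptions: a regression setting with Gaussian noise, a prior β ~ N(0, λ^{-1} I) (which is what the kernel form k(x_i, x_j) = λ^{-1} f(x_i; φ)^T f(x_j; φ) implies), and random vectors standing in for the features f(x; φ). The names `linear_feature_kernel`, `gp_log_marginal`, `prior_precision`, and `noise_var` are hypothetical.

```python
import numpy as np

def linear_feature_kernel(z, prior_precision):
    """K[i, j] = z_i^T z_j / prior_precision, with z_i standing in for f(x_i; phi)."""
    return (z @ z.T) / prior_precision

def gp_log_marginal(y, K, noise_var):
    """log N(y; 0, K + noise_var * I), computed with a Cholesky factorization."""
    n = y.shape[0]
    C = K + noise_var * np.eye(n)
    L = np.linalg.cholesky(C)
    # alpha = C^{-1} y via two triangular solves against L and L^T.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log|C| = 2 * sum(log(diag(L))), so -0.5 * log|C| = -sum(log(diag(L))).
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

# Made-up stand-ins for flow features and regression targets.
rng = np.random.default_rng(1)
n, d = 6, 3
z = rng.normal(size=(n, d))
y = rng.normal(size=n)
prior_precision = 2.0   # plays the role of lambda in the kernel
noise_var = 0.5         # hypothetical observation-noise variance

K = linear_feature_kernel(z, prior_precision)
log_marginal = gp_log_marginal(y, K, noise_var)
```

Integrating out β in this way leaves a marginal likelihood that depends only on the flow parameters, through the features, and on the noise and prior hyperparameters.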

This review was generated by AI and reviewed by human editors.