[Paper Review] Do Deep Generative Models Know What They Don't Know?
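This paper shows that state-of-the-art deep generative models (flow-based models, VAEs, and PixelCNNs) can assign higher likelihood to out-of-distribution data than to their training data, for example to SVHN when trained on CIFAR-10. This challenges the use of density as a signal for detecting out-of-distribution inputs. The paper analyzes the cause in flow-based models and cautions against relying on density estimates for anomaly detection until this behavior is better understood.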
A neural network deployed in the wild may be asked to make predictions for inputs that were drawn from a different distribution than that of the training data. A plethora of work has demonstrated that it is easy to find or synthesize inputs for which a neural network is highly confident yet wrong. Generative models are widely viewed to be robust to such mistaken confidence as modeling the density of the input features can be used to detect novel, out-of-distribution inputs. In this paper we challenge this assumption. We find that the density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses (i.e. CIFAR-10) from those of house numbers (i.e. SVHN), assigning a higher likelihood to the latter when the model is trained on the former. Moreover, we find evidence of this phenomenon when pairing several popular image data sets: FashionMNIST vs MNIST, CelebA vs SVHN, ImageNet vs CIFAR-10 / CIFAR-100 / SVHN. To investigate this curious behavior, we focus analysis on flow-based generative models in particular since they are trained and evaluated via the exact marginal likelihood. We find such behavior persists even when we restrict the flows to constant-volume transformations. These transformations admit some theoretical analysis, and we show that the difference in likelihoods can be explained by the location and variances of the data and the model curvature. Our results caution against using the density estimates from deep generative models to identify inputs similar to the training distribution until their behavior for out-of-distribution inputs is better understood.
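Motivation and Objectives
- Evaluate whether deep generative models, used as density estimators, can detect inputs drawn from a distribution different from the training data.
- Examine why flow-based models sometimes assign higher density to out-of-distribution images.
- Analyze the individual contributions to the likelihood to identify the source of the phenomenon.
- Assess the role of the volume element and of constant-volume flows in the behavior of the likelihood.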
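Proposed Method
- Train Glow (flow-based) models on FashionMNIST and CIFAR-10, and evaluate them against MNIST and SVHN respectively; repeat the comparison for the CelebA and ImageNet pairings.
- Compute and compare log-likelihoods (in bits per dimension) on in-distribution and out-of-distribution test sets.
- Decompose the change-of-variables likelihood into log p(z) and log|det df/dx| to identify which term contributes what.
- Study constant-volume (CV) and non-volume-preserving (NVP) transformations to control for volume effects.
- Carry out a second-order analysis that relates the likelihood difference to the data covariance and the model curvature.
- Test ensembles of models to assess how robust the phenomenon is.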
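The decomposition in the list above follows from the change-of-variables formula, log p(x) = log p(z) + log|det df/dx| with z = f(x). Below is a minimal sketch in Python, assuming a hypothetical element-wise affine flow (not Glow itself) with a standard-normal prior on z. The names `affine_flow_log_likelihood` and `bits_per_dim` are illustrative and do not come from the paper.

```python
import numpy as np

def affine_flow_log_likelihood(x, scale, shift):
    """Return (log p(z), log|det df/dx|) for the toy flow z = scale * x + shift.

    Assumes a standard-normal prior on z; x has shape (batch, dims).
    """
    z = scale * x + shift
    dims = x.shape[1]
    # Log density of z under N(0, I), summed over dimensions.
    log_pz = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * dims * np.log(2.0 * np.pi)
    # The Jacobian of an element-wise affine map is diagonal, so its
    # log-determinant is the sum of log|scale| and does not depend on x.
    log_det = np.full(x.shape[0], np.sum(np.log(np.abs(scale))))
    return log_pz, log_det

def bits_per_dim(log_px, dims):
    """Convert a log-likelihood in nats to bits per dimension (lower is better)."""
    return -log_px / (dims * np.log(2.0))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # toy "images" with 3 dimensions each
scale = np.array([2.0, 0.5, 1.0])    # scales multiply to 1, so log|det| is 0
shift = np.zeros(3)

log_pz, log_det = affine_flow_log_likelihood(x, scale, shift)
log_px = log_pz + log_det            # the two terms the method inspects separately
bpd = bits_per_dim(log_px, dims=3)
```

In this toy flow the volume term is the same constant for every input, which is the defining property of a constant-volume transformation. In a non-volume-preserving flow the scale would depend on x, so `log_det` would vary from input to input and could favor one data set over another.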
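Experimental Results
Research Questions
- RQ1: Do modern deep generative models assign higher density to out-of-distribution data than to their training data?
- RQ2: Which component of the likelihood (the latent density or the volume change) drives any observed out-of-distribution density advantage?
- RQ3: Do constant-volume flow transformations eliminate or reduce the out-of-distribution likelihood paradox?
- RQ4: How do data variance and model curvature interact to produce higher likelihood on an out-of-distribution set?
- RQ5: Does ensembling models or graying images affect the likelihood gap between in-distribution and out-of-distribution data?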
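Key Findings
- Flow-based models, VAEs, and PixelCNNs can assign higher likelihood to out-of-distribution data (for example, SVHN) than to their training data (for example, CIFAR-10).
- For flow models, the out-of-distribution effect comes largely from the volume term rather than from the latent p(z) term.
- Constant-volume flows do not remove the phenomenon; SVHN can still receive higher likelihood than CIFAR-10.
- The second-order analysis predicts higher likelihood for SVHN, as a consequence of differences in data covariance combined with the model's curvature.
- Graying images (which lowers their variance) increases the likelihood of both CIFAR-10 and SVHN, consistent with the curvature-based explanation.
- Ensembling does not materially close the likelihood gap between in-distribution and out-of-distribution data.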
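The variance-based explanation can be illustrated with a toy case in which the model is an isotropic Gaussian, for which a second-order expansion of the log-density is exact. This is a sketch under that assumption, not the paper's analysis of constant-volume Glow; the function name and the variance values are hypothetical.

```python
import numpy as np

def expected_gaussian_log_likelihood(model_var, data_mean_sq_norm, data_var_trace, dims):
    """E_q[log N(x; 0, model_var * I)] for data q with the given moments.

    Uses the identity E_q[||x||^2] = trace(Cov_q) + ||mean_q||^2.
    """
    second_moment = data_var_trace + data_mean_sq_norm
    return (-0.5 * dims * np.log(2.0 * np.pi * model_var)
            - second_moment / (2.0 * model_var))

dims = 10
model_var = 1.0  # assumed: the model matches the training data's variance

# Training-like data: zero mean, per-dimension variance 1.0.
in_dist = expected_gaussian_log_likelihood(model_var, 0.0, dims * 1.0, dims)
# Out-of-distribution data: same mean, smaller per-dimension variance 0.25.
out_dist = expected_gaussian_log_likelihood(model_var, 0.0, dims * 0.25, dims)

# Positive gap: the lower-variance data scores higher on average.
gap = out_dist - in_dist
```

Here `gap` evaluates to 3.75 nats: the lower-variance data set receives the higher average log-likelihood even though the model was never fitted to it. This matches the direction of the CIFAR-10 versus SVHN result, and it also shows why reducing variance by graying would raise the likelihood in such a model.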
This review was generated by AI and checked by a human editor.