QUICK REVIEW

[Paper Review] On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty

Joost van Amersfoort, Lewis Smith|arXiv (Cornell University)|Feb 22, 2021

Gaussian Processes and Bayesian Inference | References: 48 | Cited by: 56

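ONE-SENTENCE SUMMARY

This paper identifies feature collapse as the reason Deep Kernel Learning (DKL) gives unreliable single forward pass uncertainty estimates, and proposes Deterministic Uncertainty Estimation (DUE), which combines a bi-Lipschitz-constrained feature extractor with an inducing point Gaussian process to improve uncertainty estimates while keeping the speed of a standard neural network.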

ABSTRACT

Inducing point Gaussian process approximations are often considered a gold standard in uncertainty estimation since they retain many of the properties of the exact GP and scale to large datasets. A major drawback is that they have difficulty scaling to high dimensional inputs. Deep Kernel Learning (DKL) promises a solution: a deep feature extractor transforms the inputs over which an inducing point Gaussian process is defined. However, DKL has been shown to provide unreliable uncertainty estimates in practice. We study why, and show that with no constraints, the DKL objective pushes "far-away" data points to be mapped to the same features as those of training-set points. With this insight we propose to constrain DKL's feature extractor to approximately preserve distances through a bi-Lipschitz constraint, resulting in a feature space favorable to DKL. We obtain a model, DUE, which demonstrates uncertainty quality outperforming previous DKL and other single forward pass uncertainty methods, while maintaining the speed and accuracy of standard neural networks.

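MOTIVATION AND GOALS

Fast, single forward pass models need reliable uncertainty estimates.
Diagnose why standard DKL uncertainty is unreliable: feature collapse.
Propose a constrained DKL method (DUE) that uses a bi-Lipschitz feature extractor to improve uncertainty quality.
Show that DUE matches or outperforms competing methods on uncertainty quality while retaining the speed and accuracy of standard neural networks.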
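
For reference, the bi-Lipschitz condition mentioned above can be written as follows. This is the standard textbook definition, not a formula quoted from the paper; the symbols f (feature extractor), x_1, x_2 (inputs) and L_1, L_2 (constants) are notation chosen here for illustration.

```latex
\[
  L_1 \,\lVert x_1 - x_2 \rVert
  \;\le\; \lVert f(x_1) - f(x_2) \rVert
  \;\le\; L_2 \,\lVert x_1 - x_2 \rVert ,
  \qquad 0 < L_1 \le L_2 .
\]
```

The lower bound keeps distinct inputs from being mapped to the same features (sensitivity), and the upper bound keeps nearby inputs close in feature space (smoothness).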

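PROPOSED METHOD

Analyze feature collapse in DKL when the deep feature extractor is unconstrained.
Impose a bi-Lipschitz constraint on the feature extractor through residual connections and spectral normalization, giving both sensitivity and smoothness.
Place an inducing point Gaussian process on top of the constrained feature extractor for genuinely non-parametric uncertainty estimation.
Train end to end from scratch, with practical simplifications (no pre-training, few inducing points).
Compare with SNGP and other single forward pass methods using AUROC, accuracy, and predictive uncertainty metrics.
Provide a practical training procedure (Algorithm 1), including K-means initialization of the inducing points and adjustments to spectral normalization.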
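
The combination of residual connections and spectral normalization listed above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the ReLU nonlinearity, the bound c = 0.9 and the power-iteration settings are all assumptions made for the example. The idea is that if the residual branch g(x) = relu(W x) has Lipschitz constant at most c < 1, then f(x) = x + g(x) changes distances by a factor between 1 - c and 1 + c.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, n_iter=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = [list(col) for col in zip(*W)]
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]
    for _ in range(n_iter):
        v = matvec(Wt, matvec(W, v))
        scale = norm(v)
        v = [x / scale for x in v]
    return norm(matvec(W, v))


def normalize_weight(W, c=0.9):
    """Rescale W so its spectral norm is at most c (hypothetical bound, c < 1)."""
    factor = min(1.0, c / spectral_norm(W))
    return [[w * factor for w in row] for row in W]


def residual_layer(W, x):
    """f(x) = x + relu(W x); W must be square so the shapes match."""
    return [xi + max(0.0, hi) for xi, hi in zip(x, matvec(W, x))]


def distance_ratio(W, a, b):
    """||f(a) - f(b)|| / ||a - b|| for the residual layer above."""
    fa, fb = residual_layer(W, a), residual_layer(W, b)
    out_dist = norm([p - q for p, q in zip(fa, fb)])
    in_dist = norm([p - q for p, q in zip(a, b)])
    return out_dist / in_dist
```

After calling normalize_weight with c = 0.9, distance_ratio should stay roughly within [0.1, 1.9] for any pair of distinct inputs. In a real network the same rescaling would be applied to every layer of each residual branch, typically through a deep learning library's spectral normalization utility rather than hand-written loops.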

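EXPERIMENTAL RESULTS

Research Questions

RQ1: How does feature collapse arise in DKL, and how does it affect uncertainty estimates?
RQ2: Does a bi-Lipschitz constraint on the feature extractor mitigate feature collapse and improve DKL's uncertainty quality?
RQ3: Does a DKL-based single forward pass model (DUE, with inducing points) outperform existing methods on standard uncertainty benchmarks and regression tasks?
RQ4: Can DUE be trained from scratch in practice while remaining competitive with standard neural networks in speed and accuracy?
RQ5: How does DUE perform on distinguishing CIFAR-10 from SVHN and on causal inference / medical uncertainty benchmarks?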

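Main Findings

In unconstrained DKL, feature collapse leads to high confidence on out-of-distribution data, which undermines the uncertainty estimates.
A bi-Lipschitz-constrained feature extractor (residual connections plus spectral normalization) mitigates feature collapse and improves uncertainty behavior.
By placing an inducing point Gaussian process on top of the constrained feature extractor, DUE matches or exceeds existing single forward pass methods on CIFAR-10 vs. SVHN uncertainty.
DUE adds very little overhead when trained from scratch (no pre-training) with a small number of inducing points (e.g., 10); its runtime on CIFAR-10 is close to that of a standard softmax model.
DUE outperforms other single forward pass methods on the CIFAR-10 vs. SVHN uncertainty task and on a personalized healthcare regression benchmark, while being significantly faster than ensembles.
The inducing point Gaussian process retains the properties of a non-parametric GP: away from the training data it still gives uncertainty similar to a full GP, unlike methods based on random Fourier features (RFF).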
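
The effect of feature collapse on GP uncertainty can be reproduced in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses an exact GP with an RBF kernel instead of an inducing point approximation, two hand-written feature maps instead of a trained network, and made-up training inputs. One feature map preserves distances (identity) and the other discards the second input coordinate, so a far-away test point lands on top of a training point in feature space.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """RBF kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def posterior_variance(train_feats, test_feat, noise=1e-2):
    """Exact GP posterior variance: k** - k*^T (K + noise * I)^-1 k*."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_feats)]
         for i, a in enumerate(train_feats)]
    k_star = [rbf(a, test_feat) for a in train_feats]
    alpha = solve(K, k_star)
    return rbf(test_feat, test_feat) - sum(k * a for k, a in zip(k_star, alpha))


def identity_features(x):
    """Distance-preserving feature map (hypothetical stand-in for a constrained network)."""
    return list(x)


def collapsed_features(x):
    """Feature map that ignores the second coordinate (hypothetical collapsed network)."""
    return [x[0]]


# Training inputs lie on the line x2 = 0; the test input is far away along x2.
train_inputs = [[float(i), 0.0] for i in range(-2, 3)]
far_input = [0.0, 5.0]


def variance_under(feature_fn):
    """Posterior variance at far_input when the GP is defined on feature_fn(x)."""
    train_feats = [feature_fn(x) for x in train_inputs]
    return posterior_variance(train_feats, feature_fn(far_input))
```

With the identity features the variance at the far input stays close to the prior variance of 1, which is the desired behavior. With the collapsed features the same input receives a variance close to the noise level, so the GP is confidently wrong about a point it has never seen anything like.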

This review was generated by AI and reviewed by human editors.