QUICK REVIEW

[Paper Review] Activation-Space Uncertainty Quantification for Pretrained Networks

Richard Bergna, Stefan Depeweg | arXiv (Cornell University) | Feb 16, 2026
Gaussian Processes and Bayesian Inference | Cited by 0
One-Sentence Summary

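GAPA is a post-hoc method that replaces a network's activations with Gaussian-process activations, yielding epistemic uncertainty in activation space while preserving the backbone's predictions and enabling single-pass uncertainty propagation through pretrained networks.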

ABSTRACT

Reliable uncertainty estimates are crucial for deploying pretrained models; yet, many strong methods for quantifying uncertainty require retraining, Monte Carlo sampling, or expensive second-order computations and may alter a frozen backbone's predictions. To address this, we introduce Gaussian Process Activations (GAPA), a post-hoc method that shifts Bayesian modeling from weights to activations. GAPA replaces standard nonlinearities with Gaussian-process activations whose posterior mean exactly matches the original activation, preserving the backbone's point predictions by construction while providing closed-form epistemic variances in activation space. To scale to modern architectures, we use a sparse variational inducing-point approximation over cached training activations, combined with local k-nearest-neighbor subset conditioning, enabling deterministic single-pass uncertainty propagation without sampling, backpropagation, or second-order information. Across regression, classification, image segmentation, and language modeling, GAPA matches or outperforms strong post-hoc baselines in calibration and out-of-distribution detection while remaining efficient at test time.

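Motivation and Objectives

  • Provide epistemic uncertainty for pretrained networks without retraining or sampling.
  • Add activation-space uncertainty while preserving the backbone's point predictions.
  • Develop scalable offline activation caching and local inducing-point conditioning for modern architectures.
  • Derive deterministic single-pass variance propagation through deep networks.
  • Validate GAPA empirically on regression, classification, segmentation, and language modeling.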

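Proposed Method

  • Replace deterministic activations with Gaussian-process activations whose posterior mean matches the original activation.
  • Run a forward pass over the training data to cache pre-activations, then condition the GP activations on this cache using inducing points and local nearest-neighbor conditioning.
  • Propagate the resulting activation-space variances through the frozen network using closed-form variance propagation rules.
  • Maintain a diagonal (per-neuron) activation covariance for scalability.
  • Use delta-method moment propagation for nonlinear activations and a noisy-input GP correction for stacked layers.
  • Fix hyperparameters post hoc from activation statistics, with no retraining or labels required.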
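The conditioning step above can be illustrated with a minimal one-neuron sketch. It is not the paper's implementation: it assumes an RBF kernel, exact GP conditioning on the k nearest cached pre-activations (in place of the sparse variational inducing-point approximation), and a GP prior mean equal to the activation itself, which is one simple way to make the posterior mean coincide with the original activation. The names `rbf_kernel`, `gp_activation`, and `cache` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between 1-D arrays a (n,) and b (m,)."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_activation(z, cache, phi=np.tanh, k=8, lengthscale=1.0,
                  signal_var=1.0, jitter=1e-6):
    """Return (mean, variance) of a GP activation at scalar pre-activation z.

    Assumption: the GP prior mean is phi itself and the cached targets are
    phi(cache), so the residuals vanish and the posterior mean equals phi(z).
    """
    # Local conditioning: keep only the k cached pre-activations nearest to z.
    idx = np.argsort(np.abs(cache - z))[:k]
    nbrs = cache[idx]

    K = rbf_kernel(nbrs, nbrs, lengthscale, signal_var) + jitter * np.eye(len(nbrs))
    k_star = rbf_kernel(np.array([z]), nbrs, lengthscale, signal_var)[0]

    # Posterior mean: phi(z) + k_*^T K^{-1} (targets - prior mean at neighbors).
    targets = phi(nbrs)
    residual = targets - phi(nbrs)  # identically zero under the assumption above
    mean = phi(z) + k_star @ np.linalg.solve(K, residual)

    # Posterior variance: k(z, z) - k_*^T K^{-1} k_*, clipped at zero.
    var = signal_var - k_star @ np.linalg.solve(K, k_star)
    return float(mean), float(max(var, 0.0))

rng = np.random.default_rng(0)
cache = rng.normal(0.0, 1.0, size=500)  # stand-in for cached pre-activations

mean_in, var_in = gp_activation(0.1, cache)    # query inside the cached range
mean_out, var_out = gp_activation(8.0, cache)  # query far from any cached value
```

In this sketch the mean is `tanh(z)` for every query, while the variance stays near zero close to cached pre-activations and approaches the prior variance `signal_var` far from them, which is the qualitative behavior the method relies on for epistemic uncertainty.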
Figure 1 : Comparison of uncertainty quantification methods on a toy binary classification task. Left to right : MAP (deterministic backbone), MC Dropout, Last-Layer Laplace, and GAPA (ours). Background shading indicates predictive confidence (darker = more confident); orange/yellow points show the

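Experimental Results

Research Questions

  • RQ1: Can activation-space uncertainty provide accurate epistemic estimates for pretrained networks without retraining or sampling?
  • RQ2: How can GP activations be conditioned and propagated efficiently at test time for modern architectures?
  • RQ3: Compared with baselines, do GAPA-based uncertainty estimates improve calibration and out-of-distribution (OOD) detection across regression, classification, segmentation, and language modeling?
  • RQ4: How do inducing-set size and locality (KNN) affect performance and compute?
  • RQ5: How well does GAPA's variance propagation through deep networks approximate the posterior uncertainty?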

Key Findings

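| Model | Airline NLL | Airline CRPS | Airline CQM | Year NLL | Year CRPS | Year CQM | Taxi NLL | Taxi CRPS | Taxi CQM |
|---|---|---|---|---|---|---|---|---|---|
| MAP | 5.121 | 18.695 | 0.148 | 3.673 | 5.023 | 0.134 | 3.755 | 3.755 | 0.211 |
| LLA Diag | 5.125 | 18.648 | 0.143 | 3.647 | 4.917 | 0.088 | 3.722 | 3.990 | 0.257 |
| LLA KFAC | 5.127 | 18.631 | 0.142 | 3.648 | 4.915 | 0.086 | 3.706 | 3.986 | 0.256 |
| LLA* | 5.127 | 18.631 | 0.141 | 3.648 | 4.915 | 0.086 | 3.726 | 3.985 | 0.256 |
| LLA* KFAC | 5.127 | 18.631 | 0.141 | 3.648 | 4.914 | 0.086 | 3.726 | 3.985 | 0.256 |
| ELLA | 5.388 | 21.671 | 0.413 | 4.020 | 6.049 | 0.424 | 3.885 | 3.680 | 0.219 |
| VaLLA 100 | 4.963 | 18.814 | 0.099 | 3.515 | 5.004 | 0.047 | 3.235 | 3.999 | 0.149 |
| VaLLA 200 | 4.965 | 18.788 | 0.098 | 3.485 | 4.970 | 0.041 | 3.232 | 3.979 | 0.142 |
| Dropout | 5.102 | 19.066 | 0.938 | 3.689 | 5.128 | 0.939 | 3.849 | 4.592 | 0.951 |
| Ensemble | 5.053 | 18.205 | 0.933 | 3.639 | 4.833 | 0.938 | 3.631 | 3.384 | 0.961 |
| GAPA | 4.946 | 18.068 | 0.103 | 3.470 | 4.663 | 0.014 | 3.112 | 4.035 | 0.104 |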
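For readers unfamiliar with the NLL and CRPS columns, the sketch below computes both scores for a Gaussian predictive distribution using their standard closed forms. It assumes Gaussian predictions with a per-point mean and variance; whether the paper evaluates exactly this way is not stated in this review, and the CQM metric is not reproduced because its definition is not given here. The function names and the toy arrays are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function without SciPy

def gaussian_nll(y, mu, var):
    """Negative log-likelihood of y under N(mu, var), element-wise."""
    y, mu, var = (np.asarray(a, dtype=float) for a in (y, mu, var))
    return 0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)

def gaussian_crps(y, mu, var):
    """Closed-form CRPS of N(mu, var) at observation y, element-wise.

    CRPS = sigma * [ w * (2 * Phi(w) - 1) + 2 * pdf(w) - 1 / sqrt(pi) ],
    where w = (y - mu) / sigma.
    """
    y, mu, var = (np.asarray(a, dtype=float) for a in (y, mu, var))
    sigma = np.sqrt(var)
    w = (y - mu) / sigma
    pdf = np.exp(-0.5 * w ** 2) / np.sqrt(2.0 * np.pi)
    cdf = 0.5 * (1.0 + _erf(w / np.sqrt(2.0)))
    return sigma * (w * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / np.sqrt(np.pi))

# Toy predictions (made-up numbers, not taken from the paper).
y = np.array([1.0, 2.0, 3.0])
mu = np.array([1.1, 1.8, 3.5])
var = np.array([0.2, 0.3, 0.5])

mean_nll = gaussian_nll(y, mu, var).mean()
mean_crps = gaussian_crps(y, mu, var).mean()
```

Both scores are lower-is-better, so a method improves them either by moving the predictive mean closer to the target or by reporting a variance that better matches the actual error.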
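  • GAPA provides mean-preserving uncertainty: it matches the original activation mean while introducing epistemic variance in activation space.
  • Two-stage scalable inference, using offline inducing points and local KNN conditioning, gives constant-time per-query variance computation at test time.
  • Deterministic variance propagation through deep architectures is competitive with or superior to baselines in calibration and OOD detection across multiple tasks.
  • On the regression benchmarks Airline, Year, and Taxi, GAPA achieves the best negative log-likelihood (NLL) and the best or near-best calibration metrics, outperforming many baselines.
  • For classification on MNIST/Fashion-MNIST and CIFAR-10 with ResNet backbones, GAPA delivers strong OOD detection, runs faster at test time than sampling-based or full-GP methods, and is typically close to MAP runtime.
  • For language modeling, applying GAPA to the front portion of a Transformer yields usable uncertainty metrics with no additional forward passes.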
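Deterministic single-pass variance propagation can be sketched for a two-layer tanh network as follows. The sketch assumes a diagonal (per-neuron) covariance, independent inputs to each linear layer, and a first-order delta method for the nonlinearity. The injected value `0.05` is a made-up stand-in for the epistemic variance a GP activation would supply, and the noisy-input GP correction for stacked layers is not modeled. All names are hypothetical.

```python
import numpy as np

def linear_moments(mean, var, W, b):
    """Propagate a mean and diagonal variance through y = W x + b.

    With independent inputs, Var[y_i] = sum_j W_ij^2 * Var[x_j].
    """
    return W @ mean + b, (W ** 2) @ var

def tanh_delta_moments(mean, var):
    """First-order delta method: Var[tanh(z)] is approx. tanh'(mean)^2 * Var[z].

    The output mean is tanh(mean), so the point prediction is unchanged.
    """
    out = np.tanh(mean)
    grad = 1.0 - out ** 2
    return out, grad ** 2 * var

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([0.5, -1.0, 0.2])
mean, var = x, np.zeros(3)  # deterministic input carries no variance

mean, var = linear_moments(mean, var, W1, b1)
mean, var = tanh_delta_moments(mean, var)
var = var + 0.05            # hypothetical epistemic variance from a GP activation
mean, var = linear_moments(mean, var, W2, b2)

# The frozen network's ordinary forward pass, for comparison with `mean`.
point = W2 @ np.tanh(W1 @ x + b1) + b2
```

Here `mean` equals `point`, so the backbone's prediction is untouched, while `var` carries the extra uncertainty signal to the output in the same forward pass, with no sampling or backpropagation.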
Figure 2 : GAPA overview. Top: GAPA leaves the network’s point predictions unchanged (mean-preserving activations) while propagating an additional epistemic variance signal to the output. Bottom left: deterministic $\tanh$ activation; orange points denote cached training activations. Bottom right: G

This review was generated by AI and reviewed by human editors.