QUICK REVIEW

[Paper Review] What Are Bayesian Neural Network Posteriors Really Like?

Pavel Izmailov, Sharad Vikram|arXiv (Cornell University)|Apr 29, 2021

Gaussian Processes and Bayesian Inference | References: 66 | Cited by: 71

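One-Sentence Summary

The paper uses full-batch Hamiltonian Monte Carlo (HMC) on modern architectures to study the true posteriors of Bayesian neural networks (BNNs). It shows that BNNs can outperform standard training and deep ensembles, and it offers nuanced findings on priors, posterior tempering, distribution shift, and how SGMCMC and deep ensembles compare with HMC.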

ABSTRACT

The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains; (3) in contrast to recent studies, we find posterior tempering is not needed for near-optimal performance, with little evidence for a "cold posterior" effect, which we show is largely an artifact of data augmentation; (4) BMA performance is robust to the choice of prior scale, and relatively similar for diagonal Gaussian, mixture of Gaussian, and logistic priors; (5) Bayesian neural networks show surprisingly poor generalization under domain shift; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods can provide good generalization, they provide distinct predictive distributions from HMC. Notably, deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.

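Motivation and Objectives

Examine whether the true Bayesian posterior offers an advantage over standard training and deep ensembles.
Compare multiple shorter HMC chains with a single long chain as approximations of the posterior.
Examine how posterior temperature (cold versus warm posteriors) affects BNN performance.
Assess robustness to domain shift and compare HMC with cheaper inference methods.
Provide practical guidance for running full-batch HMC and share resources for further study.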

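Proposed Method

Apply full-batch Hamiltonian Monte Carlo to architectures such as ResNet-20-FRN and CNN-LSTM to sample from the BNN posterior.
Parallelize sampling across hundreds of TPU devices in a single-program, multiple-data (SPMD) setup so that full-batch gradients remain tractable.
Tune the HMC hyperparameters (trajectory length, step size, number of chains) to obtain good mixing and acceptance rates.
Visualize and analyze the posterior geometry in weight space and function space to understand mixing and mode connectivity.
Compare HMC with SGLD, MFVI, SGD, and deep ensembles on classification and regression benchmarks.
Evaluate predictive performance, log-likelihood, calibration error, and anomaly detection metrics.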
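
The sampling step listed above can be illustrated with a minimal HMC sketch. This is not the paper's implementation: the target is a toy 2-D standard Gaussian standing in for a BNN posterior, and the names `leapfrog`, `hmc_step`, and `run_chain`, along with the step size and number of leapfrog steps, are assumptions chosen for illustration. In the paper's setting, `grad_log_post` would be the full-batch gradient of the log posterior over all network weights.

```python
import math
import random


def leapfrog(theta, momentum, grad_log_post, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    theta = list(theta)
    grad = grad_log_post(theta)
    # Initial half step for the momentum.
    momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
    for step in range(n_steps):
        # Full step for the position.
        theta = [q + step_size * p for q, p in zip(theta, momentum)]
        grad = grad_log_post(theta)
        # Full momentum step, except a half step at the very end.
        scale = step_size if step < n_steps - 1 else 0.5 * step_size
        momentum = [p + scale * g for p, g in zip(momentum, grad)]
    return theta, momentum


def hmc_step(theta, log_post, grad_log_post, step_size, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    momentum = [rng.gauss(0.0, 1.0) for _ in theta]
    h_old = -log_post(theta) + 0.5 * sum(p * p for p in momentum)
    proposal, new_momentum = leapfrog(
        theta, momentum, grad_log_post, step_size, n_steps
    )
    h_new = -log_post(proposal) + 0.5 * sum(p * p for p in new_momentum)
    # Metropolis correction for the integrator's discretization error.
    accept_prob = math.exp(min(0.0, h_old - h_new))
    if rng.random() < accept_prob:
        return proposal, True
    return list(theta), False


# Toy target (assumption): log density of a 2-D standard Gaussian.
def log_post(theta):
    return -0.5 * sum(q * q for q in theta)


def grad_log_post(theta):
    return [-q for q in theta]


def run_chain(n_samples=2000, step_size=0.2, n_steps=10, seed=0):
    """Run a single chain and return the samples and the acceptance rate."""
    rng = random.Random(seed)
    theta = [3.0, -3.0]
    samples, n_accept = [], 0
    for _ in range(n_samples):
        theta, accepted = hmc_step(
            theta, log_post, grad_log_post, step_size, n_steps, rng
        )
        samples.append(theta)
        n_accept += accepted
    return samples, n_accept / n_samples
```

The trajectory length here is `step_size * n_steps`. Shrinking the step size raises the acceptance rate but needs more gradient evaluations to cover the same trajectory, which is the trade-off the hyperparameter tuning above has to balance.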

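Experimental Results

Research Questions

RQ1: Can a single long HMC chain represent the posterior as well as multiple shorter chains?
RQ2: Do Bayesian neural networks with the true posterior outperform standard training and deep ensembles in accuracy and calibrated uncertainty?
RQ3: Is posterior tempering (temperature T < 1) necessary for near-optimal performance?
RQ4: How robust is BMA to the choice of prior (diagonal Gaussian, mixture of Gaussians, logistic) and to the prior scale?
RQ5: How well do HMC-based BNNs generalize under domain shift and in out-of-distribution settings compared with other methods?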
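
For RQ3, it may help to state what tempering means. The form below is the tempered posterior as it is commonly written in the cold-posterior literature; the symbols $\theta$ (weights), $\mathcal{D}$ (training data), and $U$ (posterior energy) are notation assumed here, not taken from this summary.

```latex
p_T(\theta \mid \mathcal{D})
  \;\propto\; \bigl( p(\mathcal{D} \mid \theta)\, p(\theta) \bigr)^{1/T}
  \;=\; \exp\!\left( -\frac{U(\theta)}{T} \right),
\qquad
U(\theta) = -\log p(\mathcal{D} \mid \theta) - \log p(\theta)
```

Setting $T = 1$ recovers the standard Bayes posterior, while $T < 1$ sharpens it around high-probability weights, which is the "cold posterior" case.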

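Key Findings

BNNs obtained with full-batch HMC can outperform standard training and deep ensembles in accuracy and log-likelihood on CIFAR-10 and IMDB.
A single long HMC chain represents the posterior about as well as multiple shorter chains, as measured by predictive performance.
Near-optimal performance is reached at temperature T = 1 without cooling the posterior; in the authors' setting there is little evidence of a cold posterior effect, which they attribute largely to data augmentation.
Bayesian model averaging is robust to the choice and scale of the prior: diagonal Gaussian, mixture-of-Gaussians, and logistic priors perform similarly, and performance appears to depend more on the architecture than on the prior.
BNNs perform strongly in-domain but generalize surprisingly poorly under covariate shift; deep ensembles and SGMCMC can generalize well, but their predictive distributions differ from those of HMC.
Deep ensemble predictive distributions are about as close to HMC as those of standard SGLD, and closer than those of standard variational inference.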
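
The findings above compare predictive distributions across methods. A minimal sketch of how such a comparison could be computed follows, assuming class probabilities are stored as nested lists indexed by posterior sample, test input, and class. The function names and the two metrics shown (top-1 agreement and mean total variation distance) are illustrative choices, not necessarily the exact evaluation code used in the paper.

```python
def bayesian_model_average(per_sample_probs):
    """Average class probabilities over posterior samples.

    per_sample_probs[s][i][c] is the probability of class c for test
    input i under posterior sample s.
    """
    n_samples = len(per_sample_probs)
    n_inputs = len(per_sample_probs[0])
    n_classes = len(per_sample_probs[0][0])
    return [
        [
            sum(per_sample_probs[s][i][c] for s in range(n_samples)) / n_samples
            for c in range(n_classes)
        ]
        for i in range(n_inputs)
    ]


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def agreement(p, q):
    """Fraction of test inputs on which two methods predict the same class."""
    matches = sum(argmax(a) == argmax(b) for a, b in zip(p, q))
    return matches / len(p)


def total_variation(p, q):
    """Mean total variation distance between two sets of predictive distributions."""
    per_input = [
        0.5 * sum(abs(x - y) for x, y in zip(a, b)) for a, b in zip(p, q)
    ]
    return sum(per_input) / len(per_input)


# Hypothetical numbers: two posterior samples, two test inputs, two classes.
hmc_samples = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.2, 0.8]],
]
hmc_bma = bayesian_model_average(hmc_samples)
other_method = [[0.6, 0.4], [0.6, 0.4]]

print(agreement(hmc_bma, other_method))
print(total_variation(hmc_bma, other_method))
```

Higher agreement and lower total variation mean that a cheaper method's predictions track the HMC reference more closely, independent of whether either method is accurate.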

This review was generated by AI and reviewed by human editors.