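[Paper Summary] How Good is the Bayes Posterior in Deep Neural Networks Really?
The paper shows that Bayes posterior predictions in deep networks can fall behind SGD, and that cold posteriors (T<1) often yield significantly better predictive performance; it examines candidate explanations and provides diagnostics for the accuracy of SG-MCMC.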
During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early 2020---no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
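Motivation and Objectives
- Assess whether Bayes posterior predictions in deep neural networks match or exceed SGD-based predictions.
- Demonstrate that tempered (cold) posteriors with T<1 improve predictive performance beyond the Bayes posterior.
- Systematically evaluate hypotheses that could explain cold posteriors, and develop diagnostics for the accuracy of SG-MCMC.
- Provide practical guidance and diagnostics for understanding when the Bayes posterior is beneficial in deep learning.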
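Proposed Method
- Formulate posterior sampling through Langevin dynamics and SG-MCMC to approximate p(θ|D).
- Compare Bayes posterior predictions against SGD-trained baselines on ResNet-20/CIFAR-10 and CNN-LSTM/IMDB.
- Cool the posterior with a temperature T<1 to obtain cold posteriors, and identify the best-performing range (e.g., T<<1).
- Introduce and apply diagnostics for SG-MCMC accuracy, including the kinetic temperature and the configurational temperature.
- Use preconditioning and cyclical time stepping to improve SG-MCMC simulation fidelity; relate the discretization step size h to SGD parameters.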
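To make the sampling and diagnostic steps above concrete, here is a minimal sketch of tempered underdamped Langevin dynamics on a toy quadratic energy U(θ) = 0.5·||θ||², for which the tempered target exp(-U(θ)/T) is the Gaussian N(0, T·I). It is an illustration under stated assumptions rather than the paper's implementation: it uses the exact gradient (no minibatch noise), an identity mass matrix (no preconditioning), and a fixed step size (no cyclical schedule). The function name `sample_tempered_gaussian` and all of its parameters are hypothetical.

```python
import numpy as np


def sample_tempered_gaussian(T, h=0.05, gamma=1.0, dim=100,
                             n_steps=6000, burn_in=1000, seed=0):
    """Simulate tempered Langevin dynamics on U(theta) = 0.5 * ||theta||^2.

    Returns the time-averaged kinetic temperature m.m / dim and the
    time-averaged second moment theta.theta / dim. Both should be close
    to T when the simulation is accurate (identity mass matrix assumed).
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)  # parameters
    m = np.zeros(dim)      # momentum
    kinetic, second_moment = [], []
    for step in range(n_steps):
        grad_U = theta  # exact gradient of the toy energy
        noise = rng.standard_normal(dim)
        # Momentum update: friction, force, and temperature-scaled noise.
        m = (1.0 - h * gamma) * m - h * grad_U \
            + np.sqrt(2.0 * gamma * h * T) * noise
        # Parameter update uses the freshly updated momentum.
        theta = theta + h * m
        if step >= burn_in:
            kinetic.append(m @ m / dim)
            second_moment.append(theta @ theta / dim)
    return float(np.mean(kinetic)), float(np.mean(second_moment))


if __name__ == "__main__":
    for T in (1.0, 0.1):
        t_kin, var = sample_tempered_gaussian(T)
        print(f"T={T}: kinetic temperature={t_kin:.3f}, second moment={var:.3f}")
```

Setting T=1 targets the ordinary posterior of this toy model, while T<1 concentrates the samples. A kinetic temperature that drifts far from the target T would indicate that the discretization (step size h) or the gradient noise is distorting the simulation, which is the role the kinetic temperature diagnostic plays in the list above.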
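Experimental Results
Research Questions
- RQ1: On standard deep learning benchmarks, are Bayes posterior predictions at T=1 as good as, or worse than, those of SGD-based models?
- RQ2: Do cold posteriors with T<1 yield better predictive performance, and what is the best temperature range?
- RQ3: Which hypotheses can explain the cold posterior phenomenon, and which diagnostics can separate inference problems from prior or likelihood effects?
- RQ4: Do SG-MCMC methods approximate the target posterior accurately, and how do factors such as minibatch noise or discretization affect the results?
- RQ5: How do priors and data practices (such as data augmentation and dropout) affect the Bayes posterior in deep networks?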
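Key Findings
- Bayes posterior predictions at T=1 are worse than SGD on both ResNet-20/CIFAR-10 and CNN-LSTM/IMDB.
- Temperatures T<1 yield significantly better predictive performance, and the best range often lies far below 1 (for example, in some experiments as low as 0.01 to 0.2 for IMDB, and as low as 10^-4 for CIFAR-10).
- SG-MCMC with preconditioning and cyclical time stepping simulates the posterior accurately, which suggests that inaccurate inference is not the sole explanation for the cold posterior effect.
- Bias from a poor prior or from likelihood violations is not, on its own, sufficient to fully explain the cold posterior effect across models and datasets.
- Prior predictive analysis suggests that a standard normal prior such as N(0,I) may be overly informative for large networks, indicating that the choice of prior affects posterior behavior.
- Alternative notions of a posterior (such as the Masegosa posterior) may provide a more robust target than the Bayes posterior under model misspecification.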
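The prior predictive finding above can be illustrated with a small sketch: draw weights from N(0, I), push a fixed batch of inputs through the resulting network, and inspect the class probabilities it produces. The example below is a toy stand-in, not the paper's experiment: it uses a bias-free two-hidden-layer ReLU MLP on random Gaussian inputs instead of ResNet-20 on CIFAR-10, and the function name `prior_predictive_stats` and all sizes are hypothetical choices.

```python
import numpy as np


def prior_predictive_stats(n_functions=20, n_inputs=200, in_dim=32,
                           width=256, n_classes=10, seed=0):
    """Sample networks from a N(0, I) weight prior and summarize predictions.

    Returns (mean max-softmax probability, mean fraction of inputs that a
    sampled function assigns to its single most frequent class).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_inputs, in_dim))  # stand-in for a data batch
    confidences, modal_fractions = [], []
    for _ in range(n_functions):
        # Every weight is drawn independently from N(0, 1), i.e. theta ~ N(0, I).
        w1 = rng.standard_normal((in_dim, width))
        w2 = rng.standard_normal((width, width))
        w3 = rng.standard_normal((width, n_classes))
        hidden = np.maximum(x @ w1, 0.0)
        hidden = np.maximum(hidden @ w2, 0.0)
        logits = hidden @ w3
        # Numerically stable softmax over classes.
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        confidences.append(probs.max(axis=1).mean())
        counts = np.bincount(probs.argmax(axis=1), minlength=n_classes)
        modal_fractions.append(counts.max() / n_inputs)
    return float(np.mean(confidences)), float(np.mean(modal_fractions))


if __name__ == "__main__":
    conf, modal = prior_predictive_stats()
    print(f"mean max probability: {conf:.3f}")
    print(f"mean modal-class fraction: {modal:.3f}")
```

In this toy setup the unscaled N(0, 1) weights produce very large logits, so the sampled functions tend to be highly confident before seeing any data. That is one sense in which a prior that looks neutral in weight space can carry strong assumptions in function space; how closely this mirrors the behavior of the architectures studied in the paper is not something the sketch establishes.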
This summary was generated by AI and reviewed by a human editor.