QUICK REVIEW

[Paper Review] The Case for Bayesian Deep Learning

Andrew Gordon Wilson|arXiv (Cornell University)|Jan 29, 2020

Gaussian Processes and Bayesian Inference | References: 41 | Cited by: 66

One-Sentence Summary

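The paper argues that marginalizing over neural network weights (Bayesian model averaging) gives deep networks better calibration and accuracy than standard optimization, that deep ensembles can be understood as an implementation of approximate Bayesian marginalization, and that priors in function space encode the inductive biases of neural network architectures.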

ABSTRACT

The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data, and can represent many different but high performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken as competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.
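Point (4) rests on a geometric fact: volume grows exponentially with dimension. The sketch below is a deliberately crude cartoon, not the paper's analysis. It assumes (hypothetically) that a flat basin and a sharp basin are axis-aligned boxes whose side lengths differ by a fixed factor in every parameter direction, and reports the log10 volume ratio. The function name `log10_volume_ratio` is illustrative only.

```python
import math


def log10_volume_ratio(flat_width: float, sharp_width: float, num_params: int) -> float:
    """log10 of vol(flat basin) / vol(sharp basin) in num_params dimensions.

    Simplifying assumption (not from the paper): each basin is an axis-aligned
    box with side length flat_width (resp. sharp_width) along every one of the
    num_params directions, so the ratio is (flat_width / sharp_width) ** num_params.
    Working in log space avoids floating-point overflow for large num_params.
    """
    return num_params * math.log10(flat_width / sharp_width)


# A basin only twice as wide per direction, at increasing parameter counts.
for d in (1, 10, 1_000, 1_000_000):
    print(d, log10_volume_ratio(flat_width=2.0, sharp_width=1.0, num_params=d))
```

Under this toy model, a factor-of-2 width difference per direction already gives a volume ratio of roughly 10^301 at 1,000 parameters. This is why flat regions can dominate the posterior mass that a Bayesian model average integrates over.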

Research Motivation and Goals

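Motivate Bayesian marginalization as a better alternative to optimization in deep neural networks.
Explain how deep ensembles relate to Bayesian model averaging, and why they can be viewed as approximate marginalization.
Argue for the importance of priors in function space and of the inductive biases of neural network architectures.
Highlight practical progress and open challenges in scalable Bayesian deep learning, in contrast with MAP training.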

Proposed Approach

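Presents the Bayesian model average p(y|x,D) = ∫ p(y|x,w) p(w|D) dw as the predictive distribution.
Argues that deep networks are underspecified by the data, so their posteriors are diffuse; this is exactly where marginalization improves calibration and accuracy.
Connects deep ensembles to approximate posterior samples, emphasizing that members must be diverse to avoid redundant contributions to the model average.
Discusses function-space priors induced by structured models (e.g., CNNs) and the role of priors over parameters.
Surveys scalable Bayesian deep learning methods (e.g., ensemble-based approaches, subspace inference, and MCMC) for handling high-dimensional posteriors.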
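In practice, the integral above is usually approximated by a simple Monte Carlo average, p(y|x,D) ≈ (1/J) Σ_j p(y|x,w_j), with w_j drawn from (an approximation to) p(w|D). The minimal sketch below assumes the J "samples" are independently trained ensemble members that have already produced logits. The helper names and the toy logits are hypothetical. The key detail is that predictive distributions are averaged, not weights.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def bayesian_model_average(member_logits: np.ndarray) -> np.ndarray:
    """Monte Carlo estimate of p(y|x,D) = ∫ p(y|x,w) p(w|D) dw.

    member_logits has shape (J, N, C): logits from J models (e.g. ensemble
    members treated as approximate posterior samples w_j), N inputs, C classes.
    """
    probs = softmax(member_logits)  # p(y | x, w_j) for each member j
    return probs.mean(axis=0)       # (1/J) * sum_j p(y | x, w_j)


# Toy logits from three hypothetical ensemble members:
# they disagree on input 0 and agree on input 1.
logits = np.array([
    [[4.0, 0.0], [3.0, 0.0]],
    [[0.0, 4.0], [3.0, 0.0]],
    [[4.0, 0.0], [3.0, 0.0]],
])
bma = bayesian_model_average(logits)
print(bma.round(3))
```

On input 0, each member is about 98% confident, but the averaged prediction drops to about 0.66 for class 0 because the members disagree. On input 1, where they agree, the average stays confident. This is the sense in which diverse members make the average better calibrated, while redundant members add nothing.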

Experimental Results

Research Questions

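RQ1: How does Bayesian marginalization affect the calibration and accuracy of deep neural networks compared with conventional MAP optimization?
RQ2: Can deep ensembles be interpreted as approximate Bayesian marginalization, and under what conditions are they effective?
RQ3: What role do function-space priors and the inductive biases of neural networks play in Bayesian deep learning?
RQ4: Which scalable Bayesian inference methods exist for deep networks, and what are their practical benefits and limitations?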

Key Findings

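Bayesian model averaging captures epistemic uncertainty and improves the predictive calibration and accuracy of deep networks.
Deep ensembles approximate Bayesian marginalization by finding diverse, high-performing solutions in different basins of attraction.
Structured function-space priors induced by architectures such as CNNs provide inductive biases that aid generalization.
Flat regions of the loss landscape correspond to many diverse good solutions; because such regions occupy a large volume in high dimensions, they make Bayesian marginalization more effective.
Recent practical Bayesian deep learning methods improve accuracy and calibration while retaining scalability.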
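The findings above refer to "calibration" without naming a metric. One common way to quantify it is expected calibration error (ECE): bin predictions by confidence and take the weighted average gap between accuracy and confidence across bins. The sketch below is a generic ECE implementation, not a claim about how the paper evaluates its methods. The equal-width binning, `n_bins=10`, and the synthetic inputs are assumptions for illustration only.

```python
import numpy as np


def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """ECE with equal-width confidence bins.

    probs: (N, C) predicted class probabilities; labels: (N,) integer labels.
    ECE = sum_b (|B_b| / N) * |accuracy(B_b) - confidence(B_b)|.
    """
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    # Bin index in [0, n_bins - 1]; a confidence of exactly 1.0 goes in the last bin.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return float(ece)


# Synthetic example: both predictors are right half the time,
# but one claims 99% confidence and the other only 60%.
labels = np.array([0, 1, 0, 1])
overconfident = np.tile([0.99, 0.01], (4, 1))
hedged = np.tile([0.6, 0.4], (4, 1))
print(expected_calibration_error(overconfident, labels))
print(expected_calibration_error(hedged, labels))
```

Both predictors have the same accuracy, but the overconfident one scores an ECE of about 0.49 and the hedged one about 0.10. A lower ECE is the kind of improvement a well-behaved model average is meant to deliver.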

This summary was generated by AI and reviewed by human editors.