QUICK REVIEW

[Paper Review] A Simple Baseline for Bayesian Uncertainty in Deep Learning

Wesley J. Maddox, Timur Garipov | arXiv (Cornell University) | Feb 7, 2019

Gaussian Processes and Bayesian Inference | References: 63 | Cited by: 181

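One-Sentence Summary

SWAG introduces a scalable Gaussian posterior over neural network weights, built from the SWA mean and a low-rank plus diagonal covariance estimated from SGD iterates; sampling from it enables Bayesian model averaging and yields better-calibrated uncertainty on vision tasks.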
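In equations, the construction summarized above can be written as follows. This follows the SWAG definitions as understood here: squares of θ are elementwise, T is the number of collected SGD iterates, d the number of weights, K the rank of the low-rank term, θ̄_i the running mean of the first i iterates, and S the number of posterior samples used for model averaging.

```latex
\theta_{\text{SWA}} = \frac{1}{T}\sum_{i=1}^{T}\theta_i, \qquad
\overline{\theta^2} = \frac{1}{T}\sum_{i=1}^{T}\theta_i^2, \qquad
\Sigma_{\text{diag}} = \operatorname{diag}\!\left(\overline{\theta^2} - \theta_{\text{SWA}}^2\right)

\hat{D} = \left[\theta_{T-K+1} - \bar{\theta}_{T-K+1},\; \ldots,\; \theta_T - \bar{\theta}_T\right] \in \mathbb{R}^{d \times K}, \qquad
\Sigma_{\text{low-rank}} = \frac{1}{K-1}\,\hat{D}\hat{D}^{\top}

q(\theta) = \mathcal{N}\!\left(\theta_{\text{SWA}},\; \tfrac{1}{2}\left(\Sigma_{\text{diag}} + \Sigma_{\text{low-rank}}\right)\right)

\tilde{\theta} = \theta_{\text{SWA}} + \tfrac{1}{\sqrt{2}}\,\Sigma_{\text{diag}}^{1/2} z_1 + \tfrac{1}{\sqrt{2(K-1)}}\,\hat{D} z_2,
\qquad z_1 \sim \mathcal{N}(0, I_d),\; z_2 \sim \mathcal{N}(0, I_K)

p(y^\ast \mid x^\ast, \mathcal{D}) \approx \frac{1}{S}\sum_{s=1}^{S} p\!\left(y^\ast \mid x^\ast, \tilde{\theta}_s\right)
```

Because z_1 and z_2 are independent, the sample θ̃ has covariance ½Σ_diag + ½Σ_low-rank, which matches q(θ).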

ABSTRACT

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
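The abstract describes SWA as the first moment of SGD iterates collected under a modified learning-rate schedule. The sketch below illustrates that idea on a toy problem only. The noisy quadratic loss, the constant learning rate, the burn-in, the `collect_every` spacing, and the function name `swa_on_noisy_quadratic` are all hypothetical choices and do not reflect the paper's training setup.

```python
import numpy as np


def swa_on_noisy_quadratic(dim=10, lr=0.1, n_steps=5000, burn_in=1000,
                           collect_every=10, noise=1.0, seed=0):
    """Hypothetical toy: constant-LR SGD on f(w) = 0.5 * ||w - w_star||^2 with
    Gaussian gradient noise, plus an SWA-style running average of the iterates."""
    rng = np.random.default_rng(seed)
    w_star = np.ones(dim)                      # minimizer of the toy loss
    w = np.zeros(dim)                          # initial weights
    swa_mean, n_avg = np.zeros(dim), 0
    for step in range(n_steps):
        grad = (w - w_star) + noise * rng.standard_normal(dim)  # noisy gradient
        w = w - lr * grad                      # constant, non-decaying step size
        if step >= burn_in and (step - burn_in) % collect_every == 0:
            n_avg += 1
            swa_mean += (w - swa_mean) / n_avg  # running first moment of iterates
    return w, swa_mean, w_star


w_last, w_swa, w_star = swa_on_noisy_quadratic()
print("distance of last iterate:", np.linalg.norm(w_last - w_star))
print("distance of SWA average: ", np.linalg.norm(w_swa - w_star))
```

With a constant learning rate, the iterates keep bouncing around the minimum instead of converging. The final iterate therefore stays noisy, while the running average settles near `w_star`.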

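Motivation and Goals

Motivation: deep learning needs reliable uncertainty representation to support decision-making in high-stakes domains.
Propose a scalable Bayesian inference method that uses the SGD trajectory to approximate the posterior distribution over network weights.
Develop a practical algorithm (SWAG) that combines SWA with a low-rank plus diagonal covariance to form a Gaussian posterior.
Show that SWAG produces well-calibrated predictions and competitive or better uncertainty estimates on vision benchmarks.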

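Proposed Method

Building on Stochastic Weight Averaging (SWA), use the SWA mean as the posterior mean.
Estimate the diagonal covariance from running averages of the first and second moments of the SGD iterates: Sigma_diag = diag(mean(theta^2) - theta_SWA^2).
Build the low-rank covariance from the last K deviation vectors of the SGD iterates (each iterate minus the running mean at that point): stack them as the columns of a matrix D and set Sigma_low_rank = D D^T / (K - 1).
Form the Gaussian posterior N(theta_SWA, 1/2 * (Sigma_diag + Sigma_low_rank)).
Sample from this Gaussian to perform Bayesian model averaging of predictions.
Provide an online procedure that updates and stores the required statistics with minimal overhead.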
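The steps above can be sketched in Python with NumPy on a flat weight vector. This is a minimal illustration under stated assumptions, not the authors' implementation. `SWAGSketch` and `logistic_predict` are hypothetical names, the "SGD iterates" are simulated noisy points, and network-specific details are ignored. The sketch keeps running first and second moments and the last K deviation vectors. It then draws samples with covariance 1/2 * (Sigma_diag + Sigma_low_rank) by adding two independent Gaussian terms.

```python
import numpy as np


class SWAGSketch:
    """Hypothetical helper (not the authors' code): online SWAG statistics
    for a flat weight vector."""

    def __init__(self, dim: int, max_rank: int = 20):
        self.n = 0                       # number of collected SGD iterates
        self.mean = np.zeros(dim)        # running first moment (theta_SWA)
        self.sq_mean = np.zeros(dim)     # running second moment (mean of theta^2)
        self.max_rank = max_rank         # K: number of deviation columns kept
        self.deviations = []             # the most recent K deviation vectors

    def collect(self, theta: np.ndarray) -> None:
        """Fold one SGD iterate into the running statistics."""
        self.n += 1
        self.mean += (theta - self.mean) / self.n
        self.sq_mean += (theta ** 2 - self.sq_mean) / self.n
        # Deviation from the running mean that now includes theta.
        self.deviations.append(theta - self.mean)
        if len(self.deviations) > self.max_rank:
            self.deviations.pop(0)       # drop the oldest column, keep the last K

    def sample(self, rng: np.random.Generator) -> np.ndarray:
        """Draw theta ~ N(theta_SWA, 0.5 * (Sigma_diag + Sigma_low_rank))."""
        var = np.maximum(self.sq_mean - self.mean ** 2, 0.0)  # Sigma_diag entries
        theta = self.mean + np.sqrt(var / 2.0) * rng.standard_normal(self.mean.shape)
        k = len(self.deviations)
        if k > 1:                        # low-rank term needs K - 1 > 0
            D = np.stack(self.deviations, axis=1)              # shape (dim, K)
            theta = theta + D @ rng.standard_normal(k) / np.sqrt(2.0 * (k - 1))
        return theta

    def bma_predict(self, predict_fn, x, n_samples, rng):
        """Bayesian model averaging: mean prediction over posterior samples."""
        preds = [predict_fn(self.sample(rng), x) for _ in range(n_samples)]
        return np.mean(preds, axis=0)


def logistic_predict(theta, x):
    """Hypothetical model: logistic-regression probabilities."""
    return 1.0 / (1.0 + np.exp(-x @ theta))


rng = np.random.default_rng(0)
center = np.array([1.0, -2.0, 0.5])
# Stand-in for SGD iterates collected during training (simulated here).
iterates = [center + 0.1 * rng.standard_normal(3) for _ in range(50)]
swag = SWAGSketch(dim=3, max_rank=5)
for theta in iterates:
    swag.collect(theta)

x = rng.standard_normal((4, 3))
probs = swag.bma_predict(logistic_predict, x, n_samples=30, rng=rng)
print(probs)
```

Storing only the d x K deviation matrix rather than a full d x d covariance keeps memory at O(dK). Each posterior draw then costs O(dK) operations.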

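Experimental Results

Research Questions

RQ1: Can the SGD trajectory be used to approximate the local geometry of the posterior of deep networks?
RQ2: Does the SWAG Gaussian posterior provide better uncertainty calibration on vision tasks than existing baselines?
RQ3: How effective is SWAG for out-of-domain detection and transfer learning compared with alternatives such as MC dropout and SGLD?
RQ4: In practice, how does the low-rank plus diagonal approximation compare with a diagonal-only covariance?
RQ5: As a broader baseline, does SWAG improve calibration and predictive performance on language modeling and regression benchmarks?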

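Key Findings

SWAG captures the local geometry of the posterior well within the subspace spanned by the SGD iterates.
SWAG provides well-calibrated uncertainty estimates and achieves higher test log-likelihood than several baselines on CIFAR-10/100 and ImageNet.
SWAG outperforms many alternatives in uncertainty calibration, including MC dropout, SGLD, KFAC-Laplace, and SWA.
Compared with several competitors, SWAG improves transfer learning performance and out-of-domain detection.
SWAG also improves language-modeling perplexity and achieves competitive results on regression tasks.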

This review was generated by AI and reviewed by human editors.