QUICK REVIEW

[Paper Review] Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Michael W. Dusenberry, Ghassen Jerfel|arXiv (Cornell University)|May 14, 2020

Adversarial Robustness in Machine Learning|References: 46|Cited by: 33

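One-sentence summary

The paper proposes rank-1 Bayesian neural networks with mixture posteriors to achieve state-of-the-art uncertainty quantification and scalability, outperforming baselines on ImageNet, CIFAR, and MIMIC-III while using fewer parameters than ensembles.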

ABSTRACT

Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from efficiency issues. It remains unclear how to combine the strengths of these two approaches and remediate their common issues. To tackle this challenge, we propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace. We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase (e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants.

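Motivation and objectives

Address the underfitting and inefficiency of Bayesian neural networks at scale.
Achieve strong uncertainty quantification with a parameter-efficient method.
Use a rank-1 subspace parameterization to make Bayesian inference scalable.
Investigate mixture posteriors that capture multiple modes with minimal memory overhead.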

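Proposed method

Parameterize each weight matrix W as W' = W ∘ (r s^T), where r and s are low-dimensional vectors (a rank-1 factorization).
Perform variational inference over r and s while treating W as a deterministic parameter (a rank-1 Bayesian perturbation).
Place hierarchical priors on r and s to induce a structured prior over the weights and to encourage sparsity and robustness (e.g., Gaussian, Cauchy, and inverse-gamma distributions).
Use a mixture posterior over the rank-1 factors to capture multiple modes at a small memory cost (e.g., mixture size K=4 in the experiments).
Compare training with the log-mixture likelihood against the average per-component log-likelihood, and analyze training dynamics and generalization under distribution shift.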
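
The rank-1 parameterization above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes W has shape (outputs, inputs), r has one entry per output, s has one entry per input, and the variational posterior over each factor is a Gaussian sampled by reparameterization. The function names and toy sizes are hypothetical. The sketch checks that (W ∘ r s^T) x equals r ∘ (W (s ∘ x)), so the perturbed matrix never has to be formed explicitly.

```python
import random

def rank1_forward(W, r, s, x):
    """Compute (W * r s^T) x, with * elementwise, without building the perturbed matrix."""
    sx = [sj * xj for sj, xj in zip(s, x)]                    # scale inputs by s
    Wsx = [sum(w * v for w, v in zip(row, sx)) for row in W]  # shared deterministic W
    return [ri * yi for ri, yi in zip(r, Wsx)]                # scale outputs by r

def explicit_forward(W, r, s, x):
    """Reference: build W' = W * (r s^T) elementwise, then multiply by x."""
    Wp = [[W[i][j] * r[i] * s[j] for j in range(len(s))] for i in range(len(r))]
    return [sum(w * xj for w, xj in zip(row, x)) for row in Wp]

def sample_factor(mean, std, rng):
    """Reparameterized Gaussian sample mean + std * eps (assumed posterior form)."""
    return [m + sd * rng.gauss(0.0, 1.0) for m, sd in zip(mean, std)]

rng = random.Random(0)
n_out, n_in = 3, 4  # toy sizes, chosen only for illustration
W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
x = [rng.uniform(-1, 1) for _ in range(n_in)]

# One posterior sample of the rank-1 factors, centered at 1 (hypothetical values).
r = sample_factor([1.0] * n_out, [0.1] * n_out, rng)
s = sample_factor([1.0] * n_in, [0.1] * n_in, rng)

y_fast = rank1_forward(W, r, s, x)
y_ref = explicit_forward(W, r, s, x)
max_diff = max(abs(a - b) for a, b in zip(y_fast, y_ref))
```

Under these assumptions, each posterior sample costs two extra elementwise products around the usual matrix-vector product, and only r and s carry distributions.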

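Experimental results

Research questions

RQ1: Can rank-1 weight perturbations combined with variational inference deliver competitive accuracy and uncertainty calibration at scale?
RQ2: Do hierarchical priors on the rank-1 factors improve robustness and out-of-distribution performance?
RQ3: How do the number of mixture components and the form of the posterior affect performance and parameter efficiency?
RQ4: How does rank-1 Bayesian inference compare with deep ensembles and BatchEnsemble in terms of diversity, NLL, and calibration?
RQ5: For rank-1 Bayesian networks, is the log-mixture likelihood bound more favorable for training or evaluation?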
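
The two objectives contrasted in RQ5 can be made concrete with a small sketch. It assumes an equally weighted mixture of K components and uses made-up per-component likelihoods of the true label. The function names are hypothetical and the numbers are not taken from the paper. By Jensen's inequality, the average of the per-component log-likelihoods never exceeds the log of the mixture likelihood.

```python
import math

def log_mixture_likelihood(probs):
    """Log of the equally weighted mixture: log((1/K) * sum_k p_k)."""
    return math.log(sum(probs) / len(probs))

def average_log_likelihood(probs):
    """Average of per-component log-likelihoods: (1/K) * sum_k log p_k."""
    return sum(math.log(p) for p in probs) / len(probs)

# Hypothetical likelihoods p_k(y | x) of the true label under K = 4 components.
component_probs = [0.9, 0.6, 0.2, 0.7]

log_mix = log_mixture_likelihood(component_probs)
avg_log = average_log_likelihood(component_probs)
gap = log_mix - avg_log  # non-negative by Jensen's inequality
```

The gap is zero only when all components assign the same likelihood, so it grows as the components disagree more on an example.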

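Key findings

Rank-1 BNNs with multimodal posteriors achieve state-of-the-art NLL, accuracy, and calibration on the ImageNet, CIFAR, and MIMIC-III benchmarks.
Mixture posteriors over rank-1 factors yield significant gains at minimal memory cost (e.g., a 0.4% parameter increase for ResNet-50 with K=10).
Compared with Gaussian priors, Cauchy priors on the rank-1 factors improve generalization and uncertainty calibration under distribution shift.
Rank-1 BNNs outperform BatchEnsemble and, while using far fewer parameters than deep ensembles, show higher ensemble diversity at equal or higher accuracy.
A theoretical result shows that, in fully connected networks, rank-1 perturbations can match the local variance structure of full-rank perturbations, supporting the expressiveness of the approach.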
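
To see why mixtures of rank-1 factors are cheap, the following sketch counts parameters for a single hypothetical 512 x 512 dense layer with K = 10 components. It is a toy illustration and does not reproduce the ResNet-50 figure quoted above. It assumes one r vector and one s vector per component and counts one entry per vector element, so a Gaussian posterior with a learned mean and scale would double the factor count.

```python
def dense_weight_count(n_in, n_out):
    """Number of entries in the shared weight matrix W (bias ignored)."""
    return n_in * n_out

def rank1_factor_count(n_in, n_out, k):
    """One r vector (n_out) and one s vector (n_in) per mixture component."""
    return k * (n_in + n_out)

n_in, n_out, k = 512, 512, 10  # hypothetical layer size and mixture size

shared = dense_weight_count(n_in, n_out)
factors = rank1_factor_count(n_in, n_out, k)

rank1_total = shared + factors   # one shared W plus K pairs of factors
ensemble_total = k * shared      # an ensemble stores K full copies of W
rank1_overhead = factors / shared
```

For this toy layer the factors add about 3.9% on top of the shared weights, whereas a 10-member ensemble stores ten times the weights.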

This review was generated by AI and reviewed by a human editor.