QUICK REVIEW

[Paper Review] Dropout Inference in Bayesian Neural Networks with Alpha-divergences

Yingzhen Li, Yarin Gal|arXiv (Cornell University)|Mar 8, 2017

Adversarial Robustness in Machine Learning | References: 39 | Cited by: 107

One-Sentence Summary

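This paper re-parametrises the alpha-divergence objective to enable practical, dropout-based inference in Bayesian neural networks, improving uncertainty estimates and robustness to adversarial examples.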

ABSTRACT

To obtain uncertainty estimates with real-world Bayesian deep learning models, practical inference approximations are needed. Dropout variational inference (VI) for example has been used for machine vision and medical applications, but VI can severely underestimate model uncertainty. Alpha-divergences are alternative divergences to VI's KL objective, which are able to avoid VI's uncertainty underestimation. But these are hard to use in practice: existing techniques can only use Gaussian approximating distributions, and require existing models to be changed radically, thus are of limited use for practitioners. We propose a re-parametrisation of the alpha-divergence objectives, deriving a simple inference technique which, together with dropout, can be easily implemented with existing models by simply changing the loss of the model. We demonstrate improved uncertainty estimates and accuracy compared to VI in dropout networks. We study our model's epistemic uncertainty far away from the data using adversarial images, showing that these can be distinguished from non-adversarial images by examining our model's uncertainty.

Motivation and Goals

Motivate the need for better uncertainty estimation in Bayesian neural networks (BNNs).
Propose a practical alpha-divergence based inference that works with standard dropout and existing architectures.
Demonstrate improved uncertainty estimates and predictive accuracy over standard dropout VI across tasks.
Assess epistemic uncertainty far from data and its relation to adversarial examples.

Proposed Method

Reformulate BB-α energy to enable dropout-based approximate inference without changing model architectures.
Use a reparameterisation with cavity distributions to derive a tractable objective compatible with dropout (equation 7).
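Define the MC objective $\tilde{\mathcal{L}}_\alpha(q) = \mathrm{KL}[q \,\|\, p_0] + \text{const} - \frac{1}{\alpha} \sum_n \text{log-sum-exp}_k\left[-\alpha\, l(y_n, f^{\omega_k}(x_n))\right]$, estimated with $K$ samples $\omega_k \sim q$, where $l$ is the per-example loss (negative log-likelihood).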
Specialise to dropout by sampling multiple stochastic forward passes, yielding a practical loss (equation 9 for classification, equation 10 for regression).
Provide a concrete dropout-BB-α objective that raises each MC sample's likelihood of the observed target to the power α and averages these over the K dropout samples inside the log.
Show that α controls the trade-off between predictive likelihood optimization (α≈1) and variational free energy (α→0).
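The MC objective and its dropout specialisation above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes `nll[k][n]` already holds the loss l(y_n, f^{ω_k}(x_n)) from the k-th stochastic forward pass, the function names are hypothetical, and KL[q||p0] is passed in as a precomputed number (for dropout networks it is usually approximated by an L2 weight-decay term). The log K constant is folded into the average, so the data term equals −(1/α) Σ_n log((1/K) Σ_k p(y_n | x_n, ω_k)^α).

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bb_alpha_data_term(nll: Sequence[Sequence[float]], alpha: float) -> float:
    """Data term of the MC dropout BB-alpha objective (hypothetical helper).

    nll[k][n] is the loss l(y_n, f^{omega_k}(x_n)) of example n under the
    k-th dropout mask. Returns
        -(1/alpha) * sum_n [logsumexp_k(-alpha * nll[k][n]) - log K],
    i.e. the log-sum-exp term together with its log K constant.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive; alpha -> 0 is the VI limit")
    num_samples = len(nll)
    num_examples = len(nll[0])
    total = 0.0
    for n in range(num_examples):
        scaled = [-alpha * nll[k][n] for k in range(num_samples)]
        total += logsumexp(scaled) - math.log(num_samples)
    return -total / alpha


def bb_alpha_objective(nll: Sequence[Sequence[float]], alpha: float,
                       kl: float = 0.0) -> float:
    """Precomputed KL[q || p0] (e.g. a weight-decay surrogate) plus data term."""
    return kl + bb_alpha_data_term(nll, alpha)


if __name__ == "__main__":
    # Toy likelihoods p(y_n | x_n, omega_k): K = 2 dropout samples, N = 2 examples.
    probs = [[0.5, 0.2],
             [0.5, 0.6]]
    nll = [[-math.log(p) for p in row] for row in probs]
    for alpha in (1e-6, 0.5, 1.0):
        print(alpha, round(bb_alpha_data_term(nll, alpha), 4))
```

Running the toy example over α = 1e-6, 0.5 and 1 illustrates the trade-off in the last bullet: a very small α gives approximately the sample-averaged NLL (the VI data term), while α = 1 gives the negative log of the MC-averaged likelihood, with intermediate α values falling in between.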

Experimental Results

Research Questions

RQ1 Does alpha-divergence based dropout inference yield better calibrated uncertainty than standard dropout VI?
RQ2 How do different α values affect regression and classification performance on benchmark datasets?
RQ3 Can dropout-BB-α improve robustness and detect adversarial inputs via epistemic uncertainty?
RQ4 What are the practical training-time implications compared to VI and other Bayesian methods?
RQ5 How does the approach generalize to CNNs and larger architectures?

Key Findings

Non-VI α values (e.g., α=0.5 or α=1) improve predictive log-likelihood and often maintain competitive RMSE compared to VI in regression.
In MNIST classification with fully connected networks, α=0.5 (the Hellinger value) yields the best test RMSE and matches the EP value (α=1) in log-likelihood; VI (α=0) underperforms on these metrics.
For CNNs on MNIST, VI (α=0) can perform comparably to α=0.5 and is often close to α=1 in log-likelihood, with improvements in accuracy.
The approach enables MC dropout with a simple loss reformulation, and training time is competitive with VI.
Uncertainty increases for adversarial MNIST images, enabling separation from non-adversarial samples via epistemic uncertainty.
The experiments show the method outperforms a Gaussian VI baseline and is competitive with HMC and sparse GP in regression tasks.
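The adversarial-detection finding relies on measuring uncertainty from K stochastic forward passes. The sketch below assumes two standard MC-dropout measures, predictive entropy and mutual information (the latter is a common proxy for epistemic uncertainty); this review does not state which measure the paper uses, the function names are hypothetical, and the threshold in `flag_uncertain` is a placeholder that would need tuning on held-out data.

```python
import math
from typing import List, Sequence


def mean_probs(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average K softmax vectors, one per stochastic (dropout) forward pass."""
    num_samples = len(samples)
    num_classes = len(samples[0])
    return [sum(s[c] for s in samples) / num_samples for c in range(num_classes)]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def predictive_entropy(samples: Sequence[Sequence[float]]) -> float:
    """Total uncertainty: entropy of the MC-averaged predictive distribution."""
    return entropy(mean_probs(samples))


def mutual_information(samples: Sequence[Sequence[float]]) -> float:
    """Epistemic proxy: predictive entropy minus mean per-sample entropy."""
    expected = sum(entropy(s) for s in samples) / len(samples)
    return predictive_entropy(samples) - expected


def flag_uncertain(samples: Sequence[Sequence[float]], threshold: float) -> bool:
    """Hypothetical detector: flag inputs whose epistemic proxy exceeds threshold."""
    return mutual_information(samples) > threshold


if __name__ == "__main__":
    # Illustrative only: dropout passes that agree (low uncertainty) ...
    agree = [[0.98, 0.01, 0.01]] * 3
    # ... and passes that disagree (high epistemic uncertainty).
    disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
    for name, s in (("agree", agree), ("disagree", disagree)):
        print(name, round(predictive_entropy(s), 3), round(mutual_information(s), 3))
```

Predictive entropy also rises for inputs that are inherently ambiguous, even when every dropout sample agrees; mutual information is high only when the samples disagree, which is why it is often preferred as the epistemic measure.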

This review was generated by AI and reviewed by human editors.