QUICK REVIEW

[Paper Review] Variational Bayesian dropout: pitfalls and fixes

Jiri Hron, Alexander Matthews|arXiv (Cornell University)|Jul 5, 2018

Gaussian Processes and Bayesian Inference | References: 3 | Cited by: 23

One-Sentence Summary

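This paper identifies key theoretical flaws in variational Bayesian dropout. Improper priors leave the true posterior undefined or pathological, and singular variational approximations make the standard variational objective (the KL divergence) ill-defined. To fix the singularity problem, the authors propose the Quasi-KL (QKL) divergence, a new variational objective that stays well-defined for approximating high-dimensional distributions even when the supports of the true and approximate posteriors do not match. They also show that the QKL-optimal approximation of a full-rank Gaussian by a degenerate one recovers the principal component analysis (PCA) solution.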

ABSTRACT

Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; from undefined or pathological behaviour of the true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution.
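The singularity problem and the QKL remedy can be sketched as a short derivation. This is a minimal sketch under our own assumptions and notation, not necessarily the paper's. The approximation Q is supported on a K-dimensional affine subspace S of ℝ^D with K < D, and has density q_S with respect to Lebesgue measure on S. The true posterior P has a Lebesgue density p on ℝ^D. Q_ε denotes Q smoothed by N(0, εI) noise in the D − K directions orthogonal to S, and H_S(Q) = −E_Q[log q_S].

```latex
% KL needs absolute continuity Q << P, which fails when Q is singular:
\mathrm{KL}(Q \,\|\, P) = \int \log \frac{dQ}{dP}\, dQ
  \quad\text{requires } Q \ll P,
  \qquad\text{but } P(S) = 0 \text{ while } Q(S) = 1 .

% Smoothing restores a density; its entropy splits into an S-part and an orthogonal part:
\mathrm{KL}(Q_\varepsilon \,\|\, P)
  = -\mathcal{H}_S(Q) - \frac{D-K}{2}\log(2\pi e\,\varepsilon)
    - \mathbb{E}_{Q_\varepsilon}[\log p]
  \;\xrightarrow[\varepsilon \to 0]{}\; +\infty .

% Removing the divergent, parameter-independent term leaves a finite limit:
\mathrm{QKL}(Q \,\|\, P)
  = \lim_{\varepsilon \to 0}\Big[\mathrm{KL}(Q_\varepsilon \,\|\, P)
      + \frac{D-K}{2}\log(2\pi e\,\varepsilon)\Big]
  = \mathbb{E}_{Q}\big[\log q_S - \log p\big].
```

The added term depends only on ε and D − K, not on the variational parameters, so it does not change which approximation is preferred. Passing the limit through E_{Q_ε}[log p] is the step where a dominated-convergence argument is needed.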

Motivation and Goals

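Diagnose the theoretical inconsistencies in variational Bayesian dropout, in particular those caused by improper priors and singular variational approximations.
Explain why the algorithm still works well in practice despite these theoretical flaws.
Propose a new variational inference objective that resolves the singularity between the true and approximate posteriors.
Establish a principled alternative to the standard KL divergence for cases where the supports of the true posterior and its approximation do not match.
Demonstrate the usefulness of the new objective through theoretical analysis and a concrete Gaussian approximation example.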

Proposed Method

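Propose the Quasi-KL (QKL) divergence, obtained as a limit of variational objectives, which remains well-defined even when the standard KL divergence is undefined because of singularity.
Show that the discretisation- and noise-based motivations for variational Bernoulli dropout in earlier work (e.g. Gal & Ghahramani, 2016) have QKL as a limit, so QKL generalises these existing remedies.
Use measure-theoretic tools, including the dominated convergence theorem and the restriction of measures to subspaces, to prove that discrete approximations converge to the corresponding continuous expectations.
Apply QKL to approximating a full-rank Gaussian with a degenerate Gaussian, and show that the optimal solution corresponds to principal component analysis (PCA).
Derive analytic gradients of the QKL objective in the Gaussian setting, so that standard variational inference techniques can be used for optimisation.
Prove that the QKL-optimal solution converges to PCA in the limit, establishing a theoretical link between variational inference and dimensionality reduction.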
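The PCA connection can be checked numerically in the Gaussian case. The sketch makes several assumptions. The target is p = N(0, Σ) with Σ full rank. The degenerate approximation is q = N(m, U A Uᵀ), supported on m + span(U), where U is D × K with orthonormal columns and A is K × K positive definite. QKL is evaluated as E_q[log q_S − log p], which is our reading of the construction rather than the authors' code. Setting the derivative with respect to A to zero gives A = (UᵀΣ⁻¹U)⁻¹. The script compares the resulting objective on the top-K eigenvectors of Σ (the PCA subspace) with its value on random subspaces. `qkl_gaussian`, `optimal_A` and `pca_basis` are hypothetical helper names.

```python
import numpy as np


def qkl_gaussian(U, A, m, Sigma):
    """QKL(q || p) for the degenerate q = N(m, U A U^T) and p = N(0, Sigma).

    Assumes U is D x K with orthonormal columns and A is K x K positive
    definite, so q has a proper density q_S on the subspace m + span(U).
    Evaluated as E_q[log q_S] - E_q[log p] (an assumed reading of QKL).
    """
    D, K = U.shape
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    # E_q[log q_S]: negative entropy of a K-dimensional Gaussian on the subspace
    e_log_q = -0.5 * K * np.log(2 * np.pi * np.e) - 0.5 * logdet_A
    # E_q[log p]: Gaussian log-density of p averaged over q's mean and covariance
    cov_q = U @ A @ U.T
    e_log_p = (-0.5 * D * np.log(2 * np.pi) - 0.5 * logdet_Sigma
               - 0.5 * np.trace(Sigma_inv @ cov_q) - 0.5 * m @ Sigma_inv @ m)
    return e_log_q - e_log_p


def optimal_A(U, Sigma):
    # Stationary point of QKL in A: A = (U^T Sigma^{-1} U)^{-1}
    return np.linalg.inv(U.T @ np.linalg.inv(Sigma) @ U)


def pca_basis(Sigma, K):
    # eigh sorts eigenvalues ascending; flip to descending and keep the top K vectors
    evals, evecs = np.linalg.eigh(Sigma)
    return evecs[:, ::-1][:, :K], evals[::-1]


rng = np.random.default_rng(0)
D, K = 6, 2
B = rng.standard_normal((D, D))
Sigma = B @ B.T + 0.1 * np.eye(D)  # random full-rank covariance standing in for the posterior
m0 = np.zeros(D)                   # the mean term is minimised at p's mean, here 0

U_pca, evals_desc = pca_basis(Sigma, K)
A_pca = optimal_A(U_pca, Sigma)
qkl_pca = qkl_gaussian(U_pca, A_pca, m0, Sigma)

# Random K-dimensional subspaces, each paired with its own optimal A
qkl_random = []
for _ in range(200):
    U_rand, _ = np.linalg.qr(rng.standard_normal((D, K)))
    qkl_random.append(qkl_gaussian(U_rand, optimal_A(U_rand, Sigma), m0, Sigma))

print(f"QKL on PCA subspace:     {qkl_pca:.4f}")
print(f"best of 200 random ones: {min(qkl_random):.4f}")
print("diag of optimal A:", np.round(np.diag(A_pca), 4))
print("top-K eigenvalues:", np.round(evals_desc[:K], 4))
```

With A profiled out, the objective reduces to ½ log det(UᵀΣ⁻¹U) plus constants. By eigenvalue interlacing, this is minimised when span(U) is the leading K-dimensional eigenspace of Σ. The optimal A then holds the top-K eigenvalues, which is exactly the PCA reconstruction of the covariance.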

Experimental Results

Research Questions

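RQ1: Why does variational Bayesian dropout achieve good empirical results despite relying on improper priors and singular variational approximations?
RQ2: What are the theoretical limitations of the standard KL divergence in variational inference when the approximate posterior is supported on fewer dimensions than the true posterior?
RQ3: Can a new variational objective be constructed that remains well-defined and consistent in the presence of such singularities?
RQ4: How does the proposed Quasi-KL (QKL) divergence relate to existing inference objectives, and what are its theoretical properties?
RQ5: Does the QKL objective recover known statistical methods, such as PCA, in specific limiting cases?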

Key Findings

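The standard variational Bayesian dropout framework is theoretically unsound. Improper priors and singular variational approximations break its standard Bayesian interpretation.
The log-uniform prior used in variational Gaussian dropout is improper and yields an improper posterior, so the sparsity it induces is essentially non-Bayesian. The algorithm still works only because the variational formulation annuls some of these pathologies.
When the approximate posterior is supported on fewer dimensions than the true posterior, it is singular with respect to the true posterior and the standard KL divergence is undefined. This situation is common in high-dimensional settings.
The Quasi-KL (QKL) divergence is a well-defined limiting objective that resolves the singularity problem and generalises existing approaches.
When a full-rank Gaussian is approximated by a degenerate Gaussian, the QKL-optimal solution recovers the PCA solution. This reveals a principled theoretical link between variational inference and classical dimensionality reduction.
As the number of data points increases, the QKL-optimal solution converges to the PCA solution in the Frobenius/Euclidean norm, supporting its consistency and theoretical grounding.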
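The divergence of the standard KL, and the finite limit that defines QKL, can also be observed numerically. This minimal sketch assumes zero-mean Gaussians. The degenerate approximation N(0, U A Uᵀ) is smoothed to N(0, U A Uᵀ + ε(I − U Uᵀ)) by adding variance ε off its support. The smoothed KL is then compared with the corrected value KL + ((D − K)/2) log(2πeε). The helper names (`gaussian_kl`, `qkl_zero_mean`) and the smoothing scheme are illustrative assumptions. Under them, the closed-form QKL used here equals the ε → 0 limit of the corrected KL.

```python
import numpy as np


def gaussian_kl(C, Sigma):
    """KL(N(0, C) || N(0, Sigma)) for full-rank D x D covariances."""
    D = Sigma.shape[0]
    _, logdet_C = np.linalg.slogdet(C)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    trace_term = np.trace(np.linalg.solve(Sigma, C))
    return 0.5 * (trace_term - D + logdet_Sigma - logdet_C)


def qkl_zero_mean(U, A, Sigma):
    """Closed-form QKL of the degenerate N(0, U A U^T) against N(0, Sigma)."""
    D, K = U.shape
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    trace_term = np.trace(np.linalg.solve(Sigma, U @ A @ U.T))
    return 0.5 * ((D - K) * np.log(2 * np.pi) - K - logdet_A + logdet_Sigma + trace_term)


rng = np.random.default_rng(1)
D, K = 5, 2
B = rng.standard_normal((D, D))
Sigma = B @ B.T + 0.1 * np.eye(D)                 # full-rank "true posterior" covariance
U, _ = np.linalg.qr(rng.standard_normal((D, K)))  # orthonormal basis of the support S
A = np.diag([2.0, 0.5])                           # covariance of q within S
P_perp = np.eye(D) - U @ U.T                      # projector onto the complement of S

target = qkl_zero_mean(U, A, Sigma)
results = []
for eps in [1e-1, 1e-3, 1e-6, 1e-9]:
    C_eps = U @ A @ U.T + eps * P_perp            # q smoothed by variance eps off S
    kl = gaussian_kl(C_eps, Sigma)
    corrected = kl + 0.5 * (D - K) * np.log(2 * np.pi * np.e * eps)
    results.append((eps, kl, corrected))
    print(f"eps={eps:.0e}  KL={kl:8.3f}  corrected={corrected:.6f}  QKL={target:.6f}")
```

As ε shrinks, the raw KL grows roughly like ((D − K)/2) log(1/ε), while the corrected value settles on the QKL. Because the correction does not depend on U or A, comparing approximations by the corrected value stays meaningful in the limit, whereas comparing them by KL does not.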

This review was generated by AI and checked by human editors.