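[Paper Summary] Generalized Variational Inference: Three arguments for deriving new Posteriors
This paper reformulates Bayesian inference as an infinite-dimensional optimization problem, proves that standard VI is optimal within its restricted variational family, and introduces the Rule of Three and Generalized Variational Inference to handle cases where the assumptions about the prior, the likelihood, and the available computation do not match reality.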
We advocate an optimization-centric view on and introduce a novel generalization of Bayesian inference. Our inspiration is the representation of Bayes' rule as an infinite-dimensional optimization problem (Csiszar, 1975; Donsker and Varadhan, 1975; Zellner, 1988). First, we use it to prove an optimality result of standard Variational Inference (VI): Under the proposed view, the standard Evidence Lower Bound (ELBO) maximizing VI posterior is preferable to alternative approximations of the Bayesian posterior. Next, we argue for generalizing standard Bayesian inference. The need for this arises in situations of severe misalignment between reality and three assumptions underlying standard Bayesian inference: (1) Well-specified priors, (2) well-specified likelihoods, (3) the availability of infinite computing power. Our generalization addresses these shortcomings with three arguments and is called the Rule of Three (RoT). We derive it axiomatically and recover existing posteriors as special cases, including the Bayesian posterior and its approximation by standard VI. In contrast, approximations based on alternative ELBO-like objectives violate the axioms. Finally, we study a special case of the RoT that we call Generalized Variational Inference (GVI). GVI posteriors are a large and tractable family of belief distributions specified by three arguments: A loss, a divergence and a variational family. GVI posteriors have appealing properties, including consistency and an interpretation as approximate ELBO. The last part of the paper explores some attractive applications of GVI in popular machine learning models, including robustness and more appropriate marginals. After deriving black box inference schemes for GVI posteriors, their predictive performance is investigated on Bayesian Neural Networks and Deep Gaussian Processes, where GVI can comprehensively improve upon existing methods.
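Motivation and Objectives
- Motivate Bayesian inference from an optimization-centric viewpoint and show how Bayes' rule can be written as an infinite-dimensional optimization problem.
- Introduce the Rule of Three (RoT) to relax three core assumptions of standard Bayesian inference: a well-specified prior, a well-specified likelihood, and unlimited computing power.
- Define Generalized Variational Inference (GVI) as a tractable special case of the RoT and discuss its theoretical properties and computation.
- Show how GVI provides robust inference and more appropriate marginals in large-scale models such as Bayesian neural networks and deep Gaussian processes.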
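Proposed Method
- Formulate posterior inference as an optimization over probability measures with three arguments: a loss, a divergence, and a feasible set (the RoT).
- Show that the standard Bayesian posterior arises as the solution of one specific objective, and that VI corresponds to solving that same objective optimally within a restricted variational family.
- Define GVI as the special case of the RoT in which the feasible set is a variational family, which keeps inference tractable under alternative losses and divergences.
- Develop theoretical properties, including consistency and an interpretation of GVI as an approximate ELBO, and propose black-box GVI (BBGVI) for computation.
- Provide a taxonomy that relates the RoT and GVI to Gibbs posteriors, tempered posteriors, and PAC-Bayesian methods.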
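The three-argument formulation in the list above can be written out as a single objective. The notation below is an assumption made for illustration and does not appear in this summary: $\ell$ is the loss, $D$ the divergence, $\Pi$ the feasible set, $\pi$ the prior, $x_1, \dots, x_n$ the observations, $\mathcal{P}(\Theta)$ the set of all probability measures on the parameter space, and $\mathcal{Q}$ a variational family.

```latex
% Rule of Three: a posterior is the minimizer of expected loss plus divergence to the prior
q^{*}(\theta)
  = \operatorname*{arg\,min}_{q \in \Pi}
    \left\{
      \mathbb{E}_{q(\theta)}\!\left[ \sum_{i=1}^{n} \ell(\theta, x_i) \right]
      + D\left(q \,\|\, \pi\right)
    \right\}

% Special cases named in the list above
% Bayesian posterior: \ell(\theta, x_i) = -\log p(x_i \mid \theta), \; D = \mathrm{KL}, \; \Pi = \mathcal{P}(\Theta)
% Standard VI:        \ell(\theta, x_i) = -\log p(x_i \mid \theta), \; D = \mathrm{KL}, \; \Pi = \mathcal{Q}
% GVI:                any loss \ell and divergence D, with \Pi = \mathcal{Q}
```

Read this way, standard VI changes only the feasible set relative to exact Bayes, while GVI additionally allows the loss and the divergence to be swapped.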
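Experimental Results
Research Questions
- RQ1: How can Bayesian inference be reformulated as an infinite-dimensional optimization problem, and what optimality result for standard VI follows from it?
- RQ2: How does the Rule of Three generalize Bayesian inference by relaxing the prior, likelihood, and computation assumptions, and which existing methods are recovered as special cases?
- RQ3: What is Generalized Variational Inference (GVI), and what are its theoretical properties and practical computation strategies?
- RQ4: Can GVI improve robustness and the quality of marginals in large-scale models such as Bayesian neural networks and deep Gaussian processes?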
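Main Findings
- Standard VI is optimal within its restricted variational family with respect to the infinite-dimensional Bayesian objective.
- The RoT provides a principled framework for relaxing the prior, likelihood, and computation assumptions, and unifies existing generalized Bayesian methods.
- GVI provides a large, tractable family of posteriors defined by a loss, a divergence, and a variational family, with theoretical guarantees and an interpretation as an approximate ELBO.
- GVI can address robustness and marginal-variance issues, and black-box inference schemes extend its applicability to complex models.
- Applications to Bayesian neural networks and deep Gaussian processes show improved performance, obtained by addressing the mismatch with standard Bayesian assumptions.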
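The first finding can be checked numerically on a toy problem. The sketch below is not taken from the paper: it assumes a hypothetical Bernoulli model whose parameter lives on a finite grid, so that "all probability measures" reduces to the probability simplex. It compares the objective (expected summed negative log-likelihood plus KL divergence to the prior) at the exact Bayes posterior against random candidate distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: Bernoulli data, theta restricted to a finite grid.
theta = np.linspace(0.05, 0.95, 19)
prior = np.full(theta.size, 1.0 / theta.size)
x = rng.binomial(1, 0.7, size=20)

# Summed negative log-likelihood, one value per grid point.
k, n = x.sum(), x.size
loss = -(k * np.log(theta) + (n - k) * np.log(1.0 - theta))


def kl(q, p):
    """KL(q || p) for discrete distributions; terms with q = 0 contribute 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


def objective(q):
    """Expected summed loss under q plus KL(q || prior)."""
    return float(q @ loss) + kl(q, prior)


# Bayes posterior on the grid, computed directly from Bayes' rule.
log_joint = np.log(prior) - loss
log_evidence = log_joint.max() + np.log(np.sum(np.exp(log_joint - log_joint.max())))
bayes = np.exp(log_joint - log_evidence)

# Random candidate distributions on the same grid.
candidates = rng.dirichlet(np.ones(theta.size), size=1000)
gaps = np.array([objective(q) - objective(bayes) for q in candidates])
kl_to_bayes = np.array([kl(q, bayes) for q in candidates])

print("objective at Bayes posterior:", objective(bayes))
print("negative log evidence:       ", -log_evidence)
print("smallest gap over candidates:", gaps.min())
```

In this setting the gap between any candidate's objective and the Bayes posterior's objective equals the candidate's KL divergence to the Bayes posterior. Minimizing the objective over a restricted family therefore selects the member closest in KL to the exact posterior, which is the criterion standard VI uses.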
This summary was generated by AI and reviewed by human editors.