QUICK REVIEW

[Paper review] User-friendly introduction to PAC-Bayes bounds

Pierre Alquier|arXiv (Cornell University)|Oct 21, 2021
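Neural Networks and Applications | Cited by 24
One-sentence summary

This tutorial introduces PAC-Bayes bounds, surveying bounds from simple to advanced and their derivation via Catoni's method and the Donsker-Varadhan formula, and shows how they apply to randomized and aggregated predictors and connect to deep learning.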

ABSTRACT

Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, to some probability distribution. Randomized predictors are obtained by sampling in a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of D. McAllester, these tools have been considerably improved in many directions (we will for example describe a simplified version of the localization technique of O. Catoni that was missed by the community, and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds have received considerable attention: for example there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent success is the successful application of these bounds to neural networks by G. Dziugaite and D. Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.

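Motivation and objectives

  • Provide an accessible introduction to PAC-Bayes theory for researchers new to the field.
  • Explain how PAC-Bayes bounds extend the union-bound argument to infinite predictor spaces.
  • Show how to derive and interpret empirical and oracle PAC-Bayes bounds within Catoni's framework.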

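Proposed approach

  • Fix a prior distribution on the predictor space and derive risk bounds that hold simultaneously for all data-dependent posteriors.
  • Use Hoeffding's inequality and the Markov/Chernoff bound to control the deviation between the empirical risk and the true risk.
  • Apply the Donsker-Varadhan variational formula to obtain the minimization problem solved by the Gibbs posterior.
  • Give explicit bounds for finite and general sets of predictors, including the formulation in terms of the Gibbs posterior.
  • Discuss extensions to aggregated and randomized predictors, with an overview of the non-i.i.d. and heavy-tailed settings.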
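The first three steps above can be chained into a short derivation. The sketch below is a reconstruction under stated assumptions rather than a quotation from the paper: the loss is assumed to take values in [0, C], the n observations are assumed i.i.d., R(θ) and r(θ) denote the true and empirical risk, π is the prior, ρ ranges over all probability distributions on Θ, and λ > 0 and ε ∈ (0, 1) are fixed in advance.

```latex
% Assumed notation: loss in [0, C], n i.i.d. observations S,
% R = true risk, r = empirical risk, \pi = prior, \rho = any posterior.

% Step 1 (Hoeffding's lemma, fixed \theta):
\[
  \mathbb{E}_{S}\!\left[ e^{\lambda (R(\theta) - r(\theta))} \right]
  \le e^{\lambda^{2} C^{2} / (8n)} .
\]

% Step 2 (integrate over the prior, which does not depend on S, and use Fubini):
\[
  \mathbb{E}_{S}\, \mathbb{E}_{\theta \sim \pi}\!\left[ e^{\lambda (R(\theta) - r(\theta))} \right]
  \le e^{\lambda^{2} C^{2} / (8n)} .
\]

% Step 3 (Donsker-Varadhan:
%   \log \mathbb{E}_{\pi}[e^{h}] = \sup_{\rho} \{ \mathbb{E}_{\rho}[h] - \mathrm{KL}(\rho \| \pi) \}):
\[
  \mathbb{E}_{S} \exp\!\Big( \sup_{\rho} \big\{ \lambda\, \mathbb{E}_{\theta \sim \rho}[R(\theta) - r(\theta)]
  - \mathrm{KL}(\rho \,\|\, \pi) \big\} - \tfrac{\lambda^{2} C^{2}}{8n} \Big) \le 1 .
\]

% Step 4 (Chernoff/Markov at level \log(1/\varepsilon), then divide by \lambda):
% with probability at least 1 - \varepsilon, simultaneously for all \rho,
\[
  \mathbb{E}_{\theta \sim \rho}[R(\theta)]
  \le \mathbb{E}_{\theta \sim \rho}[r(\theta)]
  + \frac{\lambda C^{2}}{8n}
  + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \log(1/\varepsilon)}{\lambda} .
\]
```

Because the supremum over ρ sits inside the probability statement, the final inequality holds for posteriors chosen after seeing the data, which is what replaces the union bound when Θ is infinite.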

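Results

Research questions

  • RQ1: How can PAC-Bayes bounds be derived from simple concentration inequalities?
  • RQ2: What role does the KL divergence play in extending the bounds from finite to infinite predictor spaces?
  • RQ3: Within the PAC-Bayes framework, how does the Gibbs posterior arise as the data-dependent distribution that minimizes the risk bound?
  • RQ4: What do PAC-Bayes bounds imply for empirical risk minimization and for the aggregation of predictors?
  • RQ5: How do empirical PAC-Bayes bounds relate to oracle PAC-Bayes bounds and to fast rates of convergence?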

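Key findings

  • A simple PAC-Bayes bound (Catoni's bound) upper-bounds the average risk under any posterior by the average empirical risk under that posterior plus a term involving the KL divergence from the prior.
  • The Gibbs posterior minimizes the PAC-Bayes bound in the sense of Corollary 2.3, tying together the empirical risk, the prior, and complexity as measured by the KL divergence.
  • For finite Θ, the bound recovers an ERM-like form with a log(M) term, which guides the choice of λ and yields explicit rates of convergence.
  • Empirical bounds can be used to certify generalization and to motivate ERM, while oracle bounds give the theoretical limit of the performance achievable as the amount of data grows.
  • The framework builds a bridge to bounds for deep learning (non-vacuous bounds and data-dependent priors) and relates to other approaches (mutual information bounds, Bayesian posteriors).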
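To make the findings on the Gibbs posterior and on finite Θ concrete, here is a minimal Python sketch. It assumes a finite set of M predictors, a strictly positive prior, and a loss bounded in [0, c]. The function names (`gibbs_posterior`, `kl_divergence`, `catoni_bound`, `optimal_lambda`) and the toy risk values are illustrative choices, not taken from the paper. `optimal_lambda` simply minimizes λc²/(8n) + log(M/ε)/λ over λ, which is the trade-off that appears when the bound is evaluated at a point mass under a uniform prior.

```python
import math


def gibbs_posterior(emp_risks, lam, prior):
    """Weights proportional to prior[i] * exp(-lam * emp_risks[i])."""
    # Work with log-weights and subtract the max for numerical stability.
    # Assumes every prior weight is strictly positive.
    logw = [math.log(p) - lam * r for p, r in zip(prior, emp_risks)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    z = sum(w)
    return [v / z for v in w]


def kl_divergence(rho, prior):
    """KL(rho || prior) for discrete distributions, with 0 * log 0 = 0."""
    return sum(q * math.log(q / p) for q, p in zip(rho, prior) if q > 0)


def catoni_bound(emp_risks, rho, prior, lam, n, eps, c=1.0):
    """Right-hand side of the bound: average empirical risk under rho,
    plus lam * c^2 / (8n), plus (KL(rho || prior) + log(1/eps)) / lam."""
    avg_emp_risk = sum(q * r for q, r in zip(rho, emp_risks))
    complexity = (kl_divergence(rho, prior) + math.log(1.0 / eps)) / lam
    return avg_emp_risk + lam * c ** 2 / (8 * n) + complexity


def optimal_lambda(n, m, eps, c=1.0):
    """Minimizer of lam * c^2 / (8n) + log(m / eps) / lam over lam > 0."""
    return math.sqrt(8 * n * math.log(m / eps)) / c


if __name__ == "__main__":
    # Toy numbers, chosen only for illustration.
    emp_risks = [0.10, 0.25, 0.40, 0.12]
    n, eps = 1000, 0.05
    prior = [1.0 / len(emp_risks)] * len(emp_risks)

    lam = optimal_lambda(n, len(emp_risks), eps)
    rho = gibbs_posterior(emp_risks, lam, prior)
    print("Gibbs posterior:", rho)
    print("bound at Gibbs posterior:", catoni_bound(emp_risks, rho, prior, lam, n, eps))
    print("bound at uniform posterior:", catoni_bound(emp_risks, prior, prior, lam, n, eps))
```

For a fixed λ, the value at the Gibbs posterior is never larger than the value at any other posterior on the same finite set, including the point mass on the empirical risk minimizer, whose KL term equals log(M) under a uniform prior.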


This review was generated by AI and checked by human editors.