QUICK REVIEW

[Paper Review] Fast Rates for General Unbounded Loss Functions: from ERM to Generalized Bayes

Peter Grünwald, Nishant A. Mehta | arXiv (Cornell University) | May 1, 2016

Machine Learning and Algorithms | References: 62 | Cited by: 33

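One-Sentence Summary

The paper establishes fast excess risk convergence rates for general unbounded loss functions, such as log loss and squared loss, when the distribution of the losses may be heavy-tailed. It relies on the v-GRIP conditions, which control the lower tail of the excess loss, and the newly introduced witness condition, which controls the upper tail; together they let empirical risk minimization (ERM), MDL, and η-generalized Bayesian estimators attain fast rates even under model misspecification.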

ABSTRACT

We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $η$-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric as long as the learning rate $η$ is set correctly. For general loss functions, our bounds rely on two separate conditions: the $v$-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter $v$ in the $v$-GRIP conditions determines the achievable rate and is akin to the exponent in the Tsybakov margin condition and the Bernstein condition for bounded losses, which the $v$-GRIP conditions generalize; favorable $v$ in combination with small model complexity leads to $\tilde{O}(1/n)$ rates. The witness condition allows us to connect the excess risk to an "annealed" version thereof, by which we generalize several previous results connecting Hellinger and Rényi divergence to KL divergence.

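Motivation and Objectives

Extend fast convergence rates in statistical learning to general unbounded loss functions whose distribution may be heavy-tailed.
Remove the reliance of earlier theory on bounded losses or on strong assumptions such as the Bernstein condition.
Unify and generalize fast-rate results for ERM, MDL, and generalized Bayesian methods under model misspecification.
Provide excess risk bounds that are optimized for generalized Bayesian estimators with learning rate η and for MDL estimators.
Clarify the connection between PAC-Bayesian and generalized Bayesian methods under weak assumptions on unbounded losses.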

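Proposed Method

Use the v-GRIP (generalized reversed information projection) conditions to control the lower tail of the excess loss; these generalize the Tsybakov margin condition and the Bernstein condition, which are stated for bounded losses.
Introduce the witness condition to control the upper tail of the excess loss, which connects the excess risk to its "annealed" version.
Derive general excess risk bounds that hold for arbitrary estimators and are optimized when applied to ERM, MDL, and η-generalized Bayesian estimators.
Use the witness condition to generalize earlier results connecting Hellinger and Rényi divergence to KL divergence to the unbounded setting.
Apply the bounds to log loss and squared loss; for log loss they imply convergence rates for η-generalized Bayesian inference under misspecification, measured in a generalization of the Hellinger metric, provided the learning rate η is set correctly.
Show that the v-GRIP and witness conditions can hold when the Bernstein condition fails, in particular when the excess loss is unbounded.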
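The quantities named in the list above can be written out as follows. This is a sketch in assumed notation (the symbols $f^*$, $L_f$, $B$, $\beta$, $u$, $c$ are introduced here for illustration and do not appear in the summary); the Bernstein and witness conditions are given in their commonly stated forms, and the paper itself should be consulted for the exact definitions and for the v-GRIP conditions, which are not reproduced here.

```latex
% Assumed notation: f^* is a risk minimizer in the model, \ell_f is the loss
% of predictor f, and L_f is the excess loss of f relative to f^*.
\begin{align*}
L_f &= \ell_f - \ell_{f^*}, \qquad \text{excess risk: } \mathbb{E}[L_f] \\
% Annealed excess risk at learning rate \eta > 0; the inequality is Jensen's.
\mathbb{E}^{\mathrm{ANN}(\eta)}[L_f]
  &= -\frac{1}{\eta}\log \mathbb{E}\bigl[e^{-\eta L_f}\bigr]
  \;\le\; \mathbb{E}[L_f] \\
% Bernstein condition (usual form for bounded losses), exponent \beta.
\mathbb{E}\bigl[L_f^2\bigr] &\le B\,\bigl(\mathbb{E}[L_f]\bigr)^{\beta},
  \qquad \beta \in [0,1] \\
% Witness condition with constants u and c (commonly stated form).
\mathbb{E}\bigl[L_f \,\mathbf{1}\{L_f \le u\}\bigr] &\ge c\,\mathbb{E}[L_f],
  \qquad u > 0,\ c \in (0,1]
\end{align*}
```

Jensen's inequality only bounds the annealed excess risk from above by the excess risk. The role the summary attributes to the witness condition is the reverse direction: it keeps the part of the excess risk carried by the upper tail under control, so that the excess risk can be related back to its annealed version.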

Results

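Research Questions

RQ1: Can fast convergence rates be obtained for general unbounded loss functions under heavy-tailed distributions?
RQ2: Which conditions generalize the Tsybakov margin condition and the Bernstein condition to unbounded losses?
RQ3: How do the v-GRIP and witness conditions yield fast rates when the excess loss is not bounded?
RQ4: In which settings do generalized Bayesian and MDL estimators still attain fast rates under model misspecification?
RQ5: Can the connection between PAC-Bayesian and generalized Bayesian methods be formalized under weak assumptions on unbounded losses?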

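Key Findings

The v-GRIP conditions control the lower tail of the excess loss and generalize the Tsybakov margin condition and the Bernstein condition to unbounded losses.
The witness condition controls the upper tail of the excess loss, which allows earlier results connecting Rényi divergence to KL divergence to be extended to the unbounded setting.
For log loss, the bounds imply convergence rates for η-generalized Bayesian inference under misspecification, measured in a generalization of the Hellinger metric, as long as η is set correctly.
With a favorable v and small model complexity, the bounds give Õ(1/n) rates even for unbounded losses.
The v-GRIP and witness conditions can hold when the Bernstein condition fails, for example in the normal location family with unbounded mean.
The results hold for general estimators (including ERM, MDL, and η-generalized Bayes) and extend to countable unions of models.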
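To make two of the estimators named above concrete, here is a minimal Python sketch of ERM and of an η-generalized Bayesian posterior over a finite grid of candidate locations under squared loss. It uses the standard Gibbs form (posterior proportional to prior times exp(−η · cumulative loss)); the grid, the Student-t noise, the value η = 0.1, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def cumulative_losses(thetas, xs):
    """Sum of squared losses (x - theta)^2 over the sample, per candidate theta."""
    return ((xs[None, :] - thetas[:, None]) ** 2).sum(axis=1)

def erm_index(thetas, xs):
    """Index of the empirical risk minimizer on the grid."""
    return int(np.argmin(cumulative_losses(thetas, xs)))

def generalized_bayes_posterior(thetas, xs, eta, prior=None):
    """Posterior weights proportional to prior * exp(-eta * cumulative loss)."""
    losses = cumulative_losses(thetas, xs)
    if prior is None:
        # Assumed default: uniform prior over the grid.
        prior = np.full(len(thetas), 1.0 / len(thetas))
    log_w = np.log(prior) - eta * losses
    log_w -= log_w.max()  # shift for numerical stability before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical data: location 1.0 plus heavy-tailed Student-t noise (3 d.o.f.).
rng = np.random.default_rng(0)
xs = 1.0 + rng.standard_t(df=3, size=200)
thetas = np.linspace(-2.0, 4.0, 61)

post = generalized_bayes_posterior(thetas, xs, eta=0.1)
print("ERM estimate:", thetas[erm_index(thetas, xs)])
print("posterior mode:", thetas[np.argmax(post)])
```

With a uniform prior the posterior mode coincides with the ERM solution, since both minimize the cumulative loss; a smaller η spreads the posterior mass more widely over the grid. The sketch does not check the v-GRIP or witness conditions and says nothing on its own about rates.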

This review was generated by AI and checked by a human editor.