QUICK REVIEW

[Paper Digest] On integral probability metrics, ϕ-divergences and binary classification

Bharath K. Sriperumbudur, Kenji Fukumizu|ArXiv.org|Jan 18, 2009
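Statistical Mechanics and Entropy | References: 63 | Cited by: 93
One-sentence summary

This paper establishes a new connection between integral probability metrics (IPMs) and binary classification, showing that the IPM between class-conditional distributions equals the negative of the optimal classification risk. It proves that empirical IPM estimators are consistent and converge faster than estimators of ϕ-divergences, and it shows that the total variation distance is the only non-trivial ϕ-divergence that is also an IPM, which highlights how fundamentally the two families differ in statistical learning applications.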

ABSTRACT

A class of distance measures on probabilities -- the integral probability metrics (IPMs) -- is addressed: these include the Wasserstein distance, Dudley metric, and Maximum Mean Discrepancy. IPMs have thus far mostly been used in more abstract settings, for instance as theoretical tools in mass transportation problems, and in metrizing the weak topology on the set of all Borel probability measures defined on a metric space. Practical applications of IPMs are less common, with some exceptions in the kernel machines literature. The present work contributes a number of novel properties of IPMs, which should contribute to making IPMs more widely used in practice, for instance in areas where $ϕ$-divergences are currently popular. First, to understand the relation between IPMs and $ϕ$-divergences, the necessary and sufficient conditions under which these classes intersect are derived: the total variation distance is shown to be the only non-trivial $ϕ$-divergence that is also an IPM. This shows that IPMs are essentially different from $ϕ$-divergences. Second, empirical estimates of several IPMs from finite i.i.d. samples are obtained, and their consistency and convergence rates are analyzed. These estimators are shown to be easily computable, with better rates of convergence than estimators of $ϕ$-divergences. Third, a novel interpretation is provided for IPMs by relating them to binary classification, where it is shown that the IPM between class-conditional distributions is the negative of the optimal risk associated with a binary classifier. In addition, the smoothness of an appropriate binary classifier is proved to be inversely related to the distance between the class-conditional distributions, measured in terms of an IPM.
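For readers who want the objects in the abstract written out, the block below gives the standard definitions of an IPM and a ϕ-divergence, together with the function classes that recover the named metrics. The symbols ($\gamma_{\mathcal{F}}$, $\mathcal{F}_W$, $\mathcal{F}_\beta$, $\mathcal{F}_k$, $\mathcal{F}_{TV}$, $D_\phi$) are notation chosen here for illustration and are not taken from this digest; $\|f\|_L$ denotes the Lipschitz constant and $\mathcal{H}$ a reproducing kernel Hilbert space.

```latex
% Integral probability metric generated by a class F of bounded measurable functions
\[
\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \int f \, dP - \int f \, dQ \right|
\]

% Function classes that recover the metrics named in the abstract
\begin{align*}
\mathcal{F}_{W}     &= \{ f : \|f\|_{L} \le 1 \}                  && \text{Wasserstein distance (Kantorovich-Rubinstein duality)} \\
\mathcal{F}_{\beta} &= \{ f : \|f\|_{\infty} + \|f\|_{L} \le 1 \} && \text{Dudley metric} \\
\mathcal{F}_{k}     &= \{ f : \|f\|_{\mathcal{H}} \le 1 \}        && \text{Maximum Mean Discrepancy} \\
\mathcal{F}_{TV}    &= \{ f : \|f\|_{\infty} \le 1 \}             && \text{total variation distance}
\end{align*}

% phi-divergence for a convex function phi
\[
D_{\phi}(P, Q) =
\begin{cases}
\displaystyle \int \phi\!\left( \frac{dP}{dQ} \right) dQ, & P \ll Q, \\[6pt]
+\infty, & \text{otherwise.}
\end{cases}
\]
```

With $\phi(t) = |t - 1|$ the ϕ-divergence reduces to the total variation distance, which is the single point where the two families meet according to the abstract.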

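Motivation and objectives

  • Clarify the theoretical relationship between integral probability metrics (IPMs) and ϕ-divergences, in particular where the two classes intersect and how they fundamentally differ.
  • Develop consistent, computationally efficient estimators of IPMs from finite i.i.d. samples, with explicit convergence rates.
  • Give a new interpretation of IPMs through binary classification, linking the distance between class-conditional distributions to the optimal classification risk.
  • Establish an inverse relationship between the smoothness of the optimal binary classifier and the IPM between the class-conditional distributions.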

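Proposed methods

  • Derive necessary and sufficient conditions under which IPMs and ϕ-divergences intersect, proving that the total variation distance is the only non-trivial member of both classes.
  • Propose empirical estimates of IPMs over a class F of bounded measurable functions, using Rademacher complexity and McDiarmid's inequality to obtain concentration bounds.
  • Apply symmetrization and empirical process theory to bound the deviation between the empirical IPM and its true value, ensuring consistency.
  • Establish a duality between IPMs and binary classification risk by proving that the IPM equals the negative of the optimal risk under a Lipschitz constraint.
  • Use Lipschitz extension theorems and convex analysis (e.g., Theorem 24) to prove structural properties of the optimal classifier and to relate its smoothness to the IPM distance.
  • Analyze the convergence rates of IPM estimators using covering numbers and entropy conditions, showing that under the same conditions they converge faster than estimators of ϕ-divergences.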
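The link between an IPM and classification risk can be checked in a few lines. The sketch below is a worked derivation under stated assumptions, not a reproduction of the paper's proof: $\varepsilon$ is the prior probability of the positive class, $P$ and $Q$ are the class-conditional distributions of the positive and negative class, the loss pair $(L_{1}, L_{-1})$ is a particular linear loss chosen to make the identity hold, and $\mathcal{F}$ is assumed symmetric ($f \in \mathcal{F}$ implies $-f \in \mathcal{F}$). None of this notation appears in the digest itself.

```latex
% Assumed loss: L_1 is applied to positive examples, L_{-1} to negative examples
\[
L_{1}(\alpha) = -\frac{\alpha}{\varepsilon},
\qquad
L_{-1}(\alpha) = \frac{\alpha}{1 - \varepsilon}
\]

% Optimal risk over the function class F
\begin{align*}
R^{L}_{\mathcal{F}}
  &= \inf_{f \in \mathcal{F}}
     \left\{ \varepsilon \int L_{1}(f) \, dP + (1 - \varepsilon) \int L_{-1}(f) \, dQ \right\} \\
  &= \inf_{f \in \mathcal{F}}
     \left\{ \int f \, dQ - \int f \, dP \right\} \\
  &= -\sup_{f \in \mathcal{F}}
     \left\{ \int f \, dP - \int f \, dQ \right\} \\
  &= -\sup_{f \in \mathcal{F}}
     \left| \int f \, dP - \int f \, dQ \right|
     && \text{(symmetry of } \mathcal{F}\text{)} \\
  &= -\gamma_{\mathcal{F}}(P, Q)
\end{align*}
```

Read this way, a larger IPM between the class-conditional distributions means a lower attainable risk, so well-separated classes are easier to classify within $\mathcal{F}$.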

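Experimental results

Research questions

  • RQ1: Are there ϕ-divergences that are also integral probability metrics (IPMs)?
  • RQ2: How can IPMs be consistently estimated from finite i.i.d. samples, and how do their convergence rates compare with those of ϕ-divergence estimators?
  • RQ3: What is the relationship between IPMs and the optimal risk in binary classification?
  • RQ4: How is the smoothness of the optimal binary classifier related to the IPM between the class-conditional distributions?
  • RQ5: Given their computational and theoretical advantages over ϕ-divergences, can IPMs be applied in practice in statistical learning?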

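Key findings

  • The total variation distance is the only non-trivial ϕ-divergence that is also an integral probability metric, which establishes a fundamental difference between the two classes.
  • Empirical estimators of IPMs are consistent and, in high-dimensional settings, converge faster than estimators of ϕ-divergences.
  • The IPM between class-conditional distributions equals the negative of the optimal classification risk of a binary classifier under a bounded Lipschitz constraint.
  • The smoothness of the optimal binary classifier is inversely related to the IPM between the class-conditional distributions, which gives the distance a geometric interpretation.
  • IPMs can be estimated efficiently using function classes such as those in a reproducing kernel Hilbert space (RKHS), with convergence rates derived via Rademacher complexity and McDiarmid's inequality.
  • Theoretical bounds on the IPM estimation error are derived using symmetrization and concentration inequalities, ensuring reliability in finite-sample settings.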
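To make the RKHS case concrete, here is a minimal sketch of a plug-in Maximum Mean Discrepancy estimate: the RKHS distance between the empirical mean embeddings of two samples. It assumes a Gaussian kernel with a fixed bandwidth; the kernel choice, the function names `gaussian_kernel` and `empirical_mmd`, and the `bandwidth` parameter are illustrative assumptions and are not specified in this digest.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumption made for this sketch.
    """
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def empirical_mmd(x, y, bandwidth=1.0):
    """Plug-in MMD estimate between samples x of shape (m, d) and y of shape (n, d)."""
    m, n = len(x), len(y)
    pooled = np.vstack([x, y])
    # Signed weights: +1/m for points drawn from P, -1/n for points drawn from Q.
    weights = np.concatenate([np.full(m, 1.0 / m), np.full(n, -1.0 / n)])
    gram = gaussian_kernel(pooled, pooled, bandwidth)
    # Squared RKHS norm of the difference between the two empirical mean embeddings.
    mmd_sq = weights @ gram @ weights
    return float(np.sqrt(max(mmd_sq, 0.0)))


# Usage: compare a sample against a same-distribution sample and a shifted one.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))
y_shift = rng.normal(1.0, 1.0, size=(200, 2))
print("same distribution:", empirical_mmd(x, y_same))
print("shifted mean:     ", empirical_mmd(x, y_shift))
```

The estimate needs only one Gram matrix over the pooled sample, so the cost is quadratic in the total sample size. In practice the bandwidth matters a great deal and would need to be chosen for the data at hand.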

This digest was generated by AI and reviewed by human editors.