QUICK REVIEW

[Paper Review] Online and Stochastic Gradient Methods for Non-decomposable Loss Functions

Purushottam Kar, Harikrishna Narasimhan|arXiv (Cornell University)|Oct 24, 2014

Stochastic Gradient Optimization Techniques | References: 8 | Cited by: 29

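One-Sentence Summary

This paper proposes an online and stochastic gradient framework for non-decomposable loss functions such as Precision@k and pAUC, which matter in class-imbalanced learning. Using a structural lemma over ranked lists, it establishes sublinear regret bounds and develops scalable solvers that provably converge to the empirical risk minimizer and can run orders of magnitude faster than a cutting plane method.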

ABSTRACT

Modern applications in sensitive domains such as biometrics and medicine frequently require the use of non-decomposable loss functions such as precision@k, F-measure etc. Compared to point loss functions such as hinge-loss, these offer much more fine grained control over prediction, but at the same time present novel challenges in terms of algorithm design and analysis. In this work we initiate a study of online learning techniques for such non-decomposable loss functions with an aim to enable incremental learning as well as design scalable solvers for batch problems. To this end, we propose an online learning framework for such loss functions. Our model enjoys several nice properties, chief amongst them being the existence of efficient online learning algorithms with sublinear regret and online to batch conversion bounds. Our model is a provable extension of existing online learning models for point loss functions. We instantiate two popular losses, prec@k and pAUC, in our model and prove sublinear regret bounds for both of them. Our proofs require a novel structural lemma over ranked lists which may be of independent interest. We then develop scalable stochastic gradient descent solvers for non-decomposable loss functions. We show that for a large family of loss functions satisfying a certain uniform convergence property (that includes prec@k, pAUC, and F-measure), our methods provably converge to the empirical risk minimizer. Such uniform convergence results were not known for these losses and we establish these using novel proof techniques. We then use extensive experimentation on real life and benchmark datasets to establish that our method can be orders of magnitude faster than a recently proposed cutting plane method.

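Motivation and Goals

Address the lack of theoretical foundations for online and stochastic optimization methods on non-decomposable loss functions such as F-measure, Precision@k, and pAUC.
Design an online learning framework that generalizes existing models for decomposable losses while supporting incremental learning and online-to-batch conversion.
Develop provably convergent stochastic gradient solvers for a broad class of non-decomposable losses, namely those satisfying a uniform convergence property.
Establish new theoretical guarantees, specifically sublinear regret and convergence to the empirical risk minimizer, through a new structural lemma over ranked lists.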
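
To make "non-decomposable" concrete, the sketch below uses the standard definition of Precision@k (the fraction of positives among the k highest-scoring examples). The function name `precision_at_k` and the toy scores are illustrative assumptions, not taken from the paper. It shows that changing one example's score changes the contribution of a different example, so the measure cannot be written as a sum of per-example losses.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring examples that are positive."""
    top = np.argsort(-np.asarray(scores, dtype=float))[:k]
    return float(np.mean(np.asarray(labels)[top] > 0))

# Toy data (hypothetical): two positives, two negatives.
labels = np.array([1, 0, 1, 0])

# Top 2 by score are examples 0 and 1, i.e. one positive and one negative.
before = precision_at_k([0.9, 0.8, 0.7, 0.1], labels, k=2)

# Lower only example 1's score. Example 2 is untouched, yet it now
# enters the top 2, so its effect on the measure has changed.
after = precision_at_k([0.9, 0.6, 0.7, 0.1], labels, k=2)

print(before, after)
```

A point loss such as the hinge loss would assign example 2 the same value in both cases, which is why the usual per-example online and stochastic analyses do not carry over directly.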

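Proposed Method

Proposes a principled online learning framework for non-decomposable losses that defines instantaneous penalties through a stability-based approach, so that it remains compatible with existing online models when applied to decomposable losses.
Introduces the FTRL (Follow the Regularized Leader) algorithm within this framework and proves that it attains ${\cal O}(1/\sqrt{T})$ regret under a general stability condition.
Applies the framework to convex surrogates of Precision@k and pAUC, proving sublinear regret via a novel structural lemma on the Lipschitz continuity of ranked-list measures.
Develops stochastic gradient descent solvers for non-decomposable losses, building on the structural lemma and on uniform-convergence-style results.
Implements efficient 1PMB and 2PMB procedures that filter out the top-ranked negatives and compute subgradients between the positives and the filtered negatives, at a cost of ${\cal O}(s\log s)$ per round.
Uses the surrogate loss for pAUC $\ell_{\text{pAUC}}({\mathbf{w}}) = \sum_{i:y_i>0} \ell^{+}_{S_-}(x_i, {\mathbf{w}})$, where $\ell^{+}_{S_-}$ aggregates hinge losses over the top-$\beta$ fraction of negatives.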
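
The filter-then-subgradient step and the pAUC surrogate described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a linear scorer, a unit-margin hinge, and that "top-$\beta$ fraction" means the $\lceil \beta \cdot n_- \rceil$ highest-scoring negatives in the batch. The function name `pauc_surrogate` and its arguments are hypothetical.

```python
import numpy as np

def pauc_surrogate(w, X_pos, X_neg, beta):
    """Hinge surrogate for pAUC on one batch, plus one subgradient.

    Assumed form: for each positive, sum the hinge losses against the
    top-beta fraction of negatives (the highest-scoring ones).
    """
    s_pos = X_pos @ w
    s_neg = X_neg @ w
    # Filter step: keep the ceil(beta * n_neg) highest-scoring negatives.
    # The sort costs O(s log s) for a batch of size s.
    k = max(1, int(np.ceil(beta * len(s_neg))))
    top = np.argsort(-s_neg)[:k]
    # Hinge margin 1 - (s_pos - s_neg) for every (positive, kept negative)
    # pair. This naive pairwise step costs O(n_pos * k).
    margins = 1.0 - (s_pos[:, None] - s_neg[top][None, :])
    active = margins > 0
    loss = float(margins[active].sum())
    # Each active pair contributes (x_neg - x_pos) to the subgradient.
    grad = active.sum(axis=0) @ X_neg[top] - active.sum(axis=1) @ X_pos
    return loss, grad

# Toy batch (hypothetical): 2 positives, 4 negatives, beta = 0.5.
w = np.array([1.0, 0.0])
X_pos = np.array([[2.0, 0.0], [0.0, 0.0]])
X_neg = np.array([[1.0, 0.0], [-3.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
loss, grad = pauc_surrogate(w, X_pos, X_neg, beta=0.5)
```

Because the hinge is increasing in the negative's score, the highest-scoring negatives are also the ones that maximize the summed hinge for every positive, so the gradient taken on the filtered set is a valid subgradient of the convex surrogate. The sketch forms all pairs explicitly and therefore does not by itself reach the ${\cal O}(s\log s)$ per-round cost quoted above.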

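Experimental Results

Research Questions

RQ1: Can a principled online learning framework be designed for non-decomposable loss functions that generalizes existing models for decomposable losses?
RQ2: Under a stability condition, does online learning with non-decomposable losses such as Precision@k and pAUC retain sublinear regret bounds?
RQ3: Can stochastic gradient methods be shown to converge to the empirical risk minimizer for non-decomposable losses such as pAUC and F-measure?
RQ4: Which new structural properties of ranked lists make uniform convergence and regret analysis possible for non-decomposable losses?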

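Key Findings

Under the stability condition, the proposed online framework attains ${\cal O}(1/\sqrt{T})$ regret for Precision@k and pAUC, establishing the first provable sublinear regret bounds for these non-decomposable losses.
A new structural lemma over lists ranked by inner-product scores proves the Lipschitz continuity of ranked-list measures, which underpins both the regret and the convergence analyses.
The stochastic gradient solvers for pAUC, Precision@k, and F-measure provably converge to the empirical risk minimizer, based on new uniform-convergence-style results.
On the KDD 2008 dataset, the method reaches 64.8% pAUC within 30 ms, whereas the cutting plane method needs more than 1.2 seconds to reach similar performance.
Across real-world and benchmark datasets, the method is orders of magnitude faster than the state-of-the-art cutting plane technique while matching or improving accuracy.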
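
For readers unfamiliar with the metric behind the pAUC figures above, the following sketch computes an empirical partial AUC over the false-positive range $[0, \beta]$. It is a simplified assumption rather than the paper's evaluation protocol. It counts correctly ordered pairs between every positive and the $\lceil \beta \cdot n_- \rceil$ highest-scoring negatives, scores ties as one half, and does no interpolation at the range boundary. The name `partial_auc` is hypothetical.

```python
import numpy as np

def partial_auc(scores, labels, beta):
    """Empirical pAUC over the false-positive range [0, beta]."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels > 0]
    # Negatives sorted from highest to lowest score.
    neg = np.sort(scores[labels <= 0])[::-1]
    k = max(1, int(np.ceil(beta * len(neg))))
    top_neg = neg[:k]
    # Compare every positive with each of the k hardest negatives.
    diff = pos[:, None] - top_neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (len(pos) * k))

# Toy ranking (hypothetical): 2 positives, 4 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
pauc_half = partial_auc(scores, labels, beta=0.5)
auc_full = partial_auc(scores, labels, beta=1.0)
```

With `beta=1.0` the function reduces to the ordinary pairwise AUC. Smaller values of `beta` restrict the comparison to the hardest negatives, which is the regime that matters when false positives are costly.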

This review was generated by AI and reviewed by a human editor.