QUICK REVIEW

[Paper Review] Second-Order Kernel Online Convex Optimization with Adaptive Sketching

Daniele Calandriello, Alessandro Lazaric|arXiv (Cornell University)|Jun 15, 2017

Stochastic Gradient Optimization Techniques | Cited by 23

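One-Sentence Summary

This paper proposes Kernel Online Newton Step (KONS), a second-order kernel online convex optimization method that achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret, growing only logarithmically in $T$, and uses adaptive matrix sketching to cut the computational cost. The proposed Sketched-KONS reduces time and space complexity by a factor of $\gamma^2$ while increasing regret by only a factor of $1/\gamma$, enabling efficient, low-regret online learning in kernel spaces.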

ABSTRACT

Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\text{Det}(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $\gamma\leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $\mathcal{O}(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret.
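The first-order baseline mentioned in the abstract can be made concrete with a small sketch. The code below is an illustrative functional gradient descent learner for squared loss, not code from the paper; the class name `KernelFGD`, the Gaussian kernel, and the step size are all assumptions chosen for the example. It shows where the $\mathcal{O}(t)$ per-iteration cost comes from: one stored point and one kernel evaluation per past round.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


class KernelFGD:
    """Hypothetical first-order learner: f_t(x) = sum_s coeffs[s] * k(points[s], x)."""

    def __init__(self, step_size=0.5, kernel=gaussian_kernel):
        self.step_size = step_size
        self.kernel = kernel
        self.points = []  # one stored point per round, so memory is O(t)
        self.coeffs = []

    def predict(self, x):
        # One kernel evaluation per stored point: O(t) time at round t.
        return sum(c * self.kernel(p, x) for c, p in zip(self.coeffs, self.points))

    def update(self, x, y):
        """Predict, suffer the squared loss, then take one functional gradient step."""
        pred = self.predict(x)
        grad = pred - y  # derivative of 0.5 * (pred - y)^2 with respect to pred
        self.points.append(x)
        self.coeffs.append(-self.step_size * grad)
        return 0.5 * (pred - y) ** 2


model = KernelFGD(step_size=0.5)
first_loss = model.update((0.0,), 1.0)
```

Each call to `update` appends exactly one coefficient, which is why the model grows linearly with the number of rounds and never needs a $t \times t$ matrix.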

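Research Motivation and Objectives

To address the high computational cost of second-order kernel online convex optimization (KOCO) methods, whose time and space complexity both grow as $\mathcal{O}(t^2)$ per iteration.
To achieve logarithmic regret scaling, $\mathcal{O}(d_{\text{eff}}\log T)$, in KOCO by exploiting the second-order curvature of the losses, information that first-order methods leave unused.
To develop a sketching-based approach that lowers the complexity of second-order KOCO while keeping the increase in regret controlled, especially for problems with low effective dimension.
To overcome the limitations of existing dictionary-based sketching methods, which fail to achieve logarithmic regret in the online setting because of poor adaptivity and budget constraints.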

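Proposed Method

Proposes Kernel Online Newton Step (KONS), a second-order KOCO algorithm that uses the Hessian of the losses to update the model adaptively and achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret.
Proposes a novel adaptive matrix sketching algorithm for the kernel matrix $\mathbf{K}_t$ that reduces time and space complexity by a factor of $\gamma^2$, for a parameter $\gamma \leq 1$.
Applies the sketch to the Hessian approximation inside KONS, keeping the regret within a factor of $1/\gamma$ of the original second-order method.
Uses a sketching strategy that dynamically maintains a low-rank approximation of the kernel matrix, allowing efficient updates and storage.
Decomposes the regret into $R_G$ (gradient-based) and $R_D$ (the gap to the optimal solution), and shows that adaptive sketching keeps both terms under control.
Shows that dictionary-based sketching fails in the online setting because of conflicting goals: minimizing regret, bounding storage, and avoiding weight decay.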
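To illustrate what a second-order update looks like, here is a minimal sketch of an Online-Newton-Step-style learner in an explicit, finite-dimensional feature space. It is a simplification and not KONS itself: there is no kernel, no projection step, no sketching, and the curvature constants are set to 1. The names `NewtonStepLearner` and `sherman_morrison` are hypothetical. The point is only that the gradient is rescaled by the inverse of an accumulated curvature matrix, which is maintained with a rank-one update instead of a full inversion.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sherman_morrison(A_inv, g):
    """Return (A + g g^T)^{-1} given A^{-1} for symmetric A, in O(d^2) time."""
    Ag = mat_vec(A_inv, g)
    denom = 1.0 + sum(gi * ai for gi, ai in zip(g, Ag))
    d = len(g)
    return [[A_inv[i][j] - Ag[i] * Ag[j] / denom for j in range(d)] for i in range(d)]


class NewtonStepLearner:
    """Simplified second-order learner (assumption: squared loss, no projection)."""

    def __init__(self, dim, alpha=1.0):
        self.w = [0.0] * dim
        # Start from A_0 = alpha * I, so A_0^{-1} = I / alpha.
        self.A_inv = [[(1.0 / alpha if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]

    def update(self, x, y):
        pred = sum(wi * xi for wi, xi in zip(self.w, x))
        g = [(pred - y) * xi for xi in x]  # gradient of 0.5 * (pred - y)^2 in w
        # Accumulate curvature: A_t = A_{t-1} + g g^T, tracked through its inverse.
        self.A_inv = sherman_morrison(self.A_inv, g)
        step = mat_vec(self.A_inv, g)      # Newton-style direction A_t^{-1} g
        self.w = [wi - si for wi, si in zip(self.w, step)]
        return 0.5 * (pred - y) ** 2


learner = NewtonStepLearner(dim=2, alpha=1.0)
loss = learner.update((1.0, 0.0), 1.0)
```

In this explicit-feature form the matrix is $d \times d$; in the kernel setting the analogous object grows with the number of rounds, which is the $\mathcal{O}(t^2)$ cost that sketching is meant to reduce.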

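Experimental Results

Research Questions

RQ1: Can second-order KOCO methods achieve logarithmic regret scaling in kernel spaces while remaining computationally efficient?
RQ2: How can matrix sketching be adapted to second-order KOCO to reduce time and space complexity without significantly increasing regret?
RQ3: Why do existing dictionary-based sketching methods fail to achieve logarithmic regret in online kernel learning, despite their success in the batch setting?
RQ4: Can an adaptive sketching strategy be designed that achieves low regret in the online setting and supports dynamic model updates?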

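Key Findings

The proposed Sketched-KONS achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret for a fixed $\gamma$, matching the order of the second-order regret bound of full KONS, where $d_{\text{eff}}$ is the effective dimension of the problem.
By applying adaptive sketching with parameter $\gamma$, the method reduces per-iteration time and space complexity from $\mathcal{O}(t^2)$ to $\mathcal{O}(t^2\gamma^2)$.
The regret of Sketched-KONS is at most $1/\gamma$ times that of full KONS, giving a tunable trade-off between complexity and regret.
A counterexample shows that dictionary-based sketching cannot achieve logarithmic regret in the online setting, because weight scheduling and budget constraints impose conflicting goals.
The analysis shows that second-order methods exploit curvature more effectively than first-order methods: when the losses have stronger curvature than plain convexity, regret drops from $\mathcal{O}(\sqrt{T})$ to $\mathcal{O}(d_{\text{eff}}\log T)$.
The method shows that adaptive sketching gives better adaptivity and performance in online kernel learning than fixed-dictionary approaches.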
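The bounds above depend on the effective dimension $d_{\text{eff}}$, which this summary does not define. The sketch below assumes the common ridge-style definition $d_{\text{eff}}(\alpha) = \operatorname{Tr}\big(\mathbf{K}(\mathbf{K} + \alpha \mathbf{I})^{-1}\big)$ with a regularization parameter $\alpha > 0$; both the definition and the helper names (`rbf_gram`, `inverse_trace_spd`, `effective_dimension`) are assumptions made for illustration. It uses the identity $\mathbf{K}(\mathbf{K} + \alpha \mathbf{I})^{-1} = \mathbf{I} - \alpha(\mathbf{K} + \alpha \mathbf{I})^{-1}$ so that only the trace of an inverse is needed.

```python
import math


def rbf_gram(points, bandwidth=1.0):
    """Gaussian kernel Gram matrix for a list of scalar points."""
    return [[math.exp(-((p - q) ** 2) / (2.0 * bandwidth ** 2)) for q in points]
            for p in points]


def inverse_trace_spd(M):
    """Trace of M^{-1} for a symmetric positive definite M, via Gauss-Jordan."""
    n = len(M)
    # Augment M with the identity; after elimination the right half is M^{-1}.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = aug[col][col]  # positive for SPD input, so no row swaps needed
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return sum(aug[i][n + i] for i in range(n))


def effective_dimension(K, alpha):
    """d_eff(alpha) = Tr(K (K + alpha I)^{-1}) = n - alpha * Tr((K + alpha I)^{-1})."""
    n = len(K)
    shifted = [[K[i][j] + (alpha if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return n - alpha * inverse_trace_spd(shifted)


# 30 tightly clustered points: the Gram matrix is close to rank one.
grid = [0.01 * i for i in range(30)]
d_eff = effective_dimension(rbf_gram(grid), alpha=1.0)
```

For clustered inputs like `grid`, `d_eff` stays far below the number of points, which is the regime where a $d_{\text{eff}}\log T$ bound is much tighter than $\sqrt{T}$.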

This review was generated by AI and checked by human editors.