QUICK REVIEW

[Paper Review] SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

Cong Fang, Chris Junchi Li|arXiv (Cornell University)|Jul 4, 2018

Stochastic Gradient Optimization Techniques | References: 38 | Cited by: 74

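ONE-SENTENCE SUMMARY

SPIDER introduces a stochastic path-integrated differential estimator that tracks deterministic quantities at a significantly reduced sampling cost, achieves near-optimal convergence rates for finding approximate first- and second-order stationary points in non-convex stochastic optimization, and also includes a zeroth-order variant.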

ABSTRACT

In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely the stochastic first-order and zeroth-order methods. For stochastic first-order method, combining SPIDER with normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SFO\textsuperscript{+}, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. In special, we prove that the SPIDER-SFO and SPIDER-SFO\textsuperscript{+} algorithms achieve a record-breaking gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) \right)$ for finding an $\epsilon$-approximate first-order and $\tilde{\mathcal{O}}\left( \min( n^{1/2} \epsilon^{-2}+\epsilon^{-2.5}, \epsilon^{-3} ) \right)$ for finding an $(\epsilon, \mathcal{O}(\epsilon^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting. For stochastic zeroth-order method, we prove a cost of $\mathcal{O}( d \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3}) )$ which outperforms all existing results.

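MOTIVATION AND OBJECTIVES

Solve non-convex stochastic optimization problems efficiently using only stochastic gradients.
Develop a new estimator, SPIDER, that tracks deterministic quantities at a reduced sampling cost.
Achieve faster convergence rates for finding approximate first- and second-order stationary points.
Extend SPIDER to zeroth-order optimization and demonstrate an improved function-evaluation cost.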
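
For reference, the stationarity notions targeted here are usually formalized as follows. This is the standard textbook form, written with an assumed generic tolerance $\delta$ for the curvature condition; individual theorems may attach expectation or high-probability qualifiers that are not shown.

```latex
% \epsilon-approximate first-order stationary point
\|\nabla f(x)\| \le \epsilon

% (\epsilon, \delta)-approximate second-order stationary point
\|\nabla f(x)\| \le \epsilon,
\qquad
\lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\delta
```

The second-order results quoted in the abstract correspond to the choice $\delta = \mathcal{O}(\epsilon^{0.5})$.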

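PROPOSED METHOD

Propose the Stochastic Path-Integrated Differential Estimator (SPIDER), which tracks quantities such as gradients at a lower sampling cost.
Combine SPIDER with normalized gradient descent (NGD) to form SPIDER-SFO and SPIDER-SFO+ for non-convex optimization.
Derive error bounds, via a martingale analysis, showing that the variance and bias of SPIDER-based estimators remain controlled.
Apply SPIDER to stochastic zeroth-order methods, lowering the number of function-value queries required.
Give convergence theorems, in both the finite-sum and online settings, for finding ε-approximate first-order and (ε, O(ε^0.5))-approximate second-order stationary points.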
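
The estimator and the normalized step described above can be sketched as follows. The idea is to keep a running gradient estimate v and update it with sampled gradient differences between consecutive iterates, v_k = ∇f_S(x_k) − ∇f_S(x_{k−1}) + v_{k−1}, refreshing it periodically. This is a minimal illustration under stated assumptions, not the authors' implementation: the component function in grad_i, the parameter names (step, q, s2), the exact full-gradient refresh every q iterations, and the stopping rule ‖v‖ ≤ eps are all choices made for this example.

```python
import math
import random


def grad_i(x, b):
    """Gradient of a hypothetical smooth nonconvex component
    f_i(x) = sum_j u_j^2 / (1 + u_j^2), with u = x - b."""
    return [2.0 * (xj - bj) / (1.0 + (xj - bj) ** 2) ** 2 for xj, bj in zip(x, b)]


def batch_grad(x, data, idx):
    """Average of the component gradients over the indices in idx."""
    idx = list(idx)
    g = [0.0] * len(x)
    for i in idx:
        for j, gij in enumerate(grad_i(x, data[i])):
            g[j] += gij / len(idx)
    return g


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def spider_sfo(data, x0, eps, step, q, s2, max_iter, seed=0):
    """Sketch of a SPIDER-style loop with normalized steps (finite-sum case)."""
    rng = random.Random(seed)
    n = len(data)
    x, x_prev, v = list(x0), list(x0), None
    for k in range(max_iter):
        if k % q == 0:
            # Refresh: recompute the exact full gradient.
            v = batch_grad(x, data, range(n))
        else:
            # Path-integrated update: add the sampled gradient difference
            # between consecutive iterates to the previous estimate.
            idx = [rng.randrange(n) for _ in range(s2)]
            g_new = batch_grad(x, data, idx)
            g_old = batch_grad(x_prev, data, idx)
            v = [vj + a - b for vj, a, b in zip(v, g_new, g_old)]
        v_norm = norm(v)
        if v_norm <= eps:  # assumed stopping rule on the estimate
            return x, k
        # Normalized step: every move has length exactly `step`.
        x_prev = x
        x = [xj - step * vj / v_norm for xj, vj in zip(x, v)]
    return x, max_iter


# Usage on synthetic data (all values here are illustrative assumptions).
data_rng = random.Random(1)
data = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(20)]
x_out, iters = spider_sfo(data, [3.0, -2.0], eps=0.05, step=0.002,
                          q=5, s2=4, max_iter=100000)
print(iters, norm(batch_grad(x_out, data, range(len(data)))))
```

Because every step has the same length, consecutive iterates stay close, so each sampled gradient difference is small and the estimate drifts only slowly between refreshes. In this toy problem the drift is at most 4 * step * (q - 1), which is why a small mini-batch suffices between full-gradient refreshes.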

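RESULTS

Research Questions

RQ1: Can SPIDER reduce the gradient sampling complexity of finding an ε-approximate first-order stationary point in non-convex stochastic optimization?
RQ2: Under standard smoothness assumptions, can SPIDER achieve a near-optimal rate for finding second-order stationary points?
RQ3: What are the benefits and costs of applying SPIDER to zeroth-order non-convex optimization?
RQ4: How does SPIDER compare with existing variance-reduction and saddle-point-escaping methods in terms of gradient complexity and robustness?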
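
On the cost side of RQ3, a zeroth-order method sees only function values, so every gradient has to be estimated from evaluations. The sketch below assumes a plain coordinate-wise forward difference, which is an illustrative choice and not necessarily the estimator used by SPIDER's zeroth-order variant. It shows where a factor of d in the evaluation count can come from. The names fd_gradient, CountingOracle, and mu are hypothetical.

```python
class CountingOracle:
    """Wraps a function and counts how many times it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)


def fd_gradient(f, x, mu=1e-6):
    """Coordinate-wise forward-difference gradient estimate.

    Uses d + 1 evaluations of f for a d-dimensional point x.
    """
    fx = f(x)
    grad = []
    for j in range(len(x)):
        x_shift = list(x)
        x_shift[j] += mu  # perturb one coordinate at a time
        grad.append((f(x_shift) - fx) / mu)
    return grad


# Usage: estimate the gradient of f(x) = x_0^2 + 3 x_1 at (1, 2).
oracle = CountingOracle(lambda x: x[0] ** 2 + 3.0 * x[1])
estimate = fd_gradient(oracle, [1.0, 2.0])
print(estimate, oracle.calls)
```

Each gradient estimate costs on the order of d function evaluations, which is consistent with the zeroth-order bound being d times the first-order one.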

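Key Findings

SPIDER-SFO finds an ε-approximate first-order stationary point at a gradient computation cost of O(min(n^{1/2} ε^{-2}, ε^{-3})).
SPIDER-SFO+ (which adds negative-curvature search) finds an (ε, O(ε^{0.5}))-approximate second-order stationary point at a gradient cost of Õ(min(n^{1/2} ε^{-2} + ε^{-2.5}, ε^{-3})) under a Hessian-Lipschitz condition.
In the finite-sum setting, under the gradient-Lipschitz assumption, SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points.
For zeroth-order optimization, SPIDER attains a cost of O(d min(n^{1/2} ε^{-2}, ε^{-3})) function evaluations, which the authors report outperforms all existing results.
The analysis provides a simpler convergence framework that can be extended to other algorithms such as SGD, SVRG, and SAGA.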
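
To see how the bounds above behave, the following sketch evaluates them with all constants and logarithmic factors dropped, so the numbers indicate scaling only and are not predictions of actual gradient counts. The function names are hypothetical. The two terms inside the min cross where n^{1/2} ε^{-2} = ε^{-3}, that is, at n = ε^{-2}: below that size the finite-sum term is the smaller one, above it the ε^{-3} term takes over.

```python
import math


def first_order_cost(n, eps):
    """min(n^{1/2} eps^{-2}, eps^{-3}), constants dropped."""
    return min(math.sqrt(n) * eps ** -2, eps ** -3)


def second_order_cost(n, eps):
    """min(n^{1/2} eps^{-2} + eps^{-2.5}, eps^{-3}), constants and logs dropped."""
    return min(math.sqrt(n) * eps ** -2 + eps ** -2.5, eps ** -3)


def zeroth_order_cost(d, n, eps):
    """d * min(n^{1/2} eps^{-2}, eps^{-3}), constants dropped."""
    return d * first_order_cost(n, eps)


def finite_sum_term_is_smaller(n, eps):
    """True when n^{1/2} eps^{-2} < eps^{-3}, i.e. when n < eps^{-2}."""
    return math.sqrt(n) * eps ** -2 < eps ** -3


# Usage: compare regimes for a small and a large dataset.
for n, eps in [(100, 0.01), (10 ** 6, 0.1)]:
    print(n, eps, finite_sum_term_is_smaller(n, eps), first_order_cost(n, eps))
```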


This review was generated by AI and reviewed by human editors.