QUICK REVIEW

[Paper Review] Escaping Saddle Points in Constrained Optimization

Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie | arXiv (Cornell University) | Sep 6, 2018

Sparse and Compressive Sensing Techniques | Cited by 30

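One-Sentence Summary

This paper proposes a generic optimization framework that combines first- and second-order information to escape saddle points in constrained nonconvex problems. When the feasible set admits efficient approximate solution of quadratic programs, the method reaches an $(\epsilon,\gamma)$-second-order stationary point within $\mathcal{O}(\max\{\epsilon^{-2}, \rho^{-3}\gamma^{-3}\})$ iterations. Under a strict-saddle condition, it is guaranteed to converge to a local minimum.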

ABSTRACT

In this paper, we study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple for a quadratic objective function. Specifically, our results hold if one can find a $\rho$-approximate solution of a quadratic program subject to $\mathcal{C}$ in polynomial time, where $\rho<1$ is a positive constant that depends on the structure of the set $\mathcal{C}$. Under this condition, we show that the sequence of iterates generated by the proposed framework reaches an $(\epsilon,\gamma)$-second order stationary point (SOSP) in at most $\mathcal{O}(\max\{\epsilon^{-2},\rho^{-3}\gamma^{-3}\})$ iterations. We further characterize the overall complexity of reaching an SOSP when the convex set $\mathcal{C}$ can be written as a set of quadratic constraints and the objective function Hessian has a specific structure over the convex set $\mathcal{C}$. Finally, we extend our results to the stochastic setting and characterize the number of stochastic gradient and Hessian evaluations to reach an $(\epsilon,\gamma)$-SOSP.

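Motivation and Objectives

Address the challenge of escaping saddle points in constrained nonconvex optimization, where a first-order stationary point need not be a local minimum.
Develop a generic algorithmic framework that uses first- and second-order information to converge to second-order stationary points (SOSPs).
Characterize the iteration complexity and arithmetic complexity of reaching an $(\epsilon,\gamma)$-SOSP under specific structural assumptions on the constraint set $\mathcal{C}$ and on the Hessian of the objective function.
Extend the framework to the stochastic setting and analyze the number of stochastic gradient and Hessian evaluations needed to converge.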
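For reference, the target notion can be written out explicitly. The two conditions below are our reading of the usual definition of an $(\epsilon,\gamma)$-SOSP for a smooth objective over a convex set $\mathcal{C}$; the symbols $f$ (the objective) and $x^*$ (the candidate point) are notation assumed here for illustration, and the paper should be consulted for the exact statement.

$$
\nabla f(x^*)^\top (x - x^*) \ge -\epsilon \quad \text{for all } x \in \mathcal{C},
$$

$$
(x - x^*)^\top \nabla^2 f(x^*)\,(x - x^*) \ge -\gamma \quad \text{for all } x \in \mathcal{C} \text{ such that } \nabla f(x^*)^\top (x - x^*) = 0.
$$

The first inequality is approximate first-order stationarity; the second rules out feasible directions of strongly negative curvature along which the first-order term vanishes.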

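Proposed Method

The framework has two phases: it first runs a first-order method to reach a first-order stationary point, then uses second-order information to escape strict saddle points or local maxima.
It relies on being able to compute, in polynomial time, a $\rho$-approximate solution of a quadratic program over the convex set $\mathcal{C}$, where $\rho < 1$ is a positive constant that depends on the structure of $\mathcal{C}$.
The algorithm uses a random direction $\mathbf{d}_t$ to probe curvature within the feasible set, detecting negative curvature with high probability.
It uses stochastic gradients and Hessians with bounded variance, and controls the probability of curvature-estimation error through the batch size.
The method guarantees that if a point is not an $(\epsilon,\gamma)$-SOSP, a carefully constructed descent direction yields a sufficient decrease in the objective value.
For quadratic constraints, the method requires $\mathcal{O}(\max\{\tau\epsilon^{-2}, d^3 m^7 \gamma^{-3}\})$ arithmetic operations, where $d$ is the dimension, $m$ is the number of quadratic constraints, and $\tau$ is the cost of solving a linear program over $\mathcal{C}$ or projecting onto $\mathcal{C}$.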
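The two-phase idea can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes $\mathcal{C}$ is the unit Euclidean ball, uses projected gradient descent for the first-order phase, and replaces the $\rho$-approximate quadratic-program oracle with a simple eigenvector heuristic. All names (`escape_saddle`, `curvature_candidate`, `first_order_gap`, the test function `f`, and the step-halving rule) are hypothetical choices made for this example.

```python
import numpy as np

RADIUS = 1.0  # assumption: C is the Euclidean ball {x : ||x|| <= RADIUS}

def f(x):
    # Toy nonconvex objective with a saddle point at the origin.
    return 0.5 * (x[0] ** 2 - x[1] ** 2)

def grad(x):
    return np.array([x[0], -x[1]])

def hess(x):
    return np.diag([1.0, -1.0])

def project_ball(x):
    n = np.linalg.norm(x)
    return x if n <= RADIUS else x * (RADIUS / n)

def first_order_gap(x, g):
    # min over u in C of g^T (u - x); closed form when C is a ball.
    return -RADIUS * np.linalg.norm(g) - g @ x

def curvature_candidate(x, g, H):
    """Heuristic stand-in for the approximate QP oracle (an assumption).

    Returns (u, q) with u in C, g^T (u - x) = 0 and q = (u - x)^T H (u - x).
    """
    gn = np.linalg.norm(g)
    eye = np.eye(x.size)
    # Projector onto the subspace orthogonal to the gradient.
    P = eye if gn < 1e-12 else eye - np.outer(g, g) / gn ** 2
    lam, vecs = np.linalg.eigh(P @ H @ P)  # eigenvalues in ascending order
    if lam[0] >= -1e-12:
        return x.copy(), 0.0  # no negative curvature in that subspace
    v = vecs[:, 0]
    xv = x @ v
    disc = np.sqrt(max(xv ** 2 + RADIUS ** 2 - x @ x, 0.0))
    s_pos, s_neg = -xv + disc, xv + disc  # distance to boundary along +v, -v
    step = s_pos * v if s_pos >= s_neg else -s_neg * v
    return x + step, float(step @ H @ step)

def escape_saddle(x0, eps=1e-3, gamma=0.1, eta=0.5, max_iter=1000):
    x = project_ball(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        g = grad(x)
        if first_order_gap(x, g) < -eps:
            # Phase 1: first-order progress via projected gradient descent.
            x = project_ball(x - eta * g)
            continue
        # Phase 2: look for feasible negative curvature.
        u, q = curvature_candidate(x, g, hess(x))
        if q >= -gamma:
            return x  # approximately second-order stationary under this heuristic
        sigma = 1.0
        # Halve the step until the objective decreases (a choice of this sketch).
        while f(x + sigma * (u - x)) >= f(x) and sigma > 1e-8:
            sigma *= 0.5
        x = x + sigma * (u - x)  # convex combination, so it stays in C
    return x

if __name__ == "__main__":
    x_final = escape_saddle([0.5, 0.0])
    print(x_final, f(x_final))
```

Starting from `[0.5, 0.0]`, the gradient phase alone slides toward the saddle at the origin because the second coordinate never moves; the curvature phase is what pushes the iterate toward the boundary of the ball, where the objective is lower. The eigenvector heuristic carries no approximation guarantee in general, so it should not be read as a substitute for the oracle the paper assumes.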

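Experimental Results

Research Questions

RQ1: Under what conditions on the constraint set $\mathcal{C}$ can a combination of first- and second-order information efficiently escape saddle points in constrained nonconvex optimization?
RQ2: What is the iteration complexity of reaching an $(\epsilon,\gamma)$-second-order stationary point when the feasible set admits $\rho$-approximate solutions of quadratic programs?
RQ3: How does the algorithm's complexity scale with the dimension $d$, the number of quadratic constraints $m$, and the accuracy parameters $\epsilon$ and $\gamma$?
RQ4: In the stochastic setting, how many stochastic gradient and Hessian evaluations are needed to reach an $(\epsilon,\gamma)$-SOSP?
RQ5: With noisy gradient and Hessian estimates, can the framework still guarantee convergence to an SOSP with high probability?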

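Key Findings

When a $\rho$-approximate solution of a quadratic program over $\mathcal{C}$ can be computed in polynomial time, the proposed framework reaches an $(\epsilon,\gamma)$-second-order stationary point in at most $\mathcal{O}(\max\{\epsilon^{-2}, \rho^{-3}\gamma^{-3}\})$ iterations.
For convex sets defined by quadratic constraints, and under a specific Hessian structure, the total arithmetic complexity is bounded by $\mathcal{O}(\max\{\tau\epsilon^{-2}, d^3 m^7 \gamma^{-3}\})$, where $\tau$ is the cost of solving a linear program over $\mathcal{C}$ or projecting onto $\mathcal{C}$.
In the stochastic setting, the algorithm needs $\mathcal{O}(\max\{\epsilon^{-4}, \epsilon^{-2}\rho^{-4}\gamma^{-4}, \rho^{-7}\gamma^{-7}\})$ stochastic gradient evaluations and $\mathcal{O}(\max\{\epsilon^{-2}\rho^{-3}\gamma^{-3}, \rho^{-5}\gamma^{-5}\})$ stochastic Hessian evaluations to reach an $(\epsilon,\gamma)$-SOSP.
With suitably chosen batch sizes for the stochastic gradients and Hessians, the algorithm outputs an $(\epsilon,\gamma)$-SOSP with probability at least 0.92.
The framework guarantees that if a point is not an SOSP, a direction that exploits negative curvature with high probability yields a sufficient decrease in the objective value.
The analysis shows that, with high probability, both the Hessian approximation error and the gradient estimation error are bounded, which enables reliable detection of negative curvature within the feasible set.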
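To see which term dominates each bound for a given accuracy target, the expressions above can be evaluated directly. The sketch below drops the hidden constants in the $\mathcal{O}(\cdot)$ notation, so the numbers are only useful for comparing terms, not as actual iteration counts; the function names are hypothetical.

```python
def iteration_bound(eps, rho, gamma):
    # max{eps^-2, rho^-3 gamma^-3}, constants omitted
    return max(eps ** -2, (rho * gamma) ** -3)

def arithmetic_bound(eps, gamma, tau, d, m):
    # max{tau eps^-2, d^3 m^7 gamma^-3}, constants omitted
    return max(tau * eps ** -2, d ** 3 * m ** 7 * gamma ** -3)

def stochastic_gradient_bound(eps, rho, gamma):
    # max{eps^-4, eps^-2 rho^-4 gamma^-4, rho^-7 gamma^-7}, constants omitted
    rg = rho * gamma
    return max(eps ** -4, eps ** -2 * rg ** -4, rg ** -7)

def stochastic_hessian_bound(eps, rho, gamma):
    # max{eps^-2 rho^-3 gamma^-3, rho^-5 gamma^-5}, constants omitted
    rg = rho * gamma
    return max(eps ** -2 * rg ** -3, rg ** -5)

if __name__ == "__main__":
    eps, rho, gamma = 1e-2, 0.5, 0.1
    print(iteration_bound(eps, rho, gamma))
    print(stochastic_gradient_bound(eps, rho, gamma))
    print(stochastic_hessian_bound(eps, rho, gamma))
```

For example, with $\epsilon = 10^{-2}$, $\rho = 0.5$, and $\gamma = 0.1$, the first-order term $\epsilon^{-2} = 10^4$ slightly exceeds the curvature term $(\rho\gamma)^{-3} = 8 \times 10^3$ in the iteration bound, while the mixed term $\epsilon^{-2}\rho^{-4}\gamma^{-4}$ dominates the stochastic gradient bound.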

This review was generated by AI and reviewed by human editors.