QUICK REVIEW

[Paper Review] First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time

Yi Xu, Rong Jin|arXiv (Cornell University)|Nov 3, 2017

Sparse and Compressive Sensing Techniques | 58 citations

TL;DR

The paper introduces NEON, a first-order stochastic procedure that extracts negative curvature from the Hessian, enabling escape from saddle points in time almost linear in the problem dimension and finding nearly second-order stationary points with high probability.

ABSTRACT

Two classes of methods have been proposed for escaping from saddle points with one using the second-order information carried by the Hessian and the other adding the noise into the first-order information. The existing analysis for algorithms using noise in the first-order information is quite involved and hides the essence of added noise, which hinders further improvements of these algorithms. In this paper, we present a novel perspective of noise-adding technique, i.e., adding the noise into the first-order information can help extract the negative curvature from the Hessian matrix, and provide a formal reasoning of this perspective by analyzing a simple first-order procedure. More importantly, the proposed procedure enables one to design purely first-order stochastic algorithms for escaping from non-degenerate saddle points with a much better time complexity (almost linear time in terms of the problem's dimensionality). In particular, we develop a first-order stochastic algorithm based on our new technique and an existing algorithm that only converges to a first-order stationary point to enjoy a time complexity of $\widetilde O(d/\epsilon^{3.5})$ for finding a nearly second-order stationary point $\mathbf{x}$ such that $\|\nabla F(\mathbf{x})\|\leq \epsilon$ and $\nabla^2 F(\mathbf{x})\geq -\sqrt{\epsilon}I$ (in high probability), where $F(\cdot)$ denotes the objective function and $d$ is the dimensionality of the problem. To the best of our knowledge, this is the best theoretical result of first-order algorithms for stochastic non-convex optimization, which is even competitive with if not better than existing stochastic algorithms hinging on the second-order information.

Motivation & Objective

Address stochastic non-convex optimization, where methods that only guarantee first-order stationarity can stop at saddle points.
Develop first-order procedures to escape non-degenerate saddle points via negative curvature originated from noise (NEON).
Provide a framework with second-order convergence guarantees using first-order information.
Achieve near-linear time complexity in problem dimension for finding nearly second-order stationary points.

Proposed method

Introduce NEON: a procedure to extract negative curvature from the Hessian starting from noise.
Integrate NEON into a general first-order stochastic algorithmic framework.
Prove second-order convergence guarantees for finding nearly second-order stationary points.
Derive time complexity results and show the almost linear dependence on problem dimension.
Relate the framework to finite-sum settings with many components.
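
The idea behind extracting negative curvature from noise can be illustrated with a small sketch. This is a simplified illustration, not the paper's exact procedure: it assumes a deterministic gradient oracle, and the function names, step size `eta`, perturbation `radius`, iteration count, and the rescaling rule are all assumptions made for the example. It relies on the fact that, for a small perturbation u, the gradient difference ∇F(x+u) − ∇F(x) approximates the Hessian-vector product, so repeating u ← u − η(∇F(x+u) − ∇F(x)) behaves like a power iteration on I − ηH and amplifies the negative-curvature component of the initial noise.

```python
import numpy as np

def neon_sketch(grad, x, eta=0.1, radius=1e-3, steps=200, seed=0):
    """Estimate a negative-curvature direction at x using gradients only.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    g_x = grad(x)                        # gradient at the base point, reused each step
    u = rng.standard_normal(x.shape)
    u *= radius / np.linalg.norm(u)      # small random perturbation (the "noise")
    for _ in range(steps):
        # grad(x + u) - grad(x) approximates H u for small u, so this step is
        # roughly u <- (I - eta * H) u, which grows the components of u that
        # lie along negative-eigenvalue directions of H.
        u = u - eta * (grad(x + u) - g_x)
        norm = np.linalg.norm(u)
        if norm > radius:
            u *= radius / norm           # assumption: rescale so u stays small
    return u / np.linalg.norm(u)

def curvature_along(grad, x, v, h=1e-4):
    """Finite-difference estimate of v^T H v from two gradient calls."""
    return float(v @ (grad(x + h * v) - grad(x - h * v)) / (2 * h))

# Toy saddle: F(x) = 0.5 * x^T A x, with one negative eigenvalue (-0.3).
A = np.diag([1.0, 0.5, -0.3])
grad_F = lambda x: A @ x
x0 = np.zeros(3)                         # the saddle point itself
v = neon_sketch(grad_F, x0)
kappa = curvature_along(grad_F, x0, v)   # negative if v is an escape direction
```

On this toy quadratic the returned direction aligns with the third coordinate axis, the only direction of negative curvature. In a stochastic setting the exact gradients would be replaced by mini-batch estimates, which this sketch does not model.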

Experimental results

Research questions

RQ1: Can first-order stochastic methods escape from saddle points efficiently by leveraging negative curvature that arises naturally from noise?
RQ2: What is the time complexity of finding a nearly second-order stationary point using only first-order information in stochastic non-convex optimization?
RQ3: How can NEON be integrated into general SGD-type algorithms to guarantee second-order convergence with high probability?
RQ4: How close to linear in the dimension can the overall algorithm's runtime be made?
RQ5: Do the proposed methods apply to both expectation-form problems and large finite-sum problems?

Key findings

Proposes NEON to extract negative curvature from the Hessian using a noise-based sequence.
Develops a framework achieving second-order convergence guarantees with pure first-order stochastic methods.
Shows a time complexity of Õ(d/ε^{3.5}) for finding a point with ∥∇F(x)∥ ≤ ε and ∇²F(x) ≥ −√ε I with high probability, which the authors describe as the best known result for first-order stochastic algorithms.
Demonstrates almost linear time in the problem dimension for escaping saddle points.
Shows that first-order stochastic algorithms can find nearly second-order stationary points with a time complexity competitive with methods that use second-order information.
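
The stationarity condition quoted in these findings can be made concrete with a short check. This is only an illustrative sketch: the function name is hypothetical, and it forms the full Hessian to test the condition, which the paper's first-order algorithms are designed to avoid. It accepts a point when ∥∇F(x)∥ ≤ ε and the smallest Hessian eigenvalue is at least −√ε.

```python
import numpy as np

def is_nearly_second_order_stationary(grad, hess, eps):
    """Check ||grad|| <= eps and lambda_min(hess) >= -sqrt(eps).

    `grad` and `hess` are the gradient vector and (symmetric) Hessian
    matrix evaluated at the candidate point.
    """
    grad_ok = np.linalg.norm(grad) <= eps
    lam_min = np.linalg.eigvalsh(hess)[0]    # eigvalsh returns ascending eigenvalues
    return bool(grad_ok and lam_min >= -np.sqrt(eps))

eps = 0.01                                   # so -sqrt(eps) = -0.1
zero_grad = np.zeros(2)

# Saddle of F(x) = 0.5 * (x1^2 - 0.3 * x2^2): gradient vanishes, but
# lambda_min = -0.3 < -0.1, so the point is rejected.
at_saddle = is_nearly_second_order_stationary(zero_grad, np.diag([1.0, -0.3]), eps)

# Minimum of F(x) = 0.5 * (x1^2 + 0.5 * x2^2): both conditions hold.
at_minimum = is_nearly_second_order_stationary(zero_grad, np.diag([1.0, 0.5]), eps)
```

A saddle point passes the gradient test alone, which is why a first-order stationarity guarantee is not enough to rule it out.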

This review was created by AI and reviewed by human editors.