QUICK REVIEW

[Paper Review] First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time

Yi Xu, Rong Jin|arXiv (Cornell University)|Nov 3, 2017
Sparse and Compressive Sensing Techniques | 58 citations
TL;DR

The paper introduces NEON, a first-order stochastic procedure to extract negative curvature from the Hessian, enabling almost linear-time escape from saddle points and finding nearly second-order stationary points with high probability.

ABSTRACT

Two classes of methods have been proposed for escaping from saddle points: one uses the second-order information carried by the Hessian, and the other adds noise to the first-order information. The existing analysis of algorithms that add noise to the first-order information is quite involved and hides the essence of the added noise, which hinders further improvement of these algorithms. In this paper, we present a novel perspective on the noise-adding technique, i.e., adding noise to the first-order information can help extract negative curvature from the Hessian matrix, and we provide formal reasoning for this perspective by analyzing a simple first-order procedure. More importantly, the proposed procedure enables one to design purely first-order stochastic algorithms for escaping from non-degenerate saddle points with a much better time complexity (almost linear in the problem's dimensionality). In particular, we develop a first-order stochastic algorithm, based on our new technique and an existing algorithm that only converges to a first-order stationary point, that enjoys a time complexity of $\widetilde O(d/\epsilon^{3.5})$ for finding a nearly second-order stationary point $\mathbf{x}$ such that $\|\nabla F(\mathbf{x})\| \leq \epsilon$ and $\nabla^2 F(\mathbf{x}) \geq -\sqrt{\epsilon} I$ (with high probability), where $F(\cdot)$ denotes the objective function and $d$ is the dimensionality of the problem. To the best of our knowledge, this is the best theoretical result for first-order algorithms in stochastic non-convex optimization, and it is competitive with, if not better than, existing stochastic algorithms that rely on second-order information.
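One way to see why added noise exposes negative curvature is the short derivation below. It assumes $F$ is twice differentiable with a Lipschitz-continuous Hessian $H = \nabla^2 F(\mathbf{x})$ and uses a simplified gradient-difference recursion on a small noise vector $\mathbf{u}_0$; the recursion is a sketch chosen for illustration, not quoted from the paper.

```latex
\begin{aligned}
\mathbf{u}_{\tau+1} &= \mathbf{u}_\tau - \eta\big(\nabla F(\mathbf{x}+\mathbf{u}_\tau) - \nabla F(\mathbf{x})\big) \\
&= \mathbf{u}_\tau - \eta\big(H\,\mathbf{u}_\tau + O(\|\mathbf{u}_\tau\|^2)\big)
\;\approx\; (I - \eta H)\,\mathbf{u}_\tau, \\
\mathbf{u}_0 = \sum_i c_i \mathbf{v}_i,\; H\mathbf{v}_i = \lambda_i \mathbf{v}_i
&\;\Longrightarrow\; \mathbf{u}_\tau \approx \sum_i c_i\,(1-\eta\lambda_i)^{\tau}\,\mathbf{v}_i .
\end{aligned}
```

If $|\lambda_i| \leq L$ and $0 < \eta \leq 1/L$, every factor $1-\eta\lambda_i$ lies in $[0, 2]$, and the largest one, $1-\eta\lambda_{\min}$, exceeds 1 exactly when $\lambda_{\min} < 0$. Repeated first-order updates therefore act like power iteration on $I-\eta H$ and steer the random noise toward the most negative curvature direction, which is the intuition behind the perspective described in the abstract.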

Motivation & Objective

  • Address stochastic non-convex optimization, where first-order methods can stall at saddle points.
  • Develop first-order procedures that escape non-degenerate saddle points by extracting negative curvature originating from noise (NEON).
  • Provide a framework with second-order convergence guarantees using first-order information.
  • Achieve near-linear time complexity in problem dimension for finding nearly second-order stationary points.
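The saddle-point problem motivating this work can be seen on a toy example. The sketch below is an illustration assumed here, not taken from the paper: it runs gradient descent on the hypothetical function f(x, y) = x² − y², whose origin is a strict saddle point. Started exactly at the saddle, plain gradient descent never moves because the gradient is zero, while adding small Gaussian noise to each gradient lets the iterate pick up a component along the negative-curvature axis y, which then grows geometrically. The step size, noise level, and step count are illustrative choices.

```python
import random


def grad(x, y):
    # Gradient of the toy function f(x, y) = x**2 - y**2 (strict saddle at the origin)
    return 2.0 * x, -2.0 * y


def perturbed_gd(x, y, eta=0.1, steps=100, noise=0.0, seed=0):
    # noise=0.0 gives plain gradient descent; noise > 0 adds isotropic Gaussian noise to the gradient
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= eta * (gx + noise * rng.gauss(0.0, 1.0))
        y -= eta * (gy + noise * rng.gauss(0.0, 1.0))
    return x, y


stuck = perturbed_gd(0.0, 0.0)                # zero gradient: stays exactly at the origin
escaped = perturbed_gd(0.0, 0.0, noise=1e-3)  # |y| is amplified along the negative-curvature axis
print(stuck, escaped)
```

With eta = 0.1 the update along y is y ← 1.2·y plus noise, so any nonzero perturbation is amplified, while along x it is x ← 0.8·x plus noise, which keeps x small.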

Proposed method

  • Introduce NEON: a procedure to extract negative curvature from the Hessian starting from noise.
  • Integrate NEON into a general first-order stochastic algorithmic framework.
  • Prove second-order convergence guarantees for finding nearly second-order stationary points.
  • Derive time complexity results and show the almost linear dependence on problem dimension.
  • Extend the framework to finite-sum problems with a large number of components.

Experimental results

Research questions

  • RQ1: Can first-order stochastic methods escape from saddle points efficiently by leveraging negative curvature naturally arising from noise?
  • RQ2: What is the time complexity of finding a nearly second-order stationary point using only first-order information in stochastic non-convex optimization?
  • RQ3: How can NEON be integrated into general SGD-type algorithms to guarantee second-order convergence with high probability?
  • RQ4: How close to linear in the dimension can the overall algorithm's runtime be?
  • RQ5: Do the proposed methods apply to both expectation-form problems and large finite-sum problems?
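RQ5 refers to the two standard formulations of stochastic non-convex optimization, written below in notation assumed here ($f$, $\xi$, $f_i$, and $n$ are not taken from the paper). In both cases a first-order stochastic method only accesses gradient estimates, such as $\nabla f(\mathbf{x};\xi)$ for a sampled $\xi$ or $\nabla f_i(\mathbf{x})$ for an index $i$ drawn uniformly from $\{1,\dots,n\}$.

```latex
\text{expectation form:}\quad \min_{\mathbf{x}\in\mathbb{R}^d} F(\mathbf{x}) = \mathbb{E}_{\xi}\big[f(\mathbf{x};\xi)\big],
\qquad
\text{finite-sum form:}\quad \min_{\mathbf{x}\in\mathbb{R}^d} F(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} f_i(\mathbf{x}).
```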

Key findings

  • Proposes NEON to extract negative curvature from the Hessian using a noise-based sequence.
  • Develops a framework achieving second-order convergence guarantees with pure first-order stochastic methods.
  • Achieves a time complexity of Õ(d/ε^{3.5}), which the authors state is the best known for first-order stochastic methods, for finding a point with ∥∇F(x)∥ ≤ ε and ∇²F(x) ≥ −√ε I with high probability.
  • Demonstrates almost linear time in the problem dimension for escaping saddle points.
  • Shows that purely first-order stochastic algorithms can find nearly second-order stationary points at a cost competitive with, if not better than, methods that use second-order information.
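The stationarity condition and the complexity bound in the findings above can be made concrete with a small check. The sketch below is restricted to two dimensions so that the smallest Hessian eigenvalue has a closed form; the helper names are hypothetical, and `complexity_scale` simply evaluates d/ε^{3.5} with the constants and logarithmic factors hidden by Õ dropped, so it indicates scaling rather than an actual iteration count.

```python
import math


def min_eig_2x2(h):
    # Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]], in closed form
    (a, b), (_, c) = h
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)


def is_nearly_second_order_stationary(g, h, eps):
    # ||grad F(x)|| <= eps and Hessian >= -sqrt(eps) I, i.e. lambda_min(Hessian) >= -sqrt(eps)
    return math.hypot(*g) <= eps and min_eig_2x2(h) >= -math.sqrt(eps)


def complexity_scale(d, eps):
    # d / eps**3.5 with the constants and log factors hidden by the tilde-O dropped
    return d / eps ** 3.5


# Strict saddle of x**2 - y**2: zero gradient, but curvature -2 < -sqrt(0.01) = -0.1
print(is_nearly_second_order_stationary((0.0, 0.0), ((2.0, 0.0), (0.0, -2.0)), 0.01))   # False
# Small gradient and only mildly negative curvature (-0.05 >= -0.1)
print(is_nearly_second_order_stationary((0.005, 0.0), ((2.0, 0.0), (0.0, -0.05)), 0.01))  # True
print(complexity_scale(1000, 0.01))  # about 1e10
```

For d = 1000 and ε = 0.01 the scale is 1000 · 10^7 = 10^10, and doubling d doubles it, which is what "almost linear in the dimension" refers to.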

This review was created by AI and reviewed by human editors.