QUICK REVIEW

[Paper Review] SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

Cong Fang, Chris Junchi Li|arXiv (Cornell University)|Jul 4, 2018

Stochastic Gradient Optimization Techniques | References: 38 | Citations: 74

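One-Sentence Summary

SPIDER introduces a stochastic path-integrated differential estimator for tracking deterministic quantities. It substantially reduces sampling cost, yields near-optimal rates for non-convex stochastic optimization in both first-order and second-order settings, and comes with a zeroth-order variant.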

ABSTRACT

In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely the stochastic first-order and zeroth-order methods. For stochastic first-order method, combining SPIDER with normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SFO\textsuperscript{+}, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. In special, we prove that the SPIDER-SFO and SPIDER-SFO\textsuperscript{+} algorithms achieve a record-breaking gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} ε^{-2}, ε^{-3} ) \right)$ for finding an $ε$-approximate first-order and $\tilde{\mathcal{O}}\left( \min( n^{1/2} ε^{-2}+ε^{-2.5}, ε^{-3} ) \right)$ for finding an $(ε, \mathcal{O}(ε^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting. For stochastic zeroth-order method, we prove a cost of $\mathcal{O}( d \min( n^{1/2} ε^{-2}, ε^{-3}) )$ which outperforms all existing results.

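Motivation and Objectives

Solve non-convex stochastic optimization problems efficiently using stochastic gradients only.
Develop SPIDER, a new estimator that tracks deterministic quantities at reduced sampling cost.
Speed up convergence to approximate first-order and second-order stationary points.
Extend SPIDER to zeroth-order optimization and show an improved function-evaluation cost.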

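Proposed Method

Propose the Stochastic Path-Integrated Differential Estimator (SPIDER), which tracks quantities such as gradients at low sampling cost.
Combine SPIDER with Normalized Gradient Descent (NGD) to obtain SPIDER-SFO and SPIDER-SFO+ for non-convex optimization.
Derive error bounds, via a martingale-based analysis, showing that SPIDER-based estimators keep variance and bias under control.
Apply SPIDER to stochastic zeroth-order methods to reduce the cost of function-value access.
Provide convergence theorems for finding ε-approximate first-order stationary points and (ε, O(ε^0.5))-approximate second-order stationary points, covering both the finite-sum and online settings.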
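
The review does not spell out the update rule, so the following is only a minimal Python sketch of how the estimator and the normalized step could fit together. It assumes the recursive form v_k = ∇f_S(x_k) − ∇f_S(x_{k−1}) + v_{k−1} with a periodic full-gradient reset every q iterations, followed by the step x_{k+1} = x_k − η v_k / ‖v_k‖. The loss function, the synthetic data, the names `grad_i`, `batch_grad`, and `spider_sfo`, and all parameter values are hypothetical choices for illustration, not the settings analyzed in the paper.

```python
import math
import random

def grad_i(x, a, b):
    """Gradient of the smooth non-convex loss r^2 / (1 + r^2), with r = a.x - b."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    scale = 2.0 * r / (1.0 + r * r) ** 2
    return [scale * ai for ai in a]

def batch_grad(x, data, idx):
    """Average of the component gradients whose indices are listed in idx."""
    idx = list(idx)
    g = [0.0] * len(x)
    for i in idx:
        a, b = data[i]
        g = [gj + gij for gj, gij in zip(g, grad_i(x, a, b))]
    return [gj / len(idx) for gj in g]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def spider_sfo(data, x0, eta, q, batch, iters, seed=0):
    """Return the final iterate and a trace of (x_k, v_k) pairs (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    x, x_prev, v = list(x0), list(x0), None
    trace = []
    for k in range(iters):
        if k % q == 0:
            # Periodic reset: recompute the estimator from the full sum.
            v = batch_grad(x, data, range(n))
        else:
            # Path-integrated update: add the sampled gradient difference
            # between consecutive iterates to the previous estimate.
            idx = [rng.randrange(n) for _ in range(batch)]
            g_new = batch_grad(x, data, idx)
            g_old = batch_grad(x_prev, data, idx)
            v = [gn - go + vj for gn, go, vj in zip(g_new, g_old, v)]
        trace.append((list(x), list(v)))
        v_norm = norm(v)
        if v_norm == 0.0:
            break
        # Normalized step: every move has length eta.
        x_prev = x
        x = [xi - eta * vi / v_norm for xi, vi in zip(x, v)]
    return x, trace

# Toy finite-sum problem with synthetic data (sizes chosen arbitrarily).
rng = random.Random(1)
x_true = [1.0, -0.5, 0.25]
data = []
for _ in range(200):
    a = [rng.gauss(0.0, 1.0) for _ in x_true]
    data.append((a, sum(ai * ti for ai, ti in zip(a, x_true))))

x_final, trace = spider_sfo(data, [0.0, 0.0, 0.0], eta=0.01, q=14, batch=14, iters=300)
print("initial full-gradient norm:", norm(batch_grad([0.0] * 3, data, range(200))))
print("final full-gradient norm:  ", norm(batch_grad(x_final, data, range(200))))
```

Between resets, each iteration touches only `batch` component gradients at two points, which is where the saving over recomputing a full gradient comes from. Because the step is normalized, consecutive iterates stay within distance η of each other, which keeps each sampled gradient difference small.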

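Experimental Results

Research Questions

RQ1: Can SPIDER reduce the gradient sampling complexity needed to find an ε-approximate first-order stationary point in non-convex stochastic optimization?
RQ2: Can SPIDER achieve a near-optimal rate for finding ε-approximate second-order stationary points under standard smoothness assumptions?
RQ3: What are the benefits and costs of applying SPIDER to zeroth-order non-convex optimization?
RQ4: How does SPIDER compare with existing variance-reduction and saddle-point-escaping methods in terms of gradient complexity and robustness?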

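Key Findings

SPIDER-SFO achieves a gradient computation cost of O(min(n^(1/2) ε^(-2), ε^(-3))) for finding an ε-approximate first-order stationary point.
SPIDER-SFO+ (with Negative-Curvature-Search) achieves a gradient cost of Õ(min(n^(1/2) ε^(-2) + ε^(-2.5), ε^(-3))) for finding an (ε, O(ε^0.5))-approximate second-order stationary point under the Hessian-Lipschitz assumption.
In the finite-sum setting, under the gradient-Lipschitz assumption, SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points, up to polylogarithmic factors and constants.
SPIDER for zeroth-order optimization achieves a function-evaluation cost of O(d min(n^(1/2) ε^(-2), ε^(-3))), improving on existing results.
The analysis provides a simpler convergence framework that can be extended to other algorithms such as SGD, SVRG, and SAGA.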
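
To make the stated bounds concrete, the short Python sketch below evaluates them with all constants and logarithmic factors dropped, so the numbers are order-of-magnitude indications only. The function names are hypothetical. The finite-sum term n^(1/2) ε^(-2) is the smaller one exactly when n < ε^(-2); beyond that the ε^(-3) term takes over.

```python
import math

def first_order_cost(n, eps):
    """min(n^(1/2) eps^-2, eps^-3), with constants dropped."""
    return min(math.sqrt(n) * eps ** -2, eps ** -3)

def second_order_cost(n, eps):
    """min(n^(1/2) eps^-2 + eps^-2.5, eps^-3), constants and log factors dropped."""
    return min(math.sqrt(n) * eps ** -2 + eps ** -2.5, eps ** -3)

def zeroth_order_cost(n, eps, d):
    """d * min(n^(1/2) eps^-2, eps^-3), with constants dropped."""
    return d * first_order_cost(n, eps)

def active_regime(n, eps):
    """Report which term of the first-order bound is the smaller one."""
    return "finite-sum" if math.sqrt(n) * eps ** -2 < eps ** -3 else "online"

for n in (100, 10 ** 6):
    eps = 0.01
    print(n, active_regime(n, eps),
          first_order_cost(n, eps),
          second_order_cost(n, eps),
          zeroth_order_cost(n, eps, d=50))
```

The factor d in the zeroth-order bound simply scales the first-order cost by the problem dimension.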

This review was generated by AI and checked by a human editor.