QUICK REVIEW

[Paper Review] Sub-Sampled Newton Methods II: Local Convergence Rates

Farbod Roosta-Khorasani, Michael W. Mahoney|arXiv (Cornell University)|Jan 18, 2016

Sparse and Compressive Sensing Techniques | References: 53 | Citations: 60

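One-Line Summary

This paper analyzes sub-sampled Newton methods for large-scale optimization, proposing variants that sub-sample the Hessian and/or the gradient to reduce computational cost while preserving local convergence. Using random matrix concentration and approximate matrix multiplication, it establishes local Q-linear and Q-superlinear convergence rates that do not depend on the problem's condition number.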

ABSTRACT

Many data-fitting applications require the solution of an optimization problem involving a sum of large number of functions of high dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex constraint set $\mathcal{X} \subseteq \mathbb{R}^{p}$ where both $n$ and $p$ are large. In such problems, sub-sampling as a way to reduce $n$ can offer great amount of computational efficiency. Within the context of second order methods, we first give quantitative local convergence results for variants of Newton's method where the Hessian is uniformly sub-sampled. Using random matrix concentration inequalities, one can sub-sample in a way that the curvature information is preserved. Using such sub-sampling strategy, we establish locally Q-linear and Q-superlinear convergence rates. We also give additional convergence results for when the sub-sampled Hessian is regularized by modifying its spectrum or Levenberg-type regularization. Finally, in addition to Hessian sub-sampling, we consider sub-sampling the gradient as way to further reduce the computational complexity per iteration. We use approximate matrix multiplication results from randomized numerical linear algebra (RandNLA) to obtain the proper sampling strategy and we establish locally R-linear convergence rates. In such a setting, we also show that a very aggressive sample size increase results in a R-superlinearly convergent algorithm. While the sample size depends on the condition number of the problem, our convergence rates are problem-independent, i.e., they do not depend on the quantities related to the problem. Hence, our analysis here can be used to complement the results of our basic framework from the companion paper, [38], by exploring algorithmic trade-offs that are important in practice.

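Motivation and Objectives

Develop efficient second-order optimization methods for large-scale problems with high-dimensional parameters and many data points.
Analyze the local convergence behavior of sub-sampled Newton methods, which approximate the Hessian by random sub-sampling.
Investigate how regularizing the sub-sampled Hessian affects the convergence rate.
Extend the analysis to a fully stochastic version in which both the gradient and the Hessian are sub-sampled.
Provide convergence guarantees that do not depend on the problem's condition number, broadening applicability to big-data problems.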

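Proposed Method

Uniform sub-sampling of the Hessian is used to reduce computational cost, with random matrix concentration inequalities ensuring that curvature information is preserved.
Approximate matrix multiplication techniques from randomized numerical linear algebra (RandNLA) are applied to derive appropriate sampling strategies for sub-sampling the Hessian and the gradient.
Levenberg-type (ridge) regularization and spectrum modification are introduced to stabilize the early iterations; the theory indicates that their benefit is limited in later stages.
An error recursion is established that shows composite behavior: the quadratic term dominates far from the optimum, and the linear term dominates near it.
Each iteration is assumed to solve its subproblem exactly, which secures the theoretical convergence guarantees but is noted as a computational bottleneck.
Independent and joint sampling strategies for sub-sampling the Hessian and the gradient are analyzed, showing that R-superlinear convergence is achieved by progressively increasing the sample sizes.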
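The first item above (full gradient, uniformly sub-sampled Hessian) can be illustrated with a minimal sketch. This is not the authors' implementation: the test problem (unconstrained l2-regularized logistic regression), the names `make_data` and `subsampled_newton`, and the defaults `lam`, `sample_size`, and `iters` are all assumptions chosen for illustration, and the unit step size reflects the local regime only.

```python
import numpy as np

def make_data(n=2000, p=5, seed=1):
    """Synthetic logistic-regression data (hypothetical test problem)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, p))
    x_true = np.linspace(-1.0, 1.0, p)
    prob = 1.0 / (1.0 + np.exp(-(A @ x_true)))
    b = (rng.random(n) < prob).astype(float)   # labels in {0, 1}
    return A, b

def subsampled_newton(A, b, lam=1e-2, sample_size=500, iters=10, seed=0):
    """Newton iterations with the full gradient and a uniformly
    sub-sampled Hessian (illustrative sketch, unit step size)."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    x = np.zeros(p)
    grad_norms = []
    for _ in range(iters):
        s = 1.0 / (1.0 + np.exp(-(A @ x)))     # sigmoid predictions
        grad = A.T @ (s - b) / n + lam * x     # full gradient over all n terms
        grad_norms.append(np.linalg.norm(grad))
        # Uniform sample of indices, drawn without replacement.
        idx = rng.choice(n, size=sample_size, replace=False)
        w = s[idx] * (1.0 - s[idx])            # per-sample curvature weights
        # Sub-sampled Hessian: average over the sample plus the ridge term.
        H = (A[idx].T * w) @ A[idx] / sample_size + lam * np.eye(p)
        x = x - np.linalg.solve(H, grad)       # exact solve of the subproblem
    return x, grad_norms

A, b = make_data()
x_hat, norms = subsampled_newton(A, b)
print(norms[0], norms[-1])
```

Only the Hessian is formed from `sample_size` rows, so the per-iteration curvature cost scales with the sample rather than with `n`; the gradient norm recorded in `norms` gives a rough view of the contraction from one iteration to the next.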

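Experimental Results

Research Questions

RQ1: Under what conditions does sub-sampling the Hessian preserve the local convergence of Newton's method?
RQ2: How does regularizing the sub-sampled Hessian affect the convergence rate, and when is it beneficial?
RQ3: Can local convergence guarantees be maintained when the Hessian and the gradient are sub-sampled simultaneously?
RQ4: Which sampling strategies guarantee local R-linear or R-superlinear convergence for a fully stochastic Newton method?
RQ5: Do the convergence rates depend on problem-specific parameters (e.g., the condition number), and can they be made problem-independent?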

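Key Findings

The sub-sampled Newton method with the full gradient and a uniformly sub-sampled Hessian achieves local Q-linear convergence. In the error recursion, the quadratic term dominates far from the optimum and the linear term takes over as the iterates approach it.
Progressively increasing the Hessian sample size yields local Q-superlinear convergence, improving the asymptotic behavior.
Regularizing the sub-sampled Hessian (spectrum modification or Levenberg-type) improves convergence in the early stages, but near the optimum sub-sampling without regularization is more effective.
When both the Hessian and the gradient are sub-sampled, local R-linear convergence is achieved, and a more aggressive increase in sample size yields R-superlinear convergence.
All convergence rates are problem-independent: they do not depend on the condition number or other problem-specific quantities, although the required sample size does depend on the condition number. This improves generality.
The analysis provides a theoretical basis for practical algorithmic trade-offs, making it possible to balance computational cost against convergence speed without losing convergence guarantees.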
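The composite behavior described in the first finding can be written schematically as follows. This is a sketch, not a statement copied from the paper: $\rho_0$ and $\xi$ are placeholder constants for the linear and quadratic terms, and $x^*$ denotes the optimum.

$$
\|x^{(k+1)} - x^*\| \;\le\; \rho_0 \,\|x^{(k)} - x^*\| \;+\; \xi \,\|x^{(k)} - x^*\|^2 .
$$

Under this assumed form, pick any $\rho$ with $\rho_0 < \rho < 1$. If $\|x^{(k)} - x^*\| \le (\rho - \rho_0)/\xi$, the quadratic term is at most $(\rho - \rho_0)\,\|x^{(k)} - x^*\|$, so

$$
\|x^{(k+1)} - x^*\| \;\le\; \rho \,\|x^{(k)} - x^*\| ,
$$

which is Q-linear convergence at rate $\rho$. Far from $x^*$ the quadratic term is the larger of the two, and close to $x^*$ the linear term is.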

This review was generated by AI and checked by a human editor.