QUICK REVIEW

[Paper Review] Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Farbod Roosta-Khorasani, Michael W. Mahoney | arXiv (Cornell University) | Jan 18, 2016

Sparse and Compressive Sensing Techniques | References: 41 | Citations: 60

Summary at a Glance

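This paper proposes sub-sampled Newton methods for large-scale optimization that approximate the Hessian by uniform sub-sampling while using the full gradient. The methods are globally convergent, meaning they converge from any initial point. The convergence bounds are non-asymptotic and quantitative, and they depend on the condition number. Global convergence is also guaranteed when the linear system is solved only inexactly, and the solve tolerance needed for fast convergence is of order $\mathcal{O}(1/\sqrt{\tilde{\kappa}})$.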
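
For concreteness, the iteration summarized above can be written as follows. This is a sketch in notation assumed here rather than quoted from the paper: $F$ is taken to be a finite-sum objective over $n$ components $f_i$, $\mathcal{S}_k$ is an index set drawn uniformly at random at iteration $k$, $p_k$ is the search direction, and $\beta \in (0, 1)$ is an Armijo line-search parameter.

$$
F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad
H(x_k) = \frac{1}{|\mathcal{S}_k|}\sum_{j \in \mathcal{S}_k} \nabla^2 f_j(x_k)
$$

$$
H(x_k)\, p_k = -\nabla F(x_k), \qquad
x_{k+1} = x_k + \alpha_k p_k
$$

$$
F(x_k + \alpha_k p_k) \le F(x_k) + \alpha_k \beta\, p_k^{\top} \nabla F(x_k)
$$

The first line defines the objective and its sub-sampled Hessian, the second the Newton system and update, and the third the Armijo sufficient-decrease condition on the step size $\alpha_k$. In the inexact variant, the linear system in the second line is solved only approximately.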

ABSTRACT

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the computations and/or to implicitly implement a form of statistical regularization. In this paper, we consider second-order iterative optimization algorithms and we provide bounds on the convergence of the variants of Newton's method that incorporate uniform sub-sampling as a means to estimate the gradient and/or Hessian. Our bounds are non-asymptotic and quantitative. Our algorithms are global and are guaranteed to converge from any initial iterate. Using random matrix concentration inequalities, one can sub-sample the Hessian to preserve the curvature information. Our first algorithm incorporates Hessian sub-sampling while using the full gradient. We also give additional convergence results for when the sub-sampled Hessian is regularized by modifying its spectrum or ridge-type regularization. Next, in addition to Hessian sub-sampling, we also consider sub-sampling the gradient as a way to further reduce the computational complexity per iteration. We use approximate matrix multiplication results from randomized numerical linear algebra to obtain the proper sampling strategy. In all these algorithms, computing the update boils down to solving a large scale linear system, which can be computationally expensive. As a remedy, for all of our algorithms, we also give global convergence results for the case of inexact updates where such linear system is solved only approximately. This paper has a more advanced companion paper, [42], in which we demonstrate that, by doing a finer-grained analysis, we can get problem-independent bounds for local convergence of these algorithms and explore trade-offs to improve upon the basic results of the present paper.

Motivation and Objectives

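Develop globally convergent second-order optimization algorithms for large-scale problems where computing the full Hessian is infeasible.
Provide non-asymptotic convergence guarantees for sub-sampled Newton methods that combine uniform Hessian sub-sampling with the full gradient.
Analyze how inexact solutions of the Newton equations affect convergence, and state explicit tolerance requirements.
Extend the approach to fully stochastic variants that sub-sample both the gradient and the Hessian, using sampling strategies based on randomized numerical linear algebra (RandNLA).
Lay the groundwork for condition-number-independent local convergence results, obtained in combination with the companion paper (SSN2).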

Proposed Method

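Sub-sample the Hessian uniformly while computing the gradient exactly; random matrix concentration inequalities then guarantee a descent direction.
In the early iterations, apply spectral or ridge-type regularization to the sub-sampled Hessian to improve its conditioning, and revert to the unmodified sub-sampled Hessian as the iterates approach convergence.
Apply approximate matrix multiplication results from randomized numerical linear algebra (RandNLA) to derive an appropriate sampling strategy when both the gradient and the Hessian are sub-sampled.
Solve the Newton equations only approximately, with convergence guarantees under a solve tolerance of order $\mathcal{O}(1/\sqrt{\tilde{\kappa}})$, where $\tilde{\kappa}$ is the condition number associated with the sampling.
To guarantee global convergence, adopt an Armijo line search built around the natural step size $\alpha_k = 1$, which in particular strengthens convergence near the optimum.
Establish global convergence for both exact and inexact update schemes, with theoretical bounds that hold in finite dimensions and after finitely many iterations.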
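
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the objective is assumed to be ridge-regularized logistic regression, and the function names, the Armijo parameters `beta` and `rho`, the default `sample_size`, and the choice of sampling without replacement are all hypothetical choices made for the example.

```python
import numpy as np

def logistic_loss(w, X, y, lam):
    """Average logistic loss plus ridge term; labels y are in {0, 1}."""
    z = X @ w
    # log(1 + exp(z)) evaluated stably via logaddexp
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)

def full_gradient(w, X, y, lam):
    """Exact gradient over all n data points."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / X.shape[0] + lam * w

def subsampled_hessian(w, X, lam, sample_size, rng):
    """Hessian estimate from a uniform sample of rows (assumed: without replacement)."""
    idx = rng.choice(X.shape[0], size=sample_size, replace=False)
    Xs = X[idx]
    p = 1.0 / (1.0 + np.exp(-(Xs @ w)))
    d = p * (1.0 - p)  # per-sample curvature weights
    return (Xs * d[:, None]).T @ Xs / sample_size + lam * np.eye(X.shape[1])

def subsampled_newton(X, y, lam=1e-3, sample_size=200, iters=30,
                      beta=1e-4, rho=0.5, tol=1e-10, seed=0):
    """Sub-sampled Hessian, full gradient, Armijo backtracking from alpha = 1."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = full_gradient(w, X, y, lam)
        if np.linalg.norm(g) < tol:
            break
        H = subsampled_hessian(w, X, lam, sample_size, rng)
        p = np.linalg.solve(H, -g)  # exact solve of the Newton system
        alpha, f0 = 1.0, logistic_loss(w, X, y, lam)
        # Backtrack until the Armijo sufficient-decrease condition holds
        while logistic_loss(w + alpha * p, X, y, lam) > f0 + beta * alpha * (g @ p):
            alpha *= rho
            if alpha < 1e-12:  # safeguard against stalling in floating point
                break
        w = w + alpha * p
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    y = (rng.random(1000) < 0.5).astype(float)
    w = subsampled_newton(X, y)
    print("final gradient norm:", np.linalg.norm(full_gradient(w, X, y, 1e-3)))
```

Because the ridge term keeps every sampled Hessian positive definite, the computed direction is always a descent direction in this toy setting; for a general objective that property is exactly what the sample-size conditions in the paper are meant to secure.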

Results

Research Questions

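RQ1: Can sub-sampled Hessian approximation in Newton's method guarantee global convergence from an arbitrary initial iterate?
RQ2: How large must the Hessian and gradient sample sizes be for the non-asymptotic bounds to hold?
RQ3: How do inexact solutions of the Newton equations affect convergence, and what solve tolerance guarantees fast convergence?
RQ4: Does regularizing the sub-sampled Hessian improve early-stage convergence without harming global convergence?
RQ5: How do the global convergence properties of sub-sampled Newton methods relate to their local convergence rate, particularly with respect to condition-number dependence?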

Key Findings

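The algorithm with a sub-sampled Hessian and full gradient achieves global linear convergence, with non-asymptotic bounds that depend on the condition number.
Using the full gradient while sub-sampling the Hessian yields a descent direction with high probability, provided the sample size is sufficiently large relative to the condition number.
Ridge-type or spectral regularization improves early-stage convergence, but the regularization must be removed near the optimum to preserve accuracy.
When both the gradient and the Hessian are sub-sampled, RandNLA-based sampling strategies preserve global convergence, and the convergence rate depends on the quality of the sampling.
For inexact updates, convergence is guaranteed as long as the solve accuracy is within order $\mathcal{O}(1/\sqrt{\tilde{\kappa}})$, where $\tilde{\kappa}$ is the condition number associated with the sampling.
Combining the global convergence results of this paper with the local convergence analysis of the companion paper SSN2 [42] gives condition-number-independent local convergence rates that approach the performance of the full Newton method.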
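
The inexact-update finding above can be illustrated with a conjugate gradient solve that stops early. This is a sketch under assumptions: the stopping rule (relative residual at most `theta`), the helper `tolerance_from_condition_number`, and its constant `c` are hypothetical and chosen only to mirror the $\mathcal{O}(1/\sqrt{\tilde{\kappa}})$ scaling; the paper's exact inexactness conditions and constants may differ.

```python
import numpy as np

def tolerance_from_condition_number(H, c=0.5):
    """Hypothetical helper: theta = c / sqrt(cond(H)).

    Computing cond(H) exactly is for illustration only; it would be
    too expensive in a genuinely large-scale setting.
    """
    kappa = np.linalg.cond(H)
    return c / np.sqrt(kappa)

def cg_inexact(H, g, theta, max_iter=None):
    """Approximately solve H p = -g for symmetric positive definite H.

    Stops once ||H p + g|| <= theta * ||g|| (assumed stopping rule).
    """
    n = g.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    p = np.zeros(n)
    r = -g.copy()            # residual -g - H p at the starting point p = 0
    d = r.copy()             # first search direction
    rs = r @ r
    target = theta * np.linalg.norm(g)
    for _ in range(max_iter):
        if np.sqrt(rs) <= target:
            break
        Hd = H @ d
        step = rs / (d @ Hd)
        p += step * d
        r -= step * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 8))
    H = A.T @ A / 30 + 0.1 * np.eye(8)   # synthetic SPD matrix standing in for a sub-sampled Hessian
    g = rng.normal(size=8)
    theta = tolerance_from_condition_number(H)
    p = cg_inexact(H, g, theta)
    print("relative residual:", np.linalg.norm(H @ p + g) / np.linalg.norm(g))
```

Starting conjugate gradient from zero means every iterate lowers the local quadratic model, so the returned `p` remains a descent direction even when the solve is cut short; that is what lets a line search still make progress with an inexact direction.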

This review was generated by AI and checked by a human editor.