QUICK REVIEW

[Paper Review] Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Nicolas Loizou, Sharan Vaswani|arXiv (Cornell University)|Feb 24, 2020

Stochastic Gradient Optimization Techniques|References 57|Citations 37

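One-line summary

The paper introduces SPS, a stochastic Polyak step-size for SGD that adapts the learning rate using f_i^* and f_i(x). It comes with convergence guarantees across strongly convex, convex, and non-convex settings, and gives its strongest results in the interpolation regime.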

ABSTRACT

We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.

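Motivation and objectives

Motivate and address the problem of choosing a step-size for SGD on finite-sum training problems
Introduce SPS (the stochastic Polyak step-size) as an adaptive learning rate for SGD
Provide theoretical convergence guarantees for SPS in the strongly convex, convex, and non-convex settings
Show that SPS converges efficiently to the true solution in the interpolation setting
Demonstrate the empirical performance of SPS on a range of models with synthetic and real datasets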

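Proposed method

Defines SPS as γ_k = (f_i(x^k) - f_i^*) / (c ||∇f_i(x^k)||^2), together with a bounded variant, SPS_max, which caps the step at an upper bound γ_b
Relates SPS to the classical deterministic Polyak step-size and discusses its requirements: knowledge of f_i^* and a choice of c
Provides theoretical convergence results for SPS_max under strongly convex, convex, and non-convex (PL condition) assumptions, and in the constant step-size regime
Analyzes how SPS achieves fast convergence to the true solution in high-dimensional interpolation (over-parameterized) settings
Presents extensions to the non-smooth and streaming settings, and connects SPS to solvers for linear systems of equations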
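
The step-size rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sps_max_step` and `sgd_sps`, the least-squares objective, the default `c=0.5`, and the toy data are all hypothetical choices made here. For a per-sample least-squares loss the minimum value f_i^* is 0, which is what makes the step computable without extra information.

```python
import numpy as np


def sps_max_step(loss_i, loss_i_star, grad_i, c=0.5, gamma_b=np.inf):
    """Return min{(f_i(x) - f_i^*) / (c * ||grad f_i(x)||^2), gamma_b}."""
    grad_sq = float(np.dot(grad_i, grad_i))
    if grad_sq == 0.0:
        # Zero gradient: the update would not move x, so return a zero step.
        return 0.0
    return min((loss_i - loss_i_star) / (c * grad_sq), gamma_b)


def sgd_sps(A, b, n_steps=10000, c=0.5, gamma_b=np.inf, seed=0):
    """SGD on f_i(x) = 0.5 * (a_i @ x - b_i)**2 with the SPS_max step size."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_steps):
        i = rng.integers(n)                # sample one data point uniformly
        residual = A[i] @ x - b[i]
        loss_i = 0.5 * residual**2         # f_i(x)
        grad_i = residual * A[i]           # gradient of f_i at x
        # Assumption: f_i^* = 0 for this per-sample least-squares loss.
        gamma = sps_max_step(loss_i, 0.0, grad_i, c=c, gamma_b=gamma_b)
        x = x - gamma * grad_i
    return x


# Hypothetical over-parameterized toy problem (more parameters than samples),
# so some x fits every sample exactly (the interpolation setting).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
x_true = rng.standard_normal(50)
b = A @ x_true

x_hat = sgd_sps(A, b)
final_loss = 0.5 * np.mean((A @ x_hat - b) ** 2)
print(f"final training loss: {final_loss:.3e}")
```

Passing a finite `gamma_b` gives the bounded SPS_max variant; leaving it at infinity gives the unbounded SPS step. No learning rate is tuned by hand in either case, only `c` and, optionally, the cap.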

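Experimental results

Research questions

RQ1: Does the stochastic Polyak step-size (SPS) guarantee convergence of SGD for strongly convex, convex, and non-convex objectives?
RQ2: In the interpolation (over-parameterized) regime, how does SPS compare with constant step-size SGD and other adaptive methods?
RQ3: Under various smoothness and convexity assumptions, what are the convergence rates and neighborhood sizes for SPS and SPS_max?
RQ4: Can SPS converge to the exact solution in the interpolation setting without knowledge of problem-dependent constants?
RQ5: How does the empirical performance of SPS on synthetic data and over-parameterized models compare with state-of-the-art optimization methods?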

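Key findings

SGD with SPS comes with convergence guarantees in the strongly convex, convex, and smooth non-convex settings.
SPS_max converges linearly to a neighborhood whose size depends on the upper bound γ_b and the optimal objective difference σ^2.
In the interpolation regime, SPS enables fast convergence to the true solution without problem-dependent constants or additional overhead.
In the constant step-size regime, SPS matches or improves on the convergence behavior of conventional constant step-size SGD under certain bounds.
Experiments show SPS outperforming several optimization methods on over-parameterized models, across synthetic data, deep matrix factorization, kernel-based binary classification, and deep networks.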
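
The claim that SPS needs no problem-dependent constants can be made concrete with a short worked example. The derivation below is an illustration added here, not taken from the review: it assumes a per-sample least-squares loss f_i(x) = ½ (a_i^T x - b_i)^2 with a_i ≠ 0, so that f_i^* = 0, and a nonzero residual at x^k.

```latex
% Assumed example: f_i(x) = 1/2 (a_i^T x - b_i)^2 with a_i != 0, hence f_i^* = 0.
\begin{align*}
\nabla f_i(x^k) &= (a_i^\top x^k - b_i)\, a_i, \\
\gamma_k &= \frac{f_i(x^k) - f_i^*}{c\,\|\nabla f_i(x^k)\|^2}
          = \frac{\tfrac{1}{2}\,(a_i^\top x^k - b_i)^2}{c\,(a_i^\top x^k - b_i)^2\,\|a_i\|^2}
          = \frac{1}{2c\,\|a_i\|^2}, \\
x^{k+1} &= x^k - \gamma_k \nabla f_i(x^k)
         = x^k - \frac{a_i^\top x^k - b_i}{2c\,\|a_i\|^2}\, a_i .
\end{align*}
```

With c = 1/2 the update is the projection of x^k onto the hyperplane a_i^T x = b_i, which is the randomized Kaczmarz step for linear systems. The step depends only on the sampled row a_i, so no global smoothness or strong-convexity constant has to be known in advance.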


This review was generated by AI and checked by a human editor.