QUICK REVIEW

[Paper Review] A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates

Kaiwen Zhou, Fanhua Shang|arXiv (Cornell University)|Jun 28, 2018

Stochastic Gradient Optimization Techniques | References: 12 | Cited by: 44

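One-Line Summary

MiG is a simple stochastic variance reduced gradient method. It matches the best-known convergence rates and comes with efficient sparse and asynchronous variants. It achieves an oracle complexity of O((n+√(κn)) log(1/ε)) for strongly convex problems and an O(1/T²) rate for non-strongly convex problems.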

ABSTRACT

Recent years have witnessed exciting progress in the study of stochastic variance reduced gradient methods (e.g., SVRG, SAGA), their accelerated variants (e.g., Katyusha) and their extensions in many different settings (e.g., online, sparse, asynchronous, distributed). Among them, accelerated methods enjoy improved convergence rates but have complex coupling structures, which makes them hard to be extended to more settings (e.g., sparse and asynchronous) due to the existence of perturbation. In this paper, we introduce a simple stochastic variance reduced algorithm (MiG), which enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems. Moreover, we also present its efficient sparse and asynchronous variants, and theoretically analyze its convergence rates in these settings. Finally, extensive experiments for various machine learning problems such as logistic regression are given to illustrate the practical improvement in both serial and asynchronous settings.

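Motivation and Objectives

Accelerate stochastic variance reduced gradient methods for finite-sum convex optimization without the complex coupling structures of existing accelerated methods.
Design a simple algorithm (MiG) that tracks only one variable vector in the inner loop.
Achieve the best-known oracle complexity for strongly convex problems and the optimal rate for non-strongly convex problems.
Extend MiG to sparse and asynchronous settings to obtain practical performance gains.
Provide empirical evidence of its efficiency in serial and asynchronous scenarios.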

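Proposed Method

Introduce MiG, which keeps a single variable vector in the inner loop. This reduces overhead and makes the method easy to extend to sparse and asynchronous settings.
Use the variance-reduced gradient estimator ∇̃ = ∇f_{i_j}(y_{j−1}) − ∇f_{i_j}(x̃_{s−1}) + μ_s. Here i_j is a randomly sampled index, and μ_s = ∇f(x̃_{s−1}) is the full gradient at the snapshot x̃_{s−1}.
Compute y as a θ-weighted combination of x and x̃: y_{j−1} = θ x^s_{j−1} + (1−θ) x̃_{s−1}.
Update x^s_j with the proximal step x^s_j = argmin_x { (1/(2η))‖x − x^s_{j−1}‖² + ⟨∇̃, x⟩ + g(x) }.
Aggregate the inner iterates to form the new snapshot x̃_s as a θ-weighted average.
Provide sparse and asynchronous variants. These keep the gradient estimate unbiased through a diagonal reweighting D and preserve the single-vector update structure.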
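The steps above can be sketched in Python/NumPy for ridge regression. In this setting f_i(x) = ½(a_iᵀx − b_i)² and g(x) = (λ/2)‖x‖², so the proximal step has a closed form. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The step size η = 1/(3θL), the epoch length m = 2n and the fixed θ = 0.5 are assumed choices.
- x̃_s is formed with a plain (unweighted) average of the inner iterates, which is a simplification.
- The function name `mig_ridge` is hypothetical.

Note that y is recomputed from x and x̃ at every step, so x is the only vector written inside the inner loop.

```python
import numpy as np


def mig_ridge(A, b, lam, theta=0.5, epochs=40, seed=0):
    """Hypothetical MiG-style sketch for ridge regression (not the authors' code).

    Smooth part: f_i(x) = 0.5 * (a_i @ x - b_i)**2.
    Proximal part: g(x) = 0.5 * lam * ||x||^2.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.max(np.sum(A * A, axis=1))  # smoothness constant of the f_i
    eta = 1.0 / (3.0 * theta * L)      # assumed step size eta = 1 / (3 theta L)
    m = 2 * n                          # assumed epoch length

    def grad_i(x, i):
        # Gradient of f_i at x.
        return (A[i] @ x - b[i]) * A[i]

    x = np.zeros(d)        # the single inner-loop vector x^s_j
    x_tilde = np.zeros(d)  # snapshot x~_{s-1}
    for _ in range(epochs):
        mu = A.T @ (A @ x_tilde - b) / n  # full gradient of f at the snapshot
        x_sum = np.zeros(d)
        for _ in range(m):
            i = rng.integers(n)  # sampled index i_j
            # y_{j-1} is derived from x and x~ on the fly; it is never stored.
            y = theta * x + (1.0 - theta) * x_tilde
            g_est = grad_i(y, i) - grad_i(x_tilde, i) + mu
            # Closed-form prox step for g(x) = 0.5 * lam * ||x||^2:
            # argmin_x (1/(2 eta))||x - x_prev||^2 + <g_est, x> + g(x)
            x = (x - eta * g_est) / (1.0 + eta * lam)
            x_sum += x
        # Assumed simplification: plain average of the inner iterates.
        x_tilde = theta * x_sum / m + (1.0 - theta) * x_tilde
    return x_tilde


# Usage on a small synthetic problem, compared with the closed-form ridge solution.
rng = np.random.default_rng(1)
n, d, lam = 50, 5, 0.1
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
x_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
x_hat = mig_ridge(A, b, lam)
print(f"distance to closed-form solution: {np.linalg.norm(x_hat - x_star):.2e}")
```

Because y_{j−1} is computed on the fly rather than stored, each inner iteration updates one vector. The review credits this property for MiG's easy sparse and asynchronous extensions.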

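Experimental Results

Research Questions

RQ1: Can stochastic variance reduced methods achieve acceleration by updating only a single coupled vector?
RQ2: What oracle complexity does MiG achieve for strongly convex and non-strongly convex problems, compared with existing methods?
RQ3: How can MiG be extended to sparse and asynchronous settings without losing its convergence guarantees?
RQ4: How does MiG perform empirically against state-of-the-art methods in dense, sparse, and asynchronous settings?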

Key Findings

Algorithm	Oracle complexity (strongly convex)	Memory	Sparse & Async
SVRG	O((n+κ) log(1/ε))	1 Vec.	Yes
SAGA	O((n+κ) log(1/ε))	1 Vec. + 1 gradient table	Yes
Katyusha	O((n+√(κn)) log(1/ε))	2 Vec.	No
MiG	O((n+√(κn)) log(1/ε))	1 Vec.	Yes
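To see when the √(κn) term in the table matters, the snippet evaluates only the leading factors of the two complexity bounds. It drops the shared log(1/ε) factor and all hidden constants, so the numbers illustrate scaling rather than predict runtimes. The (n, κ) pairs are hypothetical and are not taken from the paper.

```python
import math


def svrg_term(n, kappa):
    # Leading factor of O((n + kappa) log(1/eps)) for SVRG/SAGA; constants ignored.
    return n + kappa


def accelerated_term(n, kappa):
    # Leading factor of O((n + sqrt(kappa * n)) log(1/eps)) for Katyusha/MiG.
    return n + math.sqrt(kappa * n)


# Hypothetical sizes: n component functions, condition number kappa.
for n, kappa in [(10**5, 10**3), (10**5, 10**5), (10**5, 10**7)]:
    ratio = svrg_term(n, kappa) / accelerated_term(n, kappa)
    print(f"n={n:.0e} kappa={kappa:.0e} (n+kappa)/(n+sqrt(kappa*n)) = {ratio:.2f}")
```

The ratios are about 0.92, 1.00 and 9.18. When κ ≤ n both bounds are Θ(n), and with constants dropped the comparison carries no meaning. When κ ≫ n, the accelerated bound is smaller by roughly a factor of √(κ/n).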

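For strongly convex problems, MiG achieves the best-known oracle complexity, O((n+√(κn)) log(1/ε)).
For non-strongly convex problems, the variant MiG^NSC achieves the optimal O(1/T²) rate.
MiG maintains a single vector in the inner loop. This makes sparse and asynchronous variants practical to implement and yields real performance gains.
In experiments, MiG matches or outperforms Katyusha and SVRG in the dense setting. In the sparse and asynchronous settings, it outperforms KroMagnon and ASAGA on the relevant datasets.
MiG requires no gradient table, which simplifies implementation and makes it easy to extend to distributed or asynchronous environments.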


This review was generated by AI and checked by a human editor.