QUICK REVIEW

[Paper Review] Iteration Complexity of Randomized Block-Coordinate Descent Methods for Minimizing a Composite Function

Peter Richtárik, Martin Takáč|arXiv (Cornell University)|Jul 14, 2011

Sparse and Compressive Sensing Techniques|References: 27|Cited by: 18

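One-Line Summary

This paper proposes a randomized block-coordinate descent method for minimizing a composite convex function: the sum of a smooth convex function and a nonsmooth, block-separable convex function. It establishes an iteration complexity of $O\left(\frac{n}{\epsilon}\log\frac{1}{\rho}\right)$ for reaching an $\epsilon$-accurate solution with probability at least $1-\rho$, where $n$ is the number of blocks. Unlike earlier work, the bound does not depend on an unknown regularization parameter, and in the smooth case the analysis extends to non-Euclidean norms and arbitrary probability vectors.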
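To make "$\epsilon$-accurate with probability at least $1-\rho$" concrete: write the objective as $F = f + \Psi$ with optimal value $F^*$, and let $x_k$ be the (random) iterate after $k$ steps. This notation is assumed here for illustration, and $K$ stands for the bound stated in the summary. The guarantee then has roughly this form:

```latex
\mathbf{P}\big(F(x_k) - F^* \le \epsilon\big) \ge 1 - \rho
\quad \text{for all } k \ge K = O\!\left(\frac{n}{\epsilon}\log\frac{1}{\rho}\right)
```

Because $\rho$ enters only through $\log(1/\rho)$, tightening the confidence from $\rho = 0.1$ to $\rho = 0.01$ only doubles the $\log(1/\rho)$ factor, since $\log 100 = 2\log 10$.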

ABSTRACT

In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an $\varepsilon$-accurate solution with probability at least $1-\rho$ in at most $O(\frac{n}{\varepsilon} \log \frac{1}{\rho})$ iterations, where $n$ is the number of blocks. For strongly convex functions the method converges linearly. This extends recent results of Nesterov [Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper #2010/2], which cover the smooth case, to composite minimization, while at the same time improving the complexity by the factor of 4 and removing $\varepsilon$ from the logarithmic term. More importantly, in contrast with the aforementioned work in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving true iteration complexity bounds. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale $\ell_1$-regularized least squares and support vector machine problems with a billion variables.
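The problem class described in the abstract can be written as follows. The notation is assumed here for illustration: the block partition $x^{(i)}$, block-embedding matrices $U_i$, block norms $\|\cdot\|_{(i)}$ with duals $\|\cdot\|_{(i)}^*$, and block Lipschitz constants $L_i$. It is a sketch of the standard setup for this kind of method, not a quotation from the paper:

```latex
\min_{x \in \mathbb{R}^N} \; F(x) := f(x) + \Psi(x),
\qquad \Psi(x) = \sum_{i=1}^{n} \Psi_i\big(x^{(i)}\big),
\qquad x = \big(x^{(1)}, \dots, x^{(n)}\big)

% smooth part: block-wise Lipschitz continuous gradient
\big\|\nabla_i f(x + U_i t) - \nabla_i f(x)\big\|_{(i)}^{*} \le L_i \,\|t\|_{(i)}

% one iteration: pick block i, then take a proximal step on that block only
t^{\star} = \arg\min_{t} \Big\{ \langle \nabla_i f(x_k), t \rangle
  + \tfrac{L_i}{2}\|t\|_{(i)}^2 + \Psi_i\big(x_k^{(i)} + t\big) \Big\},
\qquad x_{k+1} = x_k + U_i t^{\star}
```

The block Lipschitz condition implies $f(x + U_i t) \le f(x) + \langle \nabla_i f(x), t\rangle + \tfrac{L_i}{2}\|t\|_{(i)}^2$. The minimized model is therefore an upper bound on $F$ along block $i$ that equals $F(x_k)$ at $t = 0$, so a step never increases $F$.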

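Motivation and Objectives

Develop an efficient randomized block-coordinate descent method for minimizing composite functions: the sum of a smooth convex function and a nonsmooth, block-separable convex function.
Establish tight iteration complexity bounds for reaching an $\epsilon$-accurate solution with high probability.
Remove the need, present in earlier work, to regularize the objective with an unknown scaling factor, and thereby obtain true iteration complexity bounds.
In the smooth case, extend the analysis to arbitrary probability vectors and non-Euclidean norms.
Demonstrate scalability to problems with a billion variables, using $\ell_1$-regularized least squares and large-scale support vector machines as examples.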

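Proposed Method

At each iteration the algorithm updates a single block of variables, chosen either uniformly at random or according to a prescribed probability vector.
For the chosen block it takes a proximal gradient step. This is a gradient step on the smooth part, using the cheaply computed block partial derivative, combined with a proximal step on the nonsmooth part.
Random block selection avoids the computational cost of greedy selection while retaining convergence guarantees.
The analysis bounds the expected decrease of the objective value at each iteration, exploiting block-wise Lipschitz continuity of the gradient and, when present, strong convexity.
A new complexity analysis that does not depend on unknown parameters yields tighter bounds.
In the smooth case, support for arbitrary probability vectors and non-Euclidean norms adds flexibility in large-scale settings.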
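Below is a minimal sketch of the update described above. As an assumption for illustration, it is specialized to $\ell_1$-regularized least squares $\tfrac12\|Ax-b\|^2 + \lambda\|x\|_1$ with single-coordinate blocks and Euclidean norms. In that case the block Lipschitz constant is $L_i = \|A_{:,i}\|^2$, and the proximal step reduces to soft-thresholding. The names `ucdc_lasso`, `soft_threshold` and `objective` are hypothetical. The residual $r = Ax - b$ is cached, so each step only reads column $i$. The optional `probs` argument illustrates non-uniform sampling; the abstract states the arbitrary-probability analysis for the smooth case only.

```python
import itertools
import random


def soft_threshold(z, tau):
    """Proximal operator of tau*|.|: shrink z toward zero by tau."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def ucdc_lasso(cols, b, lam, iters, probs=None, seed=0):
    """Randomized coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1.

    cols[i] is column i of A as a dense list; each block is one coordinate.
    probs=None samples coordinates uniformly (UCDC-style); otherwise
    coordinate i is drawn with probability probs[i].
    """
    rng = random.Random(seed)
    n = len(cols)
    x = [0.0] * n
    r = [-bj for bj in b]                           # residual A x - b at x = 0
    L = [sum(a * a for a in col) for col in cols]   # coordinate Lipschitz constants
    cum = list(itertools.accumulate(probs)) if probs is not None else None
    for _ in range(iters):
        if cum is None:
            i = rng.randrange(n)
        else:
            i = rng.choices(range(n), cum_weights=cum)[0]
        if L[i] == 0.0:
            continue  # zero column: f ignores x_i, which stays at its initial 0
        g = sum(a * rj for a, rj in zip(cols[i], r))        # partial derivative of f
        new = soft_threshold(x[i] - g / L[i], lam / L[i])   # prox-gradient step
        delta = new - x[i]
        if delta != 0.0:
            for j, a in enumerate(cols[i]):
                r[j] += a * delta                           # keep residual in sync
            x[i] = new
    return x


def objective(cols, b, lam, x):
    """Evaluate F(x) = 0.5*||Ax - b||^2 + lam*||x||_1."""
    Ax = [sum(cols[i][j] * x[i] for i in range(len(cols))) for j in range(len(b))]
    return 0.5 * sum((v - bj) ** 2 for v, bj in zip(Ax, b)) + lam * sum(abs(v) for v in x)
```

For example, with $A = I$, $b = (3, -0.5)$ and $\lambda = 1$, each coordinate is solved exactly in one visit. Once both coordinates have been sampled, `ucdc_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], 1.0, 50)` returns `[2.0, 0.0]`, which is the soft-thresholded $b$.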

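Experimental Results

Research Questions

RQ1: What is the iteration complexity of randomized block-coordinate descent for minimizing composite convex functions?
RQ2: Can convergence be achieved without depending on an unknown regularization parameter?
RQ3: How does the method behave under arbitrary probability vectors and non-Euclidean norms?
RQ4: Does it scale to problems with a billion variables?
RQ5: How does it perform in practice on $\ell_1$-regularized least squares and support vector machines?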

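Key Findings

The number of iterations needed to reach $\epsilon$-accuracy with probability at least $1-\rho$ is $O\left(\frac{n}{\epsilon}\log\frac{1}{\rho}\right)$. This improves the previous bound by a factor of 4 and removes $\epsilon$ from the logarithmic term.
For strongly convex functions the method converges linearly, so the iteration count grows only logarithmically in $1/\epsilon$.
Because no regularization with an unknown scaling factor is needed, true iteration complexity bounds are obtained without parameter tuning.
Experiments on the kdd2010 dataset, which has about 29.89 million features, showed that the method scales effectively, including to billion-variable problems.
In the numerical experiments, UCDC (Uniform Coordinate Descent for Composite functions, the uniform-sampling variant) found a good solution to a billion-variable problem in under 0.5 seconds and reached high test accuracy after about 10 full passes over the coordinates.
The method is also efficient in sparse settings: each update costs $O(o_i)$ operations, where $o_i$ is the number of nonzero entries of feature $i$.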
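This is a minimal sketch of why one update costs $O(o_i)$ in the $\ell_1$-regularized least-squares case. Suppose $A$ is stored column by column in a compressed sparse column (CSC) layout and the residual $r = Ax - b$ is cached. Then both the partial derivative and the residual update read only the nonzeros of column $i$. The hand-built CSC lists and the helper names `csc_from_dense_columns`, `column_lipschitz` and `sparse_lasso_step` are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def csc_from_dense_columns(cols):
    """Build a minimal CSC layout (indptr, indices, data) from dense columns."""
    indptr, indices, data = [0], [], []
    for col in cols:
        for j, a in enumerate(col):
            if a != 0.0:
                indices.append(j)
                data.append(a)
        indptr.append(len(indices))
    return indptr, indices, data


def column_lipschitz(csc):
    """L_i = squared Euclidean norm of column i, computed from its nonzeros only."""
    indptr, _, data = csc
    return [sum(v * v for v in data[indptr[i]:indptr[i + 1]])
            for i in range(len(indptr) - 1)]


def sparse_lasso_step(csc, L, lam, x, r, i):
    """One prox-gradient step on coordinate i of 0.5*||Ax - b||^2 + lam*||x||_1.

    r must hold A x - b and is updated in place together with x.
    Only the o_i nonzeros of column i are touched; returns o_i.
    """
    indptr, indices, data = csc
    start, end = indptr[i], indptr[i + 1]
    if L[i] == 0.0:
        return 0                                                  # empty column
    g = sum(data[k] * r[indices[k]] for k in range(start, end))   # O(o_i)
    z = x[i] - g / L[i]
    new = math.copysign(max(abs(z) - lam / L[i], 0.0), z)         # soft-threshold
    delta = new - x[i]
    if delta != 0.0:
        for k in range(start, end):                               # O(o_i)
            r[indices[k]] += data[k] * delta
        x[i] = new
    return end - start
```

In a real solver the CSC arrays would come from a sparse-matrix library. The point here is only that neither loop in `sparse_lasso_step` runs over all rows of $A$.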

This review was generated by AI and checked by a human editor.