QUICK REVIEW

[Paper Review] Convex Optimization with Nonconvex Oracles

Oren Mangoubi, Nisheeth K. Vishnoi|arXiv (Cornell University)|Nov 7, 2017

Stochastic Gradient Optimization Techniques | Citations: 1

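One-Line Summary

This paper presents a polynomial-time algorithm for minimizing a convex objective function $F$ when only a noisy, possibly nonconvex approximation $\hat{F}$ can be evaluated, under the general noise model $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$. The method adapts the stochastic gradient Langevin Markov chain with a gradually decreasing temperature, and its convergence guarantee holds even when the noise is unbounded. It gives non-asymptotic guarantees for general convex $F$ and generalizes earlier results that required stronger assumptions.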

ABSTRACT

In machine learning and optimization, one often wants to minimize a convex objective function $F$ but can only evaluate a noisy approximation $\hat{F}$ to it. Even though $F$ is convex, the noise may render $\hat{F}$ nonconvex, making the task of minimizing $F$ intractable in general. As a consequence, several works in theoretical computer science, machine learning and optimization have focused on coming up with polynomial time algorithms to minimize $F$ under conditions on the noise $F(x)-\hat{F}(x)$ such as its uniform-boundedness, or on $F$ such as strong convexity. However, in many applications of interest, these conditions do not hold. Here we show that, if the noise has magnitude $\alpha F(x) + \beta$ for some $\alpha, \beta > 0$, then there is a polynomial time algorithm to find an approximate minimizer of $F$. In particular, our result allows for unbounded noise and generalizes those of Applegate and Kannan, and Zhang, Liang and Charikar, who proved similar results for the bounded noise case, and that of Belloni et al. who assume that the noise grows in a very specific manner and that $F$ is strongly convex. Turning our result on its head, one may also view our algorithm as minimizing a nonconvex function $\hat{F}$ that is promised to be related to a convex function $F$ as above. Our algorithm is a modification of the stochastic gradient Langevin Markov chain and gradually decreases the temperature of the chain to approach the global minimizer. Analyzing such an algorithm for the unbounded noise model and a general convex function turns out to be challenging and requires several technical ideas that might be of independent interest in deriving non-asymptotic bounds for other simulated annealing based algorithms.

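Motivation and Objectives

The paper addresses minimizing a convex function $F$ when the only available oracle is a noisy, possibly nonconvex approximation $\hat{F}$.
It generalizes earlier results, which required strong assumptions such as bounded noise or strong convexity of $F$, to a broader class of noise models.
It develops a provably efficient algorithm for unbounded noise whose magnitude is bounded by $\alpha F(x) + \beta$.
It establishes non-asymptotic convergence guarantees for a simulated-annealing-style algorithm under general convexity and this noise condition.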
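
To make the setting concrete, the sketch below builds a toy oracle that satisfies the noise bound $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$ and is nevertheless nonconvex. The choice $F(x) = x^2$, the sinusoidal noise, and the constants `ALPHA`, `BETA`, `OMEGA` are illustrative assumptions and do not come from the paper.

```python
import math

# Hypothetical constants chosen for illustration only.
ALPHA = 0.1   # multiplicative noise level
BETA = 0.5    # additive noise level
OMEGA = 5.0   # oscillation frequency of the toy noise


def F(x: float) -> float:
    """Convex objective (illustrative choice, not from the paper)."""
    return x * x


def F_hat(x: float) -> float:
    """Toy noisy oracle: F plus an oscillation scaled to the allowed noise magnitude."""
    return F(x) + (ALPHA * F(x) + BETA) * math.sin(OMEGA * x)


def noise_bound_holds(xs) -> bool:
    """Check |F(x) - F_hat(x)| <= ALPHA * F(x) + BETA on a grid (small float tolerance)."""
    return all(abs(F(x) - F_hat(x)) <= ALPHA * F(x) + BETA + 1e-12 for x in xs)


def violates_midpoint_convexity(f, a: float, b: float) -> bool:
    """True if f((a+b)/2) > (f(a)+f(b))/2, which a convex function never allows."""
    return f((a + b) / 2) > (f(a) + f(b)) / 2


grid = [i / 100 for i in range(-1000, 1001)]
# Endpoints sit at troughs of the sine term and the midpoint at a crest.
a, b = -math.pi / 10, 3 * math.pi / 10

print("noise bound holds on grid:", noise_bound_holds(grid))
print("F_hat violates convexity:", violates_midpoint_convexity(F_hat, a, b))
print("F violates convexity:", violates_midpoint_convexity(F, a, b))
```

Because the noise envelope $\alpha F(x) + \beta$ grows with $F$, the gap between $F$ and $\hat{F}$ in this toy example is not uniformly bounded over the real line, which is the regime the review describes as unbounded noise.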

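Proposed Method

Adapts the stochastic gradient Langevin Markov chain (SGLD) with a decreasing temperature schedule, so that the chain gradually concentrates near the global minimizer of $F$.
Introduces a new analysis framework for the unbounded noise model $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$ ($\alpha, \beta > 0$).
Uses Lyapunov function arguments to control the drift and diffusion terms of the Langevin dynamics.
Establishes polynomial-time convergence to an $\varepsilon$-approximate minimizer of $F$ under this noise model.
Uses a temperature schedule that decreases slowly enough to preserve exploration yet quickly enough to guarantee convergence.
Derives non-asymptotic bounds on the expected optimality gap that do not rely on strong convexity.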
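
The general idea of a Langevin chain with decreasing temperature can be sketched as follows. This is a minimal one-dimensional illustration, not the paper's algorithm: the toy oracle, the geometric schedule in `temperature`, the step size `eta`, the stage counts, and the Gaussian gradient noise are all assumptions made for the example, and the paper's actual schedule and constants are not reproduced here.

```python
import math
import random

# Hypothetical noise parameters for a toy oracle (not from the paper).
ALPHA, BETA, OMEGA = 0.1, 0.5, 5.0


def F(x: float) -> float:
    """Convex objective used only to evaluate the result."""
    return x * x


def grad_F_hat(x: float) -> float:
    """Gradient of the toy oracle F_hat(x) = F(x) + (ALPHA*F(x) + BETA)*sin(OMEGA*x)."""
    envelope = ALPHA * F(x) + BETA
    return (2 * x
            + 2 * ALPHA * x * math.sin(OMEGA * x)
            + envelope * OMEGA * math.cos(OMEGA * x))


def temperature(stage: int, t0: float = 1.0, decay: float = 0.5) -> float:
    """Assumed geometric cooling schedule: t0 * decay**stage."""
    return t0 * decay ** stage


def annealed_langevin(x0: float, stages: int = 8, steps_per_stage: int = 2000,
                      eta: float = 1e-3, grad_noise: float = 0.1, seed: int = 0) -> float:
    """Run a Langevin chain on F_hat, lowering the temperature once per stage."""
    rng = random.Random(seed)
    x = x0
    for stage in range(stages):
        temp = temperature(stage)
        for _ in range(steps_per_stage):
            # Stochastic gradient: exact gradient of F_hat plus Gaussian noise.
            g = grad_F_hat(x) + rng.gauss(0.0, grad_noise)
            # Euler step of dX = -grad dt + sqrt(2 * temp) dW.
            x = x - eta * g + math.sqrt(2 * eta * temp) * rng.gauss(0.0, 1.0)
    return x


x_final = annealed_langevin(3.0)
print("final iterate:", x_final, "F(final):", F(x_final))
```

At high temperature the injected Gaussian term lets the chain cross the local barriers of the toy $\hat{F}$, and as the temperature drops the chain settles into a low-lying basin. How fast the temperature may be lowered while keeping a guarantee is the question the paper's analysis addresses; the schedule here is only a placeholder.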

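Results

Research Questions

RQ1: Can a convex function $F$ be minimized when only a noisy, possibly nonconvex approximate oracle $\hat{F}$ is available?
RQ2: What conditions on the noise $F(x) - \hat{F}(x)$ allow polynomial-time convergence to an approximate minimizer of $F$?
RQ3: Can stochastic gradient Langevin dynamics be adapted and analyzed under unbounded noise and a general convex $F$?
RQ4: How does the temperature schedule affect convergence under multiplicative and additive noise?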

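Key Findings

Under the noise model $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$, the algorithm reaches an $\varepsilon$-approximate minimizer of $F$ in polynomial time.
The result generalizes earlier results that assume bounded noise or strong convexity, removing restrictive assumptions on $F$ and on the noise structure.
Non-asymptotic convergence bounds are established for a Langevin-based algorithm even when the noise is unbounded.
The analysis yields technical ideas that may apply to other simulated annealing and stochastic optimization algorithms.
Even when $\hat{F}$ is nonconvex, $F$ can be minimized effectively by exploiting the convexity of the underlying $F$.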
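
A short worked bound, not taken from the paper, shows why the noise model ties minimizers of $\hat{F}$ to near-minimizers of $F$. It assumes $\alpha < 1$, that $x^\star$ minimizes $F$, and that $\hat{x}$ is an exact global minimizer of $\hat{F}$ (an idealization; finding such a point is the hard part). The noise bound is equivalent to a two-sided sandwich:

$$(1-\alpha)F(x) - \beta \;\le\; \hat{F}(x) \;\le\; (1+\alpha)F(x) + \beta \quad \text{for all } x.$$

Applying the lower bound at $\hat{x}$ and the upper bound at $x^\star$, with $\hat{F}(\hat{x}) \le \hat{F}(x^\star)$ in between:

$$(1-\alpha)F(\hat{x}) - \beta \;\le\; \hat{F}(\hat{x}) \;\le\; \hat{F}(x^\star) \;\le\; (1+\alpha)F(x^\star) + \beta,$$

so that

$$F(\hat{x}) \;\le\; \frac{(1+\alpha)F(x^\star) + 2\beta}{1-\alpha}.$$

For example, with the hypothetical values $\alpha = 0.1$, $\beta = 0.5$, and $F(x^\star) = 0$, this gives $F(\hat{x}) \le 1/0.9 \approx 1.11$. The accuracy achievable through $\hat{F}$ alone is therefore limited by $\alpha$ and $\beta$ in this idealized argument.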


This review was generated by AI and checked by a human editor.