QUICK REVIEW

[Paper Review] Convex Optimization with Nonconvex Oracles.

Oren Mangoubi, Nisheeth K. Vishnoi|arXiv (Cornell University)|Nov 7, 2017

Stochastic Gradient Optimization Techniques | Cited by 1

One-Sentence Summary

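This paper presents a polynomial-time algorithm for minimizing a convex objective $F$ when only a noisy, possibly nonconvex approximation $\hat{F}$ can be evaluated, under the general noise model $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$. The method runs a stochastic gradient Langevin Markov chain with a gradually decreasing temperature, converges even when the noise is unbounded, and gives non-asymptotic guarantees for general convex $F$, generalizing earlier results obtained under stronger assumptions.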

ABSTRACT

In machine learning and optimization, one often wants to minimize a convex objective function $F$ but can only evaluate a noisy approximation $\hat{F}$ to it. Even though $F$ is convex, the noise may render $\hat{F}$ nonconvex, making the task of minimizing $F$ intractable in general. As a consequence, several works in theoretical computer science, machine learning and optimization have focused on coming up with polynomial time algorithms to minimize $F$ under conditions on the noise $F(x)-\hat{F}(x)$ such as its uniform-boundedness, or on $F$ such as strong convexity. However, in many applications of interest, these conditions do not hold. Here we show that, if the noise has magnitude $\alpha F(x) + \beta$ for some $\alpha, \beta > 0$, then there is a polynomial time algorithm to find an approximate minimizer of $F$. In particular, our result allows for unbounded noise and generalizes those of Applegate and Kannan, and Zhang, Liang and Charikar, who proved similar results for the bounded noise case, and that of Belloni et al. who assume that the noise grows in a very specific manner and that $F$ is strongly convex. Turning our result on its head, one may also view our algorithm as minimizing a nonconvex function $\hat{F}$ that is promised to be related to a convex function $F$ as above. Our algorithm is a modification of the stochastic gradient Langevin Markov chain and gradually decreases the temperature of the chain to approach the global minimizer. Analyzing such an algorithm for the unbounded noise model and a general convex function turns out to be challenging and requires several technical ideas that might be of independent interest in deriving non-asymptotic bounds for other simulated annealing based algorithms.

Motivation and Objectives

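Address the challenge of minimizing a convex function $F$ when only a noisy, possibly nonconvex approximation $\hat{F}$ is accessible.
Extend earlier results, which require bounded noise or strong convexity of $F$, to a broader class of noise models.
Develop a provably efficient algorithm for unbounded noise of the form $\alpha F(x) + \beta$.
Establish non-asymptotic convergence guarantees for simulated-annealing-type algorithms under general convexity and noise conditions.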

Proposed Method

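Combine the stochastic gradient Langevin Markov chain (SGLD) with a decreasing temperature schedule so that the chain gradually concentrates near the global minimizer of $F$.
Introduce a new analysis framework that handles unbounded oracle noise, where $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$ with $\alpha, \beta > 0$.
Use a Lyapunov-function argument to control the drift and diffusion terms of the Langevin dynamics under noisy gradients.
Reach an $\varepsilon$-approximate minimizer of $F$ in polynomial time under the given noise model.
Use a temperature schedule that is slow enough to ensure adequate exploration, yet fast enough to guarantee convergence.
Derive non-asymptotic bounds on the expected suboptimality gap that do not rely on strong convexity.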
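The idea of running Langevin updates on $\hat{F}$ while lowering the temperature can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the paper's algorithm or its schedule: the objective `F`, the oracle `F_hat`, the constants `ALPHA`, `BETA`, `OMEGA`, the finite-difference gradient, the step size, and the geometric cooling schedule are all hypothetical choices made for illustration. The oracle is built so that it satisfies $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$ by construction.

```python
import math
import random

# Illustrative constants (assumptions for this sketch, not values from the paper).
ALPHA, BETA, OMEGA = 0.05, 0.05, 10.0


def F(x):
    """Convex objective (hypothetical example): squared Euclidean norm."""
    return sum(xi * xi for xi in x)


def F_hat(x):
    """Nonconvex oracle with |F(x) - F_hat(x)| <= ALPHA * F(x) + BETA, since |sin| <= 1."""
    return F(x) + (ALPHA * F(x) + BETA) * math.sin(OMEGA * x[0])


def grad_fd(f, x, h=1e-4):
    """Central finite-difference gradient, using only evaluations of f."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def annealed_langevin(f, x0, temps, steps_per_temp=600, eta=0.005, seed=0):
    """Run unadjusted Langevin updates on f, lowering the temperature stage by stage."""
    rng = random.Random(seed)
    x = list(x0)
    for T in temps:
        scale = math.sqrt(2.0 * eta * T)  # standard deviation of the injected Gaussian noise
        for _ in range(steps_per_temp):
            g = grad_fd(f, x)
            x = [xi - eta * gi + scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
    return x


# Geometric cooling (an assumption): 0.5, 0.25, ..., halving at each stage.
temps = [0.5 * 0.5 ** k for k in range(6)]
x0 = [2.0, -2.0]
x_final = annealed_langevin(F_hat, x0, temps)
print("F(x0) =", F(x0), " F(x_final) =", F(x_final))
```

The chain only ever queries `F_hat`, yet progress is measured on `F`. Higher temperatures early on let the iterate move past the ripples that the noise adds to the landscape, and the later low-temperature stages keep it close to the region where `F` is small.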

Results

Research Questions

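RQ1: Can a convex function $F$ be minimized when the only available oracle is a noisy, nonconvex approximation $\hat{F}$ with unbounded noise?
RQ2: What conditions must the noise $F(x) - \hat{F}(x)$ satisfy for polynomial-time convergence to an approximate minimizer of $F$?
RQ3: Can stochastic gradient Langevin dynamics be adapted and analyzed to handle unbounded noise and a general convex $F$?
RQ4: How does the temperature schedule affect convergence in the presence of multiplicative and additive noise?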

Key Findings

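Under the noise model $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$, the algorithm finds an $\varepsilon$-approximate minimizer of $F$ in polynomial time.
The result generalizes earlier work that assumed bounded noise or strong convexity, removing the need for those restrictive assumptions on $F$ and on the noise structure.
Non-asymptotic convergence bounds are established for a Langevin-based algorithm even when the noise is unbounded.
The analysis yields technical ideas that may apply to other simulated annealing and stochastic optimization algorithms.
Even when $\hat{F}$ is nonconvex, the method still approximately minimizes $F$ by exploiting the underlying convexity of $F$.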
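
A short worked consequence of the noise model may help show why accuracy is tied to $\alpha$ and $\beta$. This is a sketch rather than a statement from the paper: $x^\star$ (a minimizer of $F$) and $\hat{x}$ (a minimizer of $\hat{F}$) are notation introduced here, and $\alpha < 1$ is assumed. Rearranging $|F(x) - \hat{F}(x)| \leq \alpha F(x) + \beta$ gives a two-sided bound on the oracle:

$$(1-\alpha)\,F(x) - \beta \;\leq\; \hat{F}(x) \;\leq\; (1+\alpha)\,F(x) + \beta .$$

Applying the lower bound at $\hat{x}$, the optimality of $\hat{x}$ for $\hat{F}$, and the upper bound at $x^\star$:

$$(1-\alpha)\,F(\hat{x}) - \beta \;\leq\; \hat{F}(\hat{x}) \;\leq\; \hat{F}(x^\star) \;\leq\; (1+\alpha)\,F(x^\star) + \beta ,$$

so that

$$F(\hat{x}) \;\leq\; \frac{(1+\alpha)\,F(x^\star) + 2\beta}{1-\alpha} .$$

Under these assumptions, even an exact minimizer of $\hat{F}$ is only guaranteed to minimize $F$ up to an error governed by $\alpha$ and $\beta$.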

This review was generated by AI and reviewed by a human editor.