QUICK REVIEW

[Paper Review] A Randomized Block Proximal Variable Sample-size Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization

Jinlong Lei, Uday V. Shanbhag|arXiv (Cornell University)|2018. 08. 07.

Sparse and Compressive Sensing Techniques|References: 34|Citations: 1

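One-Line Summary

This paper proposes a randomized block proximal variable sample-size stochastic gradient (VSSG) method for composite nonconvex stochastic optimization, in which one randomly chosen block is updated per iteration using gradients averaged over a batch whose size grows across iterations. It establishes an $\mathcal{O}(1/K)$ rate for the gradient mapping, with iteration complexity $\mathcal{O}(1/\epsilon)$ and oracle complexity $\mathcal{O}(1/\epsilon^2)$ for reaching an $\epsilon$-stationary point. Under a $\mu$-proximal Polyak-Łojasiewicz condition, it achieves geometric convergence.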

ABSTRACT

This paper considers the minimization of a sum of an expectation-valued smooth nonconvex function and a nonsmooth block-separable convex regularizer. By combining a randomized block-coordinate descent method with a proximal variable sample-size stochastic gradient (VSSG) method, we propose a randomized block proximal VSSG algorithm. In each iteration, a single block is randomly chosen to update its estimates by a VSSG scheme with an increasing batch of sampled gradients, while the remaining blocks are kept invariant. With appropriately chosen batch sizes, we prove that every limit point for almost every sample path is a stationary point when blocks are chosen either randomly or cyclically. We further show that the ergodic mean-squared error of the gradient mapping diminishes at the rate of $\mathcal{O}(1/K)$, where $K$ denotes the iteration index, and establish that the iteration and oracle complexity to obtain an $\epsilon$-stationary point are $\mathcal{O}(1/\epsilon)$ and $\mathcal{O}(1/\epsilon^2)$, respectively. Furthermore, under a $\mu$-proximal Polyak-Łojasiewicz condition with the batch size increasing at a suitable geometric rate, we prove that the suboptimality diminishes at a geometric rate, the optimal deterministic rate. In addition, if $L_{\rm ave}$ denotes the average of block-specific Lipschitz constants, the iteration and oracle complexity to obtain an $\epsilon$-optimal solution are $\mathcal{O}((L_{\rm ave}/\mu)\ln(1/\epsilon))$ and $\mathcal{O}\left((1/\epsilon)^{1+c}\right)$, respectively, matching the deterministic result. When $n=1$, we obtain the optimal oracle complexity bound $\mathcal{O}(1/\epsilon)$, while $c>0$ when $n\geq 2$ represents the positive cost of multiple blocks. Finally, preliminary numerical experiments support our theoretical findings.

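Motivation and Objectives

Address composite nonconvex stochastic optimization problems whose objective is the sum of a smooth but nonconvex expectation-valued function and a block-separable convex regularizer.
Develop a stochastic first-order method that handles variable sample sizes and block-wise updates while attaining optimal convergence rates.
Establish limit-point convergence guarantees and mean-squared-error decay under random or cyclic block selection.
Analyze the iteration and oracle complexity of reaching $\epsilon$-stationary points and $\epsilon$-optimal solutions under various conditions.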

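Proposed Method

Integrates a randomized block-coordinate descent framework with a proximal variable sample-size stochastic gradient (VSSG) scheme.
At each iteration, one block is chosen at random and updated by a VSSG step that uses an increasing batch of sampled gradients.
The remaining blocks are held fixed during each update, so every iteration is a block-wise step whose gradient noise is reduced by the growing batch.
Under the $\mu$-proximal Polyak-Łojasiewicz condition, the batch size is increased at a suitable geometric rate to guarantee convergence and attain the optimal rate.
Uses the proximal operator to handle the nonsmooth convex regularizer, which keeps the block-wise updates computationally tractable.
Analyzes convergence through the gradient mapping and ergodic averages, deriving theoretical bounds on the mean-squared error and the suboptimality.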
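
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the regularizer is taken to be $\lambda\|x\|_1$ (so the block proximal step is soft-thresholding), the step size is constant, and the names `block_prox_vssg`, `sample_grad`, `batch_size`, and `soft_threshold` are hypothetical. The paper's actual step-size and batch-size rules are not given in this review, and the toy objective at the bottom is a simple quadratic chosen only so the result is easy to check.

```python
import math
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def block_prox_vssg(sample_grad, x0, blocks, step, lam, num_iters, batch_size, rng):
    """Randomized block proximal step with a growing gradient batch.

    sample_grad(x, rng) -> one noisy gradient of the smooth part (assumed oracle).
    blocks              -> list of index arrays partitioning the coordinates.
    batch_size(k)       -> number of sampled gradients at iteration k (assumed rule).
    """
    x = x0.astype(float).copy()
    for k in range(num_iters):
        idx = blocks[rng.integers(len(blocks))]  # pick one block uniformly at random
        n_k = batch_size(k)
        # Average n_k sampled gradients, keeping only the chosen block's entries.
        g = np.mean([sample_grad(x, rng)[idx] for _ in range(n_k)], axis=0)
        # Proximal gradient step on the chosen block; other blocks stay unchanged.
        x[idx] = soft_threshold(x[idx] - step * g, step * lam)
    return x


# Toy usage (illustrative only): smooth part 0.5 * ||x - c||^2 with noisy gradients.
c = np.array([2.0, -1.0, 0.05, 3.0])
lam = 0.1


def noisy_grad(x, rng):
    return (x - c) + rng.normal(scale=0.1, size=x.shape)


rng = np.random.default_rng(0)
blocks = np.array_split(np.arange(c.size), 2)  # two blocks of two coordinates
x_hat = block_prox_vssg(
    noisy_grad, np.zeros(c.size), blocks,
    step=0.5, lam=lam, num_iters=60,
    batch_size=lambda k: math.ceil(1.1 ** k),  # geometric growth, assumed rate
    rng=rng,
)
```

For this toy problem the exact minimizer is `soft_threshold(c, lam)`, so `x_hat` can be compared against it directly. Passing `batch_size` as a function keeps the sampling schedule separate from the update, which makes it easy to try constant, polynomial, or geometric growth.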

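Experimental Results

Research Questions

RQ1. Can the randomized block proximal VSSG method achieve an $\mathcal{O}(1/K)$ rate for the mean-squared error of the gradient mapping?
RQ2. What are the iteration and oracle complexities of the proposed method for reaching an $\epsilon$-stationary point?
RQ3. Does geometric convergence occur under the $\mu$-proximal Polyak-Łojasiewicz condition, and if so, at what rate?
RQ4. How does the complexity scale with the number of blocks when $n \geq 2$, and what is the cost of having multiple blocks?
RQ5. Is the optimal $\mathcal{O}(1/\epsilon)$ oracle complexity attainable when $n=1$, and how does it scale when $n\geq 2$?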

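Key Findings

The ergodic mean-squared error of the gradient mapping decays at the rate $\mathcal{O}(1/K)$ in the iteration count $K$.
Reaching an $\epsilon$-stationary point requires iteration complexity $\mathcal{O}(1/\epsilon)$ and oracle complexity $\mathcal{O}(1/\epsilon^2)$.
Under the $\mu$-proximal Polyak-Łojasiewicz condition with geometrically increasing batch sizes, the suboptimality decays at a geometric rate, matching the optimal deterministic rate.
When $n=1$, the oracle complexity attains the optimal $\mathcal{O}(1/\epsilon)$ bound; when $n\geq 2$, $c>0$ reflects the cost of multiple blocks.
For an $\epsilon$-optimal solution, the iteration complexity is $\mathcal{O}\left(\frac{L_{\rm ave}}{\mu}\ln(1/\epsilon)\right)$ and the oracle complexity is $\mathcal{O}\left(\left(\frac{1}{\epsilon}\right)^{1+c}\right)$, where $L_{\rm ave}$ is the average of the block-specific Lipschitz constants; this matches the deterministic result.
Preliminary numerical experiments support the theoretical convergence rates and complexity bounds.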
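
To see how a logarithmic iteration count can coexist with a polynomial oracle count, the following sketch tallies sampled gradients under an idealized model. The model is an assumption made for illustration and is not taken from the paper: the error is supposed to contract by a factor `rho` per iteration, and iteration `k` is supposed to draw `ceil(batch_rho ** (-k))` samples. The function name `oracle_calls` and its parameters are hypothetical.

```python
import math


def oracle_calls(eps, rho, batch_rho):
    """Count iterations and total samples until rho**k <= eps.

    Idealized model (assumption): error after k iterations is rho**k,
    and iteration k uses ceil(batch_rho**(-k)) sampled gradients.
    """
    k, total = 0, 0
    while rho ** k > eps:
        total += math.ceil(batch_rho ** (-k))
        k += 1
    return k, total


# Batch growth matched to the contraction rate: samples scale like 1/eps.
iters, samples = oracle_calls(eps=1e-3, rho=0.5, batch_rho=0.5)
print(iters, samples)  # prints "10 1023"

# Batch growing faster than the error contracts: the exponent exceeds 1.
iters_fast, samples_fast = oracle_calls(eps=1e-3, rho=0.5, batch_rho=0.25)
print(iters_fast, samples_fast)  # prints "10 349525"
```

In the matched case the iteration count grows like $\ln(1/\epsilon)$ while the sample total grows like $1/\epsilon$, which is the shape of the $n=1$ bound. The second call shows one way an exponent above 1 can appear in this kind of accounting; whether that is the mechanism behind the paper's $c>0$ for $n\geq 2$ is not stated in this review.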

This review was generated by AI and reviewed by a human editor.