QUICK REVIEW

[Paper Review] A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

Prashant Khanduri, Siliang Zeng|arXiv (Cornell University)|2021. 02. 15.

Stochastic Gradient Optimization Techniques | Citations: 26

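One-Line Summary

SUSTAIN is a single-loop, momentum-assisted algorithm for stochastic bilevel optimization with a strongly convex lower level. For a non-convex upper-level objective it attains O(ε^{-3/2}) iteration complexity, matching the rate of single-level stochastic gradient methods, without requiring costly Hessian inversion.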

ABSTRACT

This paper proposes a new algorithm, the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN), for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on two-timescale or double loop techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires $\mathcal{O}(\epsilon^{-3/2})$ iterations (each using $\mathcal{O}(1)$ samples) to find an $\epsilon$-stationary solution. The $\epsilon$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $\epsilon$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex.

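Motivation and Objectives

Address stochastic bilevel optimization in which the lower-level problem is strongly convex and the upper-level objective is smooth.
Develop a single-loop algorithm that uses double momentum to track the inner and outer gradients efficiently.
Achieve near-optimal stochastic complexity, comparable to that of single-level problems, while avoiding costly Hessian inversion.
Provide theoretical guarantees for non-convex and strongly convex outer objectives and demonstrate favorable computational scaling.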
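For reference, the problem class described above can be written out as follows. This is the standard form of a stochastic bilevel problem and its implicit-function-theorem hypergradient; the symbols $d_{up}$, $\xi$, and $\zeta$ are notation assumed here for illustration and may differ from the paper's.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{d_{up}}} \quad & \ell(x) := f\bigl(x, y^*(x)\bigr)
  = \mathbb{E}_{\xi}\bigl[ f(x, y^*(x); \xi) \bigr] \\
\text{s.t.} \quad & y^*(x) = \arg\min_{y \in \mathbb{R}^{d_{lo}}} \; g(x, y)
  = \mathbb{E}_{\zeta}\bigl[ g(x, y; \zeta) \bigr]
\end{aligned}

% Hypergradient when g(x, .) is strongly convex (implicit function theorem):
\nabla \ell(x) = \nabla_x f\bigl(x, y^*(x)\bigr)
  - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)
    \bigl[ \nabla^2_{yy} g\bigl(x, y^*(x)\bigr) \bigr]^{-1}
    \nabla_y f\bigl(x, y^*(x)\bigr)

% epsilon-stationarity, as stated in the abstract:
\mathbb{E}\,\bigl\| \nabla \ell(x) \bigr\|^2 \le \epsilon
```

The inverse of the lower-level Hessian in the hypergradient is the term that makes naive bilevel methods expensive, and the dependence on $y^*(x)$ is what double-loop methods resolve with an inner solve.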

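Proposed Method

Introduces SUSTAIN as a single-timescale, double-momentum stochastic approximation algorithm.
Uses momentum-based gradient estimators for the lower-level gradient ∇_y g and the outer gradient ∇ℓ, avoiding an explicit inner-loop solve.
Builds a biased but practical gradient surrogate ∇̄f from the implicit function theorem and a K-step construction that needs no Hessian inversion; its bias decays exponentially in K.
Update rules: y_{t+1} = y_t − β_t h_t^g and x_{t+1} = x_t − α_t h_t^f, where h_t^g and h_t^f are recursive momentum estimators (Equations (13) and (14)).
Constructs the gradient estimators from sample-based Hessian-vector products and sets K = Θ(log T) to control the bias (Lemma 2.1).
Proves convergence with a potential function that combines the gradient-estimation error and the optimality gap, giving O(ε^{-3/2}) iteration/sample complexity for an ε-stationary point (Theorem 3.2) and O(ε^{-1}) for a strongly convex outer objective (Theorem 3.3).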
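The single-loop structure described above can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the quadratic objectives, the additive Gaussian noise model, the constant step sizes and momentum weights, the deterministic truncated Neumann sum, and every name in the code (`sustain`, `hypergrad`, `inv_hessian_apply`, and so on) are assumptions made here. The recursive momentum update is written in a STORM-style form; the exact estimators are the paper's Equations (13) and (14).

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y = 3, 4
A = rng.normal(size=(d_y, d_x))            # couples x into the lower level
H = np.diag(np.linspace(1.0, 2.0, d_y))    # lower-level Hessian, eigenvalues in [1, 2]
b = 2.0 * np.ones(d_y)                     # upper-level target for y
LAM, L_G, K, SIGMA = 1.0, 2.0, 20, 0.05    # regulariser, smoothness bound, Neumann terms, noise

def grad_g_y(x, y, noise):
    """Stochastic lower-level gradient of g(x, y) = 0.5 y^T H y - y^T A x."""
    return H @ y - A @ x + noise

def inv_hessian_apply(v):
    """Approximate H^{-1} v by (1/L_G) * sum_{k<K} (I - H/L_G)^k v (no matrix inverse)."""
    term = v.copy()
    acc = v.copy()
    for _ in range(K - 1):
        term = term - (H @ term) / L_G     # one Hessian-vector product per extra term
        acc += term
    return acc / L_G

def hypergrad(x, y, noise):
    """Surrogate for grad l(x): grad_x f - grad_xy g [grad_yy g]^{-1} grad_y f.
    Here f(x, y) = 0.5||y - b||^2 + 0.5*LAM*||x||^2 and grad_xy g = -A^T."""
    grad_y_f = y - b + noise
    return LAM * x + A.T @ inv_hessian_apply(grad_y_f)

def true_hypergrad_norm(x):
    """Exact ||grad l(x)|| using y*(x) = H^{-1} A x (for checking only)."""
    y_star = np.linalg.solve(H, A @ x)
    return np.linalg.norm(LAM * x + A.T @ np.linalg.solve(H, y_star - b))

def sustain(T=3000, alpha=0.01, beta=0.1, eta_f=0.1, eta_g=0.1):
    x, y = np.zeros(d_x), np.zeros(d_y)
    h_g = grad_g_y(x, y, SIGMA * rng.normal(size=d_y))
    h_f = hypergrad(x, y, SIGMA * rng.normal(size=d_y))
    for _ in range(T):
        # Single loop: one step on each level per iteration, no inner solve.
        x_new = x - alpha * h_f
        y_new = y - beta * h_g
        # Fresh samples, each evaluated at both the new and the old iterate.
        n_g = SIGMA * rng.normal(size=d_y)
        n_f = SIGMA * rng.normal(size=d_y)
        h_g = grad_g_y(x_new, y_new, n_g) + (1 - eta_g) * (h_g - grad_g_y(x, y, n_g))
        h_f = hypergrad(x_new, y_new, n_f) + (1 - eta_f) * (h_f - hypergrad(x, y, n_f))
        x, y = x_new, y_new
    return x, y

x_T, y_T = sustain()
print("||grad l|| at start:", true_hypergrad_norm(np.zeros(d_x)))
print("||grad l|| at end:  ", true_hypergrad_norm(x_T))
```

Both levels move once per iteration with step sizes of comparable order, which is the single-timescale idea; the momentum terms reuse the previous estimate so that each iteration needs only a constant number of fresh samples.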

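Experimental Results

Research Questions

RQ1: Can a single-loop stochastic bilevel optimizer reach near-optimal sample complexity without costly Hessian inversion?
RQ2: How can momentum-based gradient estimators for the inner and outer problems be constructed and stabilized so that convergence is guaranteed?
RQ3: Under standard smoothness and strong-convexity assumptions, what are the iteration and sample complexities for non-convex and strongly convex outer objectives?
RQ4: How does SUSTAIN compare with existing bilevel methods in theory (rates) and in computation (per-iteration cost)?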

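Key Results

For non-convex outer objectives, SUSTAIN finds an ε-stationary point (Definition 1.1) in O(ε^{-3/2}) iterations.
Each iteration uses O(1) samples and costs O(d_lo^2 log T), with no Hessian inversion.
For strongly convex outer objectives, SUSTAIN needs O(ε^{-1}) stochastic gradient samples to reach ε-optimality (Theorem 3.3).
The gradient estimator for the outer objective requires no explicit Hessian inversion and controls bias and variance through Lipschitz properties and momentum (Lemma 3.1).
Compared with earlier bilevel methods (BSA, stocBiO, TTSA, STABLE, SVRB), SUSTAIN matches or improves on their sample complexity while reducing computational cost.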
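The per-iteration cost of O(d_lo^2 log T) and the absence of Hessian inversion can be illustrated with a truncated Neumann series, a standard inversion-free way to apply an inverse Hessian. The sketch below is an assumption about the kind of K-step construction involved rather than the paper's exact estimator; the function name `neumann_inverse_apply`, the test matrix, and the constants are hypothetical. For a symmetric Hessian with eigenvalues in [μ, L], the truncation error is at most (1/μ)(1 − μ/L)^K ‖v‖, so it decays exponentially in K, and each term costs one dense Hessian-vector product, i.e. O(d^2) work.

```python
import numpy as np

def neumann_inverse_apply(hvp, v, L, K):
    """Approximate H^{-1} v by (1/L) * sum_{k=0}^{K-1} (I - H/L)^k v.

    hvp: callable returning the Hessian-vector product H @ u
    L:   upper bound on the largest eigenvalue of H
    K:   number of series terms (K >= 1)
    """
    term = np.array(v, dtype=float)
    acc = term.copy()
    for _ in range(K - 1):
        term = term - hvp(term) / L   # multiply the running term by (I - H/L)
        acc += term
    return acc / L

rng = np.random.default_rng(0)
d, MU, L = 50, 1.0, 4.0
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))          # random orthogonal basis
H = Q @ np.diag(np.linspace(MU, L, d)) @ Q.T          # symmetric, eigenvalues in [MU, L]
v = rng.normal(size=d)
exact = np.linalg.solve(H, v)                         # reference only, not used by the method

errors = {}
for K in (1, 5, 10, 20, 40):
    approx = neumann_inverse_apply(lambda u: H @ u, v, L, K)
    errors[K] = np.linalg.norm(approx - exact)
    bound = (1.0 - MU / L) ** K * np.linalg.norm(v) / MU
    print(f"K={K:2d}  error={errors[K]:.3e}  bound={bound:.3e}")
```

Because the error shrinks geometrically, choosing K proportional to log T is enough to push the bias below a polynomial-in-1/T target, which is consistent with the K = Θ(log T) setting quoted from Lemma 2.1.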


This review was generated by AI and reviewed by a human editor.