QUICK REVIEW

[Paper Review] From Averaging to Acceleration, There is Only a Step-size

Nicolas Flammarion, Francis Bach | arXiv (Cornell University) | Apr 7, 2015

Stochastic Gradient Optimization Techniques | 24 references | 66 citations

TL;DR

This paper unifies averaged gradient descent, accelerated gradient descent, and the heavy-ball method under a common second-order difference equation framework for non-strongly convex problems. It shows that convergence at the optimal O(1/n²) rate corresponds to stability of the system, and derives explicit stability conditions with sharp constants, enabling a hybrid algorithm that combines acceleration's fast convergence with averaging's robustness to noisy gradients.
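
The sketch below checks the difference-equation view for the averaging case only. It assumes a quadratic f(θ) = ½ θᵀHθ with its minimizer at 0. Let θ_n be plain gradient-descent iterates with step γ, and let η_n = θ_0 + … + θ_n be their running sum, so the averaged iterate is η_n/(n+1). Then η_{n+1} − η_n = θ_{n+1} = (I − γH)(η_n − η_{n−1}), which gives η_{n+1} = (2I − γH)η_n − (I − γH)η_{n−1}. The coefficients of this recursion do not depend on n. The change of variables, H, and γ are our own illustrative choices and not necessarily the paper's notation.

```python
import numpy as np


def gd_running_sums(H, theta0, gamma, n_iters):
    """Plain gradient descent on f(theta) = 0.5 * theta^T H theta (minimizer 0).
    Returns eta_n = theta_0 + ... + theta_n for n = 0..n_iters."""
    theta = theta0.copy()
    eta = theta0.copy()
    sums = [eta.copy()]
    for _ in range(n_iters):
        theta = theta - gamma * (H @ theta)  # gradient of f is H @ theta
        eta = eta + theta
        sums.append(eta.copy())
    return sums


def recursion_residual(H, sums, gamma):
    """Largest norm of eta_{n+1} - [(2I - gamma*H) eta_n - (I - gamma*H) eta_{n-1}]."""
    I = np.eye(H.shape[0])
    A = 2 * I - gamma * H
    B = I - gamma * H
    return max(
        np.linalg.norm(sums[n + 1] - (A @ sums[n] - B @ sums[n - 1]))
        for n in range(1, len(sums) - 1)
    )


# Assumed setup (illustrative only): random PSD Hessian, step size 1/L.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M.T @ M
gamma = 1.0 / np.linalg.eigvalsh(H).max()
sums = gd_running_sums(H, rng.standard_normal(5), gamma, 50)
print(recursion_residual(H, sums, gamma))  # floating-point round-off only
```

A residual at round-off level means the averaged iterates obey a second-order recursion with constant coefficients, even though the averaging weights 1/(n+1) change with n.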

ABSTRACT

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate O(1/n²), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where noisy gradients are available, where we extend our general convergence result, which suggests an alternative algorithm (i.e., with different step sizes) that exhibits the good aspects of both averaging and acceleration.
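
The difference between oscillatory and non-oscillatory behavior can be seen on a scalar recursion x_{n+1} = a·x_n − b·x_{n−1}. This is the form a constant-coefficient second-order scheme takes along one eigen-direction of a quadratic. The minimal sketch below assumes (a, b) are given. Complex-conjugate characteristic roots mean the iterates oscillate, and the largest root modulus controls growth or decay. The paper's mapping from its step sizes to (a, b) is not reproduced here. The only concrete instance used below comes from our own derivation for averaged gradient descent: along a direction of curvature h, the running sum of iterates gives a = 2 − γh and b = 1 − γh.

```python
import cmath


def characteristic_roots(a, b):
    """Roots of r^2 - a*r + b = 0, the characteristic polynomial of
    the scalar recursion x_{n+1} = a*x_n - b*x_{n-1}."""
    sq = cmath.sqrt(a * a - 4 * b)
    return (a + sq) / 2, (a - sq) / 2


def classify(a, b, tol=1e-12):
    """Return (oscillatory, spectral_radius) for x_{n+1} = a*x_n - b*x_{n-1}.
    A negative discriminant gives complex-conjugate roots, i.e. oscillation."""
    r1, r2 = characteristic_roots(a, b)
    oscillatory = a * a - 4 * b < -tol
    return oscillatory, max(abs(r1), abs(r2))


gamma_h = 0.5  # illustrative value of gamma * h
print(classify(2 - gamma_h, 1 - gamma_h))  # averaging instance: (False, 1.0)
print(classify(1.0, 0.5))  # oscillatory, root modulus sqrt(0.5)
```

In the averaging instance, the discriminant is (2 − γh)² − 4(1 − γh) = (γh)² ≥ 0. The roots 1 and 1 − γh are therefore always real, so the averaged recursion never oscillates. The root at 1 explains why the running sum settles to a constant instead of vanishing.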

Motivation & Objective

To unify averaged gradient descent, accelerated gradient descent, and the heavy-ball method under a single mathematical framework for non-strongly convex problems.
To analyze the stability of these methods via eigenvalue analysis of a linear dynamical system, linking stability to O(1/n²) convergence.
To extend the analysis to noisy-gradient settings, where each gradient is corrupted by zero-mean random noise, and to derive convergence guarantees for that regime.
To design a new hybrid algorithm that inherits the fast convergence of acceleration and the noise robustness of averaging by tuning step-sizes.

Proposed method

Reformulates averaged gradient descent, accelerated gradient descent, and the heavy-ball method as constant-parameter second-order difference equations.
Analyzes the eigenvalues of the associated linear dynamical system, distinguishing oscillatory from non-oscillatory behaviors.
Derives a sharp stability condition with explicit constants that guarantees an O(1/n²) convergence rate for the excess risk.
Introduces a novel algorithm with adapted step-sizes that balances the benefits of averaging (robustness to noise) and acceleration (fast convergence).
Uses a weighted average of gradients in the update rule, with time-varying weights that depend on the iteration number and problem parameters.
Applies the framework to stochastic optimization, deriving a lower bound that confirms the optimality of the proposed step-size strategy under noisy gradients.
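
The way a step size enters a stability condition can be sketched numerically for the averaging recursion. This is our own derivation, not the paper's stated condition. Along a direction of curvature h, the running sum of gradient-descent iterates obeys x_{n+1} = (2 − γh)x_n − (1 − γh)x_{n−1}, whose characteristic roots are 1 and 1 − γh. The sketch scans h over (0, L] and takes the largest eigenvalue modulus of the 2×2 companion matrix. It shows the spectral radius staying at 1 for γL ≤ 2 and exceeding 1 once γL > 2. The helper names and grid size are hypothetical choices.

```python
import numpy as np


def companion(a, b):
    """Companion matrix of x_{n+1} = a*x_n - b*x_{n-1}, acting on (x_n, x_{n-1})."""
    return np.array([[a, -b], [1.0, 0.0]])


def max_spectral_radius_averaging(gamma, L, num=200):
    """Largest |eigenvalue| over a grid of curvatures h in (0, L] for the
    averaging recursion a = 2 - gamma*h, b = 1 - gamma*h (hypothetical helper)."""
    hs = np.linspace(L / num, L, num)
    return max(
        np.abs(np.linalg.eigvals(companion(2 - gamma * h, 1 - gamma * h))).max()
        for h in hs
    )


for gamma in (1.0, 2.0, 2.5):  # step sizes relative to L = 1
    print(gamma, max_spectral_radius_averaging(gamma, L=1.0))
```

A spectral radius of at most 1 is necessary for bounded iterates. Boundedness also requires any roots of modulus 1 to be simple. Bounds that hold uniformly as h → 0 are a finer question that this sketch does not address.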

Experimental results

Research questions

RQ1: Can averaged gradient descent, accelerated gradient descent, and the heavy-ball method be unified under a single second-order difference equation framework for non-strongly convex problems?
RQ2: What is the precise stability condition for these methods that guarantees O(1/n²) convergence, and what are the explicit constants involved?
RQ3: How does the presence of noisy gradients affect the convergence of these methods, and can a hybrid algorithm be designed to retain both fast convergence and robustness?
RQ4: Is there a step-size strategy that combines the advantages of averaging (noise resilience) and acceleration (fast convergence) in stochastic settings?

Key findings

All three methods (averaged gradient descent, accelerated gradient descent, and the heavy-ball method) can be expressed as constant-parameter second-order difference equations, and convergence at rate O(1/n²) is equivalent to stability of the resulting system.
The stability condition is derived explicitly with sharp constants, enabling precise tuning of parameters for optimal convergence.
Eigenvalue analysis reveals distinct oscillatory and non-oscillatory behaviors depending on parameter choices, with implications for convergence speed and robustness.
In the presence of noisy gradients, the proposed hybrid algorithm achieves improved convergence by balancing step-sizes to maintain fast convergence while remaining robust to noise.
Under noise, the O(1/n²) rate applies to the part of the error driven by the initial condition, matching the best-known rate for first-order methods in the non-strongly convex case. The gradient noise contributes an additional term on top of it.
A lower bound for stochastic least-squares optimization confirms that the proposed step-size strategy is optimal up to constants, with the error bounded below by Ω(V/(L√d N)) for N ≤ d.
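
A small simulation can be used to explore the noise trade-off between averaging and acceleration. It is not the paper's hybrid algorithm, whose step sizes are not reproduced here. The sketch rests on several assumptions:
- a diagonal quadratic with eigenvalues in (0, 1];
- gradients corrupted by additive zero-mean Gaussian noise of standard deviation sigma;
- step size γ = 1/L;
- Polyak-Ruppert-style averaging of plain gradient-descent iterates;
- Nesterov's method with momentum (k − 1)/(k + 2).

No particular numbers are claimed, and the printed excess risks depend on the seed and on sigma.

```python
import numpy as np


def excess_risk(H, theta):
    """f(theta) - f(theta*) for f(theta) = 0.5 * theta^T H theta, minimized at 0."""
    return 0.5 * theta @ H @ theta


def noisy_grad(H, theta, sigma, rng):
    """Exact gradient H @ theta plus zero-mean Gaussian noise (assumed noise model)."""
    return H @ theta + sigma * rng.standard_normal(theta.shape)


def averaged_gd(H, theta0, gamma, n, sigma, rng):
    """Gradient descent with noisy gradients; returns the mean of theta_0..theta_n."""
    theta = theta0.copy()
    avg = theta0.copy()
    for k in range(1, n + 1):
        theta = theta - gamma * noisy_grad(H, theta, sigma, rng)
        avg += (theta - avg) / (k + 1)  # incremental running mean
    return avg


def nesterov_gd(H, theta0, gamma, n, sigma, rng):
    """Nesterov's accelerated method with momentum (k-1)/(k+2); returns theta_n."""
    theta_prev = theta0.copy()
    omega = theta0.copy()
    for k in range(1, n + 1):
        theta = omega - gamma * noisy_grad(H, omega, sigma, rng)
        omega = theta + (k - 1) / (k + 2) * (theta - theta_prev)
        theta_prev = theta
    return theta_prev


# Assumed setup (illustrative only): d = 20, eigenvalues spread over (0, 1], L = 1.
d = 20
H = np.diag(np.linspace(1e-3, 1.0, d))
theta0 = np.ones(d)
gamma = 1.0  # step size 1/L
n = 1000

for sigma in (0.0, 0.1):
    avg = averaged_gd(H, theta0, gamma, n, sigma, np.random.default_rng(1))
    acc = nesterov_gd(H, theta0, gamma, n, sigma, np.random.default_rng(1))
    print(f"sigma={sigma}: averaged {excess_risk(H, avg):.2e}, "
          f"accelerated {excess_risk(H, acc):.2e}")
```

Varying sigma and n in this setup is one way to see how each method's error reacts to gradient noise.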

This review was created by AI and reviewed by human editors.