QUICK REVIEW

[Paper Review] Optimization with First-Order Surrogate Functions

Julien Mairal|arXiv (Cornell University)|May 14, 2013
Sparse and Compressive Sensing Techniques|40 references|117 citations
TL;DR

This paper introduces a single framework for optimization with first-order surrogate functions that covers and extends methods such as accelerated proximal gradient, block coordinate descent, and Frank-Wolfe. It also proposes a new incremental algorithm, MISO, which converges linearly on strongly convex problems and matches or outperforms state-of-the-art solvers on large-scale machine learning tasks such as ℓ1- and ℓ2-regularized logistic regression.

ABSTRACT

In this paper, we study optimization methods consisting of iteratively minimizing surrogates of an objective function. By proposing several algorithmic variants and simple convergence analyses, we make two main contributions. First, we provide a unified viewpoint for several first-order optimization techniques such as accelerated proximal gradient, block coordinate descent, or Frank-Wolfe algorithms. Second, we introduce a new incremental scheme that experimentally matches or outperforms state-of-the-art solvers for large-scale optimization problems typically arising in machine learning.

Motivation & Objective

  • To unify diverse first-order optimization techniques under a single surrogate-based framework.
  • To develop a new incremental optimization scheme with strong theoretical convergence guarantees.
  • To analyze convergence for both non-convex and convex problems using first-order surrogate functions.
  • To empirically evaluate the proposed method on large-scale machine learning problems.

Proposed method

  • Defines first-order surrogate functions: surrogates that majorize the objective and whose approximation error (surrogate minus objective) is differentiable with an L-Lipschitz continuous gradient.
  • Introduces a generic majorization-minimization algorithm that iteratively minimizes surrogate functions.
  • Derives a randomized block coordinate descent variant with convergence guarantees.
  • Presents an accelerated variant inspired by Nesterov's method for convex problems.
  • Develops a new incremental scheme, MISO, designed for large-scale problems with linear convergence.
  • Establishes theoretical convergence rates: linear for strongly convex problems, asymptotic stationarity for non-convex cases.
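
The generic majorization-minimization loop described above can be illustrated with one familiar special case. The sketch below is not code from the paper: it assumes a lasso objective 0.5·‖Xθ − y‖² + λ‖θ‖₁ and uses a proximal gradient surrogate, whose exact minimizer is a soft-thresholding step. The function names (soft_threshold, lasso_objective, mm_proximal_gradient) and the choice of L as the squared spectral norm of X are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(theta, X, y, lam):
    """0.5 * ||X theta - y||^2 + lam * ||theta||_1."""
    residual = X @ theta - y
    return 0.5 * residual @ residual + lam * np.sum(np.abs(theta))


def mm_proximal_gradient(X, y, lam, n_iters=100):
    """Majorization-minimization with a proximal gradient surrogate.

    At the current iterate kappa, the smooth part is replaced by its
    linearization plus (L/2) * ||theta - kappa||^2, which majorizes it
    when L bounds the Lipschitz constant of its gradient. Minimizing
    this surrogate exactly gives a soft-thresholding update.
    """
    # Assumption: L is the squared spectral norm of X, a valid
    # Lipschitz constant for the gradient of the least-squares term.
    L = np.linalg.norm(X, 2) ** 2
    theta = np.zeros(X.shape[1])
    history = [lasso_objective(theta, X, y, lam)]
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y)  # gradient of the smooth part
        theta = soft_threshold(theta - grad / L, lam / L)  # surrogate minimizer
        history.append(lasso_objective(theta, X, y, lam))
    return theta, history
```

Because each surrogate majorizes the objective and is tight at the current iterate, the recorded objective values should never increase, which is the basic property the convergence analyses build on.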

Research questions

  • RQ1: Can a unified framework be developed to analyze diverse first-order optimization methods using surrogate functions?
  • RQ2: Does the proposed MISO algorithm achieve linear convergence for strongly convex problems?
  • RQ3: How does MISO compare to state-of-the-art solvers like SAG and SDCA in practice?
  • RQ4: Can the surrogate-based approach be extended to incremental and block-coordinate settings with theoretical guarantees?
  • RQ5: What is the empirical performance of MISO on large-scale logistic regression tasks?

Key findings

  • MISO achieves linear convergence for strongly convex problems, matching the best-known rates of SAG and SDCA.
  • In experiments, MISO matches or outperforms state-of-the-art solvers (e.g., FISTA, LIBLINEAR, SAG) on ℓ2- and ℓ1-regularized logistic regression.
  • For ℓ2-regularized logistic regression with λ = 10⁻³ and λ = 10⁻⁷, MISO converges faster than FISTA, LIBLINEAR, and ASGD in terms of effective data passes.
  • On sparse problems with ~3% nonzero coefficients, MISO outperforms FISTA and LIBLINEAR, demonstrating robustness to sparsity.
  • For problems with ~50% nonzero coefficients, MISO remains competitive, showing consistent performance across sparsity levels.
  • The theoretical analysis confirms linear convergence for MISO and other variants under strong convexity, with explicit bounds on suboptimality.
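
To make the incremental idea behind these findings concrete, here is a minimal sketch of a MISO-style update for ℓ2-regularized logistic regression. It is an illustration under stated assumptions, not the paper's implementation: each per-sample loss gets a quadratic (Lipschitz gradient) surrogate anchored at the point where that sample was last visited, and the next iterate minimizes the average of the stored surrogates, which works out to the mean anchor minus the mean stored gradient divided by L. The function names, the bound L = max‖x_t‖²/4 + λ, and the uniform random sampling are assumptions made for this example.

```python
import numpy as np


def logistic_objective(theta, X, y, lam):
    """Mean logistic loss plus (lam / 2) * ||theta||^2, labels in {-1, +1}."""
    margins = y * (X @ theta)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * theta @ theta


def sample_gradient(theta, x, label, lam):
    """Gradient of one regularized logistic loss term."""
    margin = label * (x @ theta)
    weight = 0.5 * (1.0 - np.tanh(0.5 * margin))  # equals sigmoid(-margin)
    return -label * weight * x + lam * theta


def miso_logistic(X, y, lam, n_epochs=20, seed=0):
    """MISO-style incremental scheme with quadratic surrogates (a sketch)."""
    rng = np.random.default_rng(seed)
    T, p = X.shape
    # Assumption: upper bound on the gradient Lipschitz constant of each term.
    L = 0.25 * np.max(np.sum(X ** 2, axis=1)) + lam
    theta = np.zeros(p)
    anchors = np.tile(theta, (T, 1))  # anchor point of each surrogate
    grads = np.array([sample_gradient(theta, X[t], y[t], lam) for t in range(T)])
    anchor_mean = anchors.mean(axis=0)
    grad_mean = grads.mean(axis=0)
    theta = anchor_mean - grad_mean / L  # minimizer of the average surrogate
    for _ in range(n_epochs * T):
        t = rng.integers(T)  # pick one sample uniformly at random
        new_grad = sample_gradient(theta, X[t], y[t], lam)
        # Refresh surrogate t at the current iterate, keeping running means.
        anchor_mean += (theta - anchors[t]) / T
        grad_mean += (new_grad - grads[t]) / T
        anchors[t] = theta
        grads[t] = new_grad
        theta = anchor_mean - grad_mean / L
    return theta
```

Only one gradient is recomputed per step, at the cost of storing one anchor and one gradient per sample. That memory-for-computation trade is what the comparison in effective data passes is measuring.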


This review was created by AI and reviewed by human editors.