QUICK REVIEW

[Paper Review] Consistency of the group Lasso and multiple kernel learning

Francis Bach | arXiv.org | Jul 23, 2007

Statistical Methods and Inference · 49 references · 704 citations

TL;DR

This paper establishes asymptotic consistency conditions for the group Lasso and multiple kernel learning (MKL) in least-squares regression, showing when sparsity patterns can be consistently recovered under practical assumptions, including model misspecification. It extends Lasso consistency results to grouped and infinite-dimensional kernel settings using covariance operators and tools from functional analysis.

ABSTRACT

We consider the least-square regression problem with regularization by a block 1-norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1-norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for non linear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied.
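
For reference, the block 1-norm objective described in the abstract can be written as follows. The notation is an assumption made here for illustration, not a quotation from the paper: $X$ is the $n \times p$ design matrix, $w_j$ is the sub-vector of coefficients for group $j$ out of $m$ groups, $d_j > 0$ are group weights, $\lambda_n$ is the regularization parameter, and the intercept is omitted.

```latex
\min_{w \in \mathbb{R}^p} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 \;+\; \lambda_n \sum_{j=1}^{m} d_j \lVert w_j \rVert_2
```

When every group has dimension one, the penalty reduces to a weighted 1-norm, which is the usual Lasso.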

Motivation & Objective

To establish necessary and sufficient conditions for model consistency of the group Lasso under practical assumptions, including model misspecification.
To extend consistency results from finite-dimensional group Lasso to infinite-dimensional multiple kernel learning (MKL) using reproducing kernel Hilbert spaces.
To propose an adaptive scheme that ensures consistency even when standard conditions for non-adaptive MKL are violated.
To provide a theoretical foundation for group selection and non-linear variable selection in heterogeneous data fusion and kernel learning.
To unify the analysis of group Lasso and MKL through functional analysis, particularly using covariance operators in input space.

Proposed method

Uses covariance operators to analyze the group Lasso and MKL in primal space, avoiding reliance on dual space computations.
Applies tools from functional analysis to extend finite-dimensional group Lasso consistency to infinite-dimensional RKHS settings.
Derives necessary and sufficient conditions for consistency based on spectral properties of the covariance operator and group structure.
Introduces an adaptive group Lasso scheme with data-dependent weights to ensure consistency even when standard conditions fail.
Employs orthonormal bases in $L^2(p_X)$ and eigen-decomposition of the covariance operator $\Sigma_{XX}$ to characterize the solution space.
Uses Hermite polynomial expansions and Gaussian kernel eigenbases to compute expectations and verify conditions analytically in the Gaussian case.
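
The finite-dimensional group Lasso and the adaptive reweighting idea listed above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's algorithm: the proximal-gradient solver, the function names `group_lasso` and `adaptive_weights`, the exponent `gamma`, and the choice of a least-squares fit as the initial estimate are all assumptions made for this sketch.

```python
import numpy as np

def group_lasso(X, y, groups, lam, weights=None, n_iter=1000):
    """Proximal-gradient sketch for
    (1/2n) * ||y - X w||^2 + lam * sum_j d_j * ||w_j||_2.
    `groups` is a list of index arrays; `weights` holds the d_j (default 1)."""
    n, p = X.shape
    d = np.ones(len(groups)) if weights is None else np.asarray(weights, dtype=float)
    w = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y)) / n  # gradient step on the squared loss
        w = z.copy()
        for idx, d_j in zip(groups, d):
            norm = np.linalg.norm(z[idx])
            # Block soft-thresholding: shrink the whole group, or zero it out
            scale = max(0.0, 1.0 - step * lam * d_j / norm) if norm > 0 else 0.0
            w[idx] = scale * z[idx]
    return w

def adaptive_weights(X, y, groups, gamma=1.0):
    """Hypothetical data-dependent weights d_j = ||w_j^LS||^(-gamma),
    computed from an ordinary least-squares fit."""
    w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.array([np.linalg.norm(w_ls[idx]) ** (-gamma) for idx in groups])
```

Passing `weights=adaptive_weights(X, y, groups)` to `group_lasso` penalizes groups with small initial estimates more heavily, which is the intuition behind the adaptive scheme. The block soft-thresholding step is what sets entire groups exactly to zero.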

Experimental results

Research questions

RQ1: Under what conditions is the group Lasso consistent in recovering the true sparsity pattern of the regression coefficients?
RQ2: How does model misspecification affect the consistency of the group Lasso, and can consistency still be achieved?
RQ3: Can the consistency results of the group Lasso be extended to the infinite-dimensional setting of multiple kernel learning?
RQ4: What conditions ensure consistency in multiple kernel learning when the standard group Lasso conditions are not satisfied?
RQ5: How can an adaptive scheme improve consistency in multiple kernel learning under weak or violated assumptions?

Key findings

The group Lasso consistently recovers the true sparsity pattern when a correlation condition between the relevant and irrelevant groups holds; the paper derives both a sufficient and a necessary form of this condition, and the analysis covers model misspecification.
When correlations between relevant and irrelevant groups are too strong, the standard group Lasso may fail to recover the correct sparsity pattern, but an adaptive version with data-dependent weights restores consistency.
For multiple kernel learning, consistency is achieved when the true function lies in the span of the kernel functions and the kernel combination satisfies a representer condition tied to the covariance operator's spectral properties.
The paper proves that the group Lasso converges to the best linear predictor in $L^2(p_X)$, both in terms of coefficient vectors and sparsity patterns, under mild assumptions.
In the Gaussian kernel case, explicit eigenbases and Hermite polynomial expansions allow analytical verification of consistency conditions without Monte Carlo sampling.
The adaptive scheme ensures consistency even when the necessary condition for non-adaptive MKL is violated, by reweighting group norms based on initial estimates.
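
A correlation condition of the kind mentioned in the first finding can be checked numerically when the population covariance is known. The sketch below assumes the condition takes the form $\max_{i \notin J} \frac{1}{d_i} \lVert \Sigma_{X_i X_J} \Sigma_{X_J X_J}^{-1} \mathrm{Diag}(d_j / \lVert w_j \rVert) w_J \rVert < 1$, where $J$ indexes the relevant groups. This review does not state the expression, so treat both the formula and the helper name `group_lasso_condition` as assumptions.

```python
import numpy as np

def group_lasso_condition(Sigma, w, groups, active, d=None):
    """Return max over inactive groups i of
    (1/d_i) * || Sigma[i, J] @ inv(Sigma[J, J]) @ u ||,
    where u stacks d_j * w_j / ||w_j|| over the active groups J."""
    d = np.ones(len(groups)) if d is None else np.asarray(d, dtype=float)
    J = np.concatenate([groups[j] for j in active])
    u = np.concatenate(
        [d[j] * w[groups[j]] / np.linalg.norm(w[groups[j]]) for j in active]
    )
    v = np.linalg.solve(Sigma[np.ix_(J, J)], u)  # inv(Sigma_JJ) @ u without forming the inverse
    values = [
        np.linalg.norm(Sigma[np.ix_(groups[i], J)] @ v) / d[i]
        for i in range(len(groups))
        if i not in active
    ]
    return max(values)
```

A returned value below 1 would indicate that the assumed condition holds. With groups of size one the expression reduces to the familiar Lasso condition based on the signs of the nonzero coefficients.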


This review was created by AI and reviewed by human editors.