QUICK REVIEW

[Paper Review] On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions

Francis Bach · arXiv (Cornell University) · Feb 24, 2015

Mathematical Approximation and Integration · 63 references · 169 citations

TL;DR

This paper establishes a theoretical equivalence between kernel quadrature rules and random feature expansions, showing that kernel quadrature is a special case of a random feature expansion for a particular decomposition of the kernel. It provides upper and lower bounds on the approximation error that depend only on the eigenvalues of the kernel's integral operator and match up to logarithmic factors. It also shows that fewer random features suffice to preserve generalization guarantees in learning with Lipschitz-continuous losses.
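The claim that the bounds depend only on the eigenvalues can be made concrete with a small numerical sketch. It assumes (this review does not define it) that the bounds are expressed through the degrees of freedom $d(\lambda)=\sum_m \mu_m/(\mu_m+\lambda)$, and it assumes eigenvalues $\mu_m = m^{-2s}$, a common model of a Sobolev space of smoothness $s$ on $[0,1]$ (multiplicities ignored). The helper names `degrees_of_freedom` and `log_log_slope` are hypothetical; the code only checks that $d(\lambda)$ grows like $\lambda^{-1/(2s)}$.

```python
import numpy as np

def degrees_of_freedom(lam, s, num_terms=10**6):
    """d(lam) = sum_m mu_m / (mu_m + lam) for the assumed eigenvalues mu_m = m^(-2s).

    The series is truncated at num_terms; the neglected tail is roughly
    sum_{m > num_terms} 1 / (lam * m^(2s)), which is small for the values used below.
    """
    m = np.arange(1, num_terms + 1, dtype=float)
    mu = m ** (-2.0 * s)
    return float(np.sum(mu / (mu + lam)))

def log_log_slope(s, lam_large, lam_small):
    """Empirical exponent of d(lam) between two regularization levels."""
    d_large = degrees_of_freedom(lam_large, s)
    d_small = degrees_of_freedom(lam_small, s)
    return np.log(d_small / d_large) / np.log(lam_small / lam_large)

for s, (lam_large, lam_small) in [(1, (1e-4, 1e-6)), (2, (1e-6, 1e-8))]:
    slope = log_log_slope(s, lam_large, lam_small)
    print(f"s={s}: empirical slope {slope:.3f}, predicted {-1 / (2 * s):.3f}")
```

If, as assumed here, the number of samples needed for squared error $\lambda$ scales like $d(\lambda)$ up to logarithmic factors, then $n \approx \lambda^{-1/(2s)}$ inverts to $\lambda \approx n^{-2s}$, the $n^{-2}$ ($s=1$) and $n^{-4}$ ($s=2$) pattern listed under Key findings.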

ABSTRACT

We show that kernel-based quadrature rules for computing integrals can be seen as a special case of random feature expansions for positive definite kernels, for a particular decomposition that always exists for such kernels. We provide a theoretical analysis of the number of required samples for a given approximation error, leading to both upper and lower bounds that are based solely on the eigenvalues of the associated integral operator and match up to logarithmic terms. In particular, we show that the upper bound may be obtained from independent and identically distributed samples from a specific non-uniform distribution, while the lower bound is valid for any set of points. Applying our results to kernel-based quadrature, while our results are fairly general, we recover known upper and lower bounds for the special cases of Sobolev spaces. Moreover, our results extend to the more general problem of full function approximations (beyond simply computing an integral), with results in $L_2$- and $L_\infty$-norm that match known results for special cases. Applying our results to random features, we show an improvement of the number of random features needed to preserve the generalization guarantees for learning with Lipschitz-continuous losses.
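The "particular decomposition" can be checked numerically on a simple kernel. This sketch assumes (the review does not spell it out) the construction $\varphi(v,x)=\sum_m \mu_m^{1/2} e_m(v)\, e_m(x)$ from the square root of the integral operator, so that $k(x,y)=\int \varphi(v,x)\varphi(v,y)\,d\rho(v)$. It uses the periodic Sobolev kernel with $s=1$ on $[0,1]$ under the uniform measure, whose closed form $1+2\pi^2 B_2(\{x-y\})$ with $B_2(t)=t^2-t+1/6$ is a standard Fourier identity. The function names, truncation level, and grid size are illustrative choices, not the paper's.

```python
import numpy as np

def sobolev_kernel(x, y):
    """Periodic Sobolev kernel (s = 1) on [0, 1]: 1 + sum_{m>=1} 2 cos(2 pi m (x - y)) / m^2."""
    t = np.mod(x - y, 1.0)
    return 1.0 + 2.0 * np.pi**2 * (t**2 - t + 1.0 / 6.0)

def feature(v, x, num_freqs):
    """phi(v, x) = sum_m mu_m^(1/2) e_m(v) e_m(x), truncated at num_freqs frequencies.

    With e_0 = 1, e_m in {sqrt(2) cos, sqrt(2) sin} and mu_m = m^(-2), the cosine and
    sine terms of frequency m combine into 2 cos(2 pi m (v - x)) / m.
    """
    m = np.arange(1, num_freqs + 1)
    v = np.atleast_1d(v)[:, None]
    return 1.0 + np.sum(2.0 * np.cos(2.0 * np.pi * m * (v - x)) / m, axis=1)

def kernel_from_features(x, y, num_freqs=200, grid_size=1000):
    """Approximate int phi(v, x) phi(v, y) d rho(v) with a uniform grid over [0, 1].

    The grid average is exact for the truncated expansion when 2 * num_freqs < grid_size,
    so the remaining error is the series truncation (at most about 2 / num_freqs).
    """
    v = np.arange(grid_size) / grid_size
    return float(np.mean(feature(v, x, num_freqs) * feature(v, y, num_freqs)))

for x, y in [(0.1, 0.1), (0.1, 0.35), (0.9, 0.2)]:
    print(x, y, round(kernel_from_features(x, y), 4), round(float(sobolev_kernel(x, y)), 4))
```

The two columns agree to about two decimal places, which is the size of the truncation error; raising `num_freqs` (with `grid_size` above twice its value) tightens the match.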

Motivation & Objective

To establish a theoretical connection between kernel-based quadrature rules and random feature expansions for positive definite kernels.
To derive tight upper and lower bounds on the number of samples needed for a given approximation error in kernel quadrature.
To extend the analysis to full function approximation in $L_2$- and $L_\infty$-norm, beyond just integral computation.
To improve generalization guarantees in supervised learning by reducing the number of random features required for Lipschitz-continuous losses.
To show that quadrature points achieving the upper bound can be generated by i.i.d. sampling from a non-uniform distribution built from the eigenvalues and eigenfunctions of the kernel's integral operator.

Proposed method

The analysis is grounded in functional analysis, using the eigen-decomposition of the integral operator associated with the kernel and measure.
The paper formulates kernel quadrature as a special case of random feature expansion using a specific decomposition that always exists for positive definite kernels.
It derives upper bounds by constructing a non-uniform sampling distribution from the eigenvalues and eigenfunctions of the integral operator; i.i.d. samples from this distribution attain the upper bound, which matches the lower bound up to logarithmic factors.
The lower bounds hold for any set of points: no point configuration can achieve a smaller error than the lower bound allows.
The framework is applied to both quadrature and function approximation, yielding $L_2$ and $L_\infty$ error bounds that match known results for special cases like Sobolev spaces.
For random features, the analysis shows that fewer features are needed to preserve the generalization guarantees for Lipschitz-continuous losses.
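The non-uniform sampling distribution can be illustrated on a discretized measure. This sketch assumes a Gaussian kernel with an arbitrary bandwidth, replaces $\rho$ by the empirical measure of $N$ standard-normal points, and assumes the density (with respect to $\rho$) $q_\lambda(x)=\frac{1}{d(\lambda)}\sum_m \frac{\mu_m}{\mu_m+\lambda}\, e_m(x)^2$ with $d(\lambda)=\sum_m \frac{\mu_m}{\mu_m+\lambda}$, the leverage-score form such analyses usually take. On $N$ equally weighted points the integral operator is the matrix $K/N$, and $q_\lambda(x_i)=N\,[K(K+N\lambda I)^{-1}]_{ii}/d(\lambda)$. The helpers `gaussian_kernel` and `optimized_density` are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=0.5):
    # Assumed kernel exp(-(x - y)^2 / (2 bandwidth^2)), evaluated for all pairs.
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2.0 * bandwidth**2))

def optimized_density(points, lam, bandwidth=0.5):
    """Return (q, d): q[i] is the assumed optimized density w.r.t. the empirical
    measure rho = (1/N) sum_i delta_{points[i]}, and d is d(lam)."""
    n = len(points)
    K = gaussian_kernel(points, points, bandwidth)
    # Eigendecomposition of the integral operator K / N: eigenvalues mu_m and
    # eigenfunctions e_m(x_i) = sqrt(N) * U[i, m], orthonormal in L2(rho).
    mu, U = np.linalg.eigh(K / n)
    mu = np.clip(mu, 0.0, None)  # remove tiny negative round-off
    ratios = mu / (mu + lam)
    d = ratios.sum()
    leverage = n * (U**2) @ ratios  # sum_m ratio_m * e_m(x_i)^2
    return leverage / d, d

rng = np.random.default_rng(0)
points = rng.standard_normal(300)
q, d = optimized_density(points, lam=1e-3)

# i.i.d. sampling: point i is drawn with probability q[i] / N.
probs = q / len(points)
probs = probs / probs.sum()  # guard against floating-point drift
sample = rng.choice(points, size=20, replace=True, p=probs)
print(f"d(lambda) = {d:.2f}; density ranges from {q.min():.2f} to {q.max():.2f}")
```

Isolated points in the tails of the normal sample tend to receive a larger density than points in crowded regions, which is what makes the distribution non-uniform; under the uniform measure on the torus with a translation-invariant kernel, the same formula would instead give a constant density.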

Experimental results

Research questions

RQ1: Can kernel quadrature rules be formally interpreted as a special case of random feature expansions?
RQ2: What is the optimal number of samples required to achieve a given approximation error in kernel quadrature, and how does it scale with kernel properties?
RQ3: Can the same theoretical framework yield tight error bounds for full function approximation, not just integral estimation?
RQ4: How much does the analysis reduce the number of random features needed to maintain generalization guarantees in supervised learning?
RQ5: Is there a non-uniform sampling distribution that achieves the optimal upper bound for quadrature error?

Key findings

The paper establishes that kernel quadrature is a special case of random feature expansion, with a decomposition that always exists for positive definite kernels.
Upper and lower bounds on the number of samples for a given error match up to logarithmic factors, depending only on the eigenvalues of the kernel's integral operator.
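The upper bound is achievable via i.i.d. sampling from a non-uniform distribution derived from the eigenvalues and eigenfunctions of the kernel's integral operator.
For Sobolev spaces of smoothness $s$ in one dimension, the derived bounds recover the known convergence rates for the squared error, such as $n^{-2}$ for $s=1$ and $n^{-4}$ for $s=2$ (up to logarithmic factors), consistent with classical results.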
The framework extends to $L_2$ and $L_\infty$-norm approximations of functions, yielding bounds that match known results for special cases.
In random feature learning, the method reduces the number of features required to preserve generalization guarantees for Lipschitz-continuous losses.
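The $n^{-2}$ rate for $s=1$ can be checked on a case with a closed form. This sketch assumes the periodic Sobolev kernel $k(x,y)=1+2\pi^2 B_2(\{x-y\})$ on $[0,1]$ with the uniform measure, the plain integral $\int_0^1 f$, and $n$ equally spaced points (chosen for the closed form, not the i.i.d. design the bounds concern). Since $\int_0^1 k(x,y)\,dx=1$, the squared worst-case error of $\sum_i \alpha_i f(x_i)$ over the unit ball is $1-2\sum_i\alpha_i+\alpha^\top K\alpha$, minimized by $\alpha=K^{-1}\mathbf{1}$. Each row of the circulant matrix $K$ sums to $n(1+c/n^2)$ with $c=\pi^2/3$, giving the exact value $\frac{c/n^2}{1+c/n^2}$. The name `squared_worst_case_error` is hypothetical.

```python
import numpy as np

def sobolev_kernel(x, y):
    """Periodic Sobolev kernel (s = 1) on [0, 1]; its integral in x over [0, 1] equals 1."""
    t = np.mod(x - y, 1.0)
    return 1.0 + 2.0 * np.pi**2 * (t**2 - t + 1.0 / 6.0)

def squared_worst_case_error(points):
    """Squared worst-case error of the best weights for int_0^1 f(x) dx over the unit ball.

    The mean embedding is the constant function 1 (RKHS norm 1), so weights alpha give
    error 1 - 2 * sum(alpha) + alpha^T K alpha, minimized by alpha = K^{-1} 1.
    """
    K = sobolev_kernel(points[:, None], points[None, :])
    alpha = np.linalg.solve(K, np.ones(len(points)))
    return 1.0 - 2.0 * alpha.sum() + alpha @ K @ alpha, alpha

c = np.pi**2 / 3.0
for n in [5, 10, 20, 40]:
    points = np.arange(n) / n
    err2, alpha = squared_worst_case_error(points)
    closed_form = (c / n**2) / (1.0 + c / n**2)
    print(f"n={n:3d}  squared error {err2:.3e}  closed form {closed_form:.3e}  n^2*err {n**2 * err2:.4f}")
```

The product $n^2 \times$ (squared error) approaches $\pi^2/3 \approx 3.29$, i.e. the squared error decays like $n^{-2}$ for $s=1$; the optimal weights are the uniform weights $1/n$ shrunk by the factor $1/(1+c/n^2)$.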

This review was created by AI and reviewed by human editors.