QUICK REVIEW

[Paper Review] Does data interpolation contradict statistical optimality?

Mikhail Belkin, Alexander Rakhlin | arXiv (Cornell University) | Jun 25, 2018

Advanced Statistical Methods and Models · 10 references · 76 citations

TL;DR

The paper shows that interpolating estimators can achieve minimax-optimal rates for nonparametric regression and square-loss prediction under Hölder smoothness, challenging the belief that interpolation harms statistical performance.

ABSTRACT

We show that learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.

Motivation & Objective

Explain the puzzle of why methods that interpolate the training data can still perform well out of sample in modern learning settings.
Demonstrate that interpolating estimators can attain minimax optimal rates for nonparametric regression.
Establish finite-sample risk bounds for a class of singular-kernel Nadaraya-Watson estimators.
Show that interpolation does not preclude optimality in excess loss under standard assumptions.

Proposed method

Use a singular kernel K(u) = ||u||^{-a} I{||u|| ≤ 1} and variants to construct an interpolating estimator f_n.
Analyze the Nadaraya-Watson estimator with bandwidth h and derive risk bounds for f_n(X) under Hölder smoothness, f ∈ Σ(β,L).
Provide pointwise and integrated MSE bounds and prove that they achieve the minimax rate n^{-2β/(2β+d)} for β ∈ (0,2].
Decompose the error into bias and variance, and bound each term under assumptions (A1)-(A2) and regularity of the design density.
Balance the bias and variance terms by choosing h = n^{-1/(2β+d)}, which yields the main rate.
Discuss extensions to other singular kernels and to under-specified models where the regression function lies in the Hölder class.
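
The bias-variance balancing step can be written out as below. This is a sketch assuming the usual kernel-regression scalings (squared bias of order h^{2β}, variance of order 1/(n h^d)) with unspecified constants C_1 and C_2; the paper's exact conditions and constants are not reproduced here.

```latex
% assumed pointwise bound: squared bias + variance
\mathbb{E}\big[(f_n(x)-f(x))^2\big] \;\le\; C_1\,h^{2\beta} + \frac{C_2}{n h^{d}}
% balancing the two terms:
h^{2\beta} = \frac{1}{n h^{d}} \;\Longrightarrow\; h^{2\beta+d} = n^{-1} \;\Longrightarrow\; h = n^{-1/(2\beta+d)}
% plugging back in, both terms are of the same order:
h^{2\beta} = n^{-2\beta/(2\beta+d)}, \qquad \frac{1}{n h^{d}} = n^{-1+d/(2\beta+d)} = n^{-2\beta/(2\beta+d)}
```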

Experimental results

Research questions

RQ1: Can an interpolating estimator achieve minimax-optimal rates for nonparametric regression under Hölder smoothness?
RQ2: Do interpolating rules yield optimal excess loss in prediction with square loss when the regression function belongs to a Hölder class?
RQ3: What conditions on the kernel, bandwidth, and density ensure optimal rates for interpolating estimators?
RQ4: How do bias and variance behave for singular-kernel interpolants, and how should they be balanced?

Key findings

An interpolating estimator can achieve the classical minimax rate n^{-2β/(2β+d)} for estimating f in L2(P_X) when f ∈ Σ(β,L) with β ∈ (0,2].
Using a singular kernel with appropriate bandwidth yields finite-sample risk bounds that match minimax rates for β ∈ (0,2].
For β ∈ (1,2], the rate holds under an additional assumption p ∈ Σ(β−1,L_p) on the density, with p bounded away from zero on its support.
The integrated MSE E||f_n − f||^2_{L2(P_X)} is bounded by C n^{-2β/(2β+d)} under the stated conditions.
The interpolating estimator f_n is improper (its smoothness depends on n), yet it achieves optimal excess loss when the model is well-specified with f ∈ Σ(β,L).
Numerical illustrations indicate the interpolating kernel can produce sharp fits locally while remaining compatible with optimal rates.
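
A minimal Python sketch of a singular-kernel Nadaraya-Watson interpolant follows, to show concretely how an unbounded kernel forces the fit through every training label while the bandwidth keeps it local. It is not the authors' code. The function names (singular_kernel, bandwidth, nw_interpolant), the choice a = 0.25 in (0, d/2) so that K² is integrable near the origin, the rule of returning 0 when no training point lies within distance h, the test function sin(2πx), and the noise level are all hypothetical choices for illustration. Distinct design points are assumed.

```python
import math
import random


def singular_kernel(u_norm: float, a: float) -> float:
    """K(u) = ||u||^(-a) * 1{||u|| <= 1}, evaluated from the norm of u."""
    if u_norm > 1.0:
        return 0.0  # outside the unit ball the kernel vanishes
    if u_norm == 0.0:
        return math.inf  # singular at the origin: this is what forces interpolation
    return u_norm ** (-a)


def bandwidth(n: int, beta: float, d: int) -> float:
    """h = n^(-1/(2*beta + d)), the bias-variance balancing choice."""
    return n ** (-1.0 / (2.0 * beta + d))


def nw_interpolant(x, X, Y, h: float, a: float) -> float:
    """Nadaraya-Watson estimate at x with weights K(||x - X_i|| / h).

    x is a point (tuple of floats), X a list of such points, Y their labels.
    Assumes the design points in X are distinct.
    """
    weights = []
    for xi, yi in zip(X, Y):
        dist = math.dist(x, xi)
        if dist == 0.0:
            # The infinite weight at this point dominates all others,
            # so the weighted average reduces to exactly its label.
            return yi
        weights.append(singular_kernel(dist / h, a))
    total = sum(weights)
    if total == 0.0:
        # No training point within distance h: return 0 (a convention of this sketch).
        return 0.0
    return sum(w * yi for w, yi in zip(weights, Y)) / total


if __name__ == "__main__":
    random.seed(0)
    d, beta, n = 1, 1.0, 200
    a = 0.25  # hypothetical exponent in (0, d/2), so K^2 is integrable near 0

    def f(p):
        # Illustrative regression function (hypothetical).
        return math.sin(2 * math.pi * p[0])

    X = [(random.random(),) for _ in range(n)]
    Y = [f(p) + 0.3 * random.gauss(0.0, 1.0) for p in X]  # noisy labels
    h = bandwidth(n, beta, d)
    # The fit reproduces every noisy training label exactly (interpolation).
    assert all(nw_interpolant(p, X, Y, h, a) == y for p, y in zip(X, Y))
    # Between sample points it acts as a local weighted average.
    grid = [(i / 100,) for i in range(101)]
    mse = sum((nw_interpolant(g, X, Y, h, a) - f(g)) ** 2 for g in grid) / len(grid)
    print(f"h = {h:.3f}, grid MSE against f = {mse:.3f}")
```

The spike of the kernel at zero is what pins the fit to each label; away from the sample, the estimate is an ordinary local average over the ball of radius h, so the bandwidth rule above still governs the bias-variance trade-off.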

This review was created by AI and reviewed by human editors.