QUICK REVIEW

[Paper Review] Implicit ridge regularization provided by the minimum-norm least squares estimator when $n\ll p$

Dmitry Kobak, Jonathan Lomond|arXiv (Cornell University)|May 28, 2018

Neural Networks and Applications | Cited by 1

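One-sentence summary

The paper shows that in high-dimensional linear regression with $n \ll p$, the minimum-norm least squares estimator receives implicit ridge regularization from the low-variance directions in the predictor space, so an explicit positive ridge penalty can be ineffective or even harmful. When the high-variance directions predict the response, the optimal ridge penalty can be negative; this is supported by simulations, real data, and a proof for a spiked covariance model.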

ABSTRACT

A conventional wisdom in statistical learning is that large models require strong regularization to prevent overfitting. Here we show that this rule can be violated by linear regression in the underdetermined $n\ll p$ situation under realistic conditions. Using simulations and real-life high-dimensional data sets, we demonstrate that an explicit positive ridge penalty can fail to provide any improvement over the minimum-norm least squares estimator. Moreover, the optimal value of ridge penalty in this situation can be negative. This happens when the high-variance directions in the predictor space can predict the response variable, which is often the case in the real-world high-dimensional data. In this regime, low-variance directions provide an implicit ridge regularization and can make any further positive ridge penalty detrimental. We prove that augmenting any linear model with random covariates and using minimum-norm estimator is asymptotically equivalent to adding the ridge penalty. We use a spiked covariance model as an analytically tractable example and prove that the optimal ridge penalty in this case is negative when $n\ll p$.

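Motivation and objectives

Challenge the common view that large models in high-dimensional settings always require strong explicit regularization.
Investigate whether the minimum-norm least squares (MoNLS) estimator provides implicit regularization in underdetermined linear models ($n \ll p$).
Identify the conditions under which an explicit ridge penalty fails to improve performance or even degrades it.
Formally establish that augmenting a model with random covariates and fitting it with the MoNLS estimator is asymptotically equivalent to applying a ridge penalty.
Derive and analyze the optimal ridge penalty in a spiked covariance model when $n \ll p$.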
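For reference, the two estimators compared throughout can be written in their standard textbook forms. The notation is an assumption of this note and does not come from the summary itself: $X$ is the $n \times p$ predictor matrix, $y$ the response vector, $\lambda$ the ridge penalty, and $X^+$ the Moore-Penrose pseudoinverse.

```latex
% Ridge estimator for \lambda > 0, in primal and dual form:
\hat\beta_\lambda
  = \arg\min_{\beta} \; \|y - X\beta\|^2 + \lambda \|\beta\|^2
  = (X^\top X + \lambda I_p)^{-1} X^\top y
  = X^\top (X X^\top + \lambda I_n)^{-1} y .

% Minimum-norm least squares estimator (n < p, X of full row rank):
\hat\beta_0
  = \lim_{\lambda \to 0^+} \hat\beta_\lambda
  = X^+ y
  = X^\top (X X^\top)^{-1} y .
```

The dual form remains well defined for negative $\lambda$ as long as $\lambda > -s_{\min}^2$, where $s_{\min}$ is the smallest nonzero singular value of $X$; this is the sense in which a negative ridge penalty can be evaluated at all.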

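Proposed method

The study compares the minimum-norm least squares (MoNLS) estimator with ridge-regularized regression on simulations and real high-dimensional data sets.
A spiked covariance model is introduced as an analytically tractable framework for studying how the ridge penalty behaves in high dimensions.
The paper proves that augmenting a model with independent random covariates and applying the MoNLS estimator is asymptotically equivalent to imposing a ridge penalty.
The theoretical analysis derives the optimal ridge penalty for the spiked covariance model and shows that it is negative when $n \ll p$.
The analysis focuses on the interplay between the variance structure of the predictors and the predictability of the response, in particular the contribution of high-variance directions to prediction.
Numerical experiments support the theory: when the MoNLS estimator already benefits from implicit regularization, a positive ridge penalty degrades performance.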
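The equivalence between random-covariate augmentation and a ridge penalty can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the extra covariates are i.i.d. Gaussian with variance $\lambda/q$ (this scaling, the function names, and all parameter values are assumptions of the sketch), and compares the first $p$ coefficients of the minimum-norm fit on the augmented matrix with the ordinary ridge solution.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator in dual form: X^T (X X^T + lam * I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def min_norm_augmented(X, y, lam, q, rng):
    """Append q random covariates with variance lam / q, fit the
    minimum-norm least squares estimator, keep the original p coefficients."""
    n, p = X.shape
    Q = rng.normal(scale=np.sqrt(lam / q), size=(n, q))
    X_aug = np.hstack([X, Q])
    beta_aug = np.linalg.pinv(X_aug) @ y  # minimum-norm solution
    return beta_aug[:p]

# Illustrative problem; sizes and lam are arbitrary choices.
rng = np.random.default_rng(0)
n, p, lam = 20, 10, 5.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

beta_ridge = ridge(X, y, lam)
errors = {}
for q in (100, 1000, 10000):
    beta_aug = min_norm_augmented(X, y, lam, q, rng)
    errors[q] = np.linalg.norm(beta_aug - beta_ridge)
    print(f"q = {q:6d}   ||beta_aug - beta_ridge|| = {errors[q]:.4f}")
```

The reason this works is that the Gram matrix of the added covariates, $QQ^\top$, concentrates around $\lambda I_n$ as $q$ grows, so the augmented minimum-norm solution approaches the dual-form ridge solution. The gap printed for a finite $q$ is random and depends on the seed.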

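Experimental results

Research questions

RQ1: In high-dimensional linear models with $n \ll p$, under what conditions does the minimum-norm least squares estimator provide implicit ridge regularization?
RQ2: Can the optimal ridge penalty be negative in the $n \ll p$ regime, and if so, under what data conditions does this occur?
RQ3: Why does an explicit positive ridge penalty fail to improve performance over the minimum-norm estimator?
RQ4: How does the variance structure of the predictor directions affect the usefulness of a ridge penalty in high-dimensional regression?
RQ5: What asymptotic equivalence holds between augmenting a model with random covariates under the minimum-norm estimator and applying a ridge penalty?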
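The role of the variance structure asked about in RQ4 can be made concrete with the standard singular-value view of ridge regression: the fitted component along a singular direction with singular value $s_i$ is scaled by $s_i^2 / (s_i^2 + \lambda)$. The short sketch below evaluates these factors; the function name and the example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def ridge_shrinkage(singular_values, lam):
    """Per-direction scaling s^2 / (s^2 + lam) applied by ridge regression."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return s2 / (s2 + lam)

# Hypothetical singular values: one high-variance and two weaker directions.
s = np.array([10.0, 3.0, 1.0])
for lam in (1.0, 0.0, -0.5):
    print(f"lambda = {lam:5.1f}  factors = {ridge_shrinkage(s, lam)}")
```

A positive penalty shrinks every direction, and the low-variance ones most; a negative penalty (above $-s_{\min}^2$) expands them instead, which is how it can counteract shrinkage that is already present.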

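Key findings

In high-dimensional settings with $n \ll p$, the minimum-norm least squares estimator is implicitly ridge-regularized by the low-variance directions in the predictor space.
When the predictor space contains high-variance directions that predict the response, an explicit positive ridge penalty can degrade performance.
In the spiked covariance model with $n \ll p$, the optimal ridge penalty is negative, so any additional positive penalty is detrimental in this regime.
Augmenting a model with random covariates and using the minimum-norm estimator is shown to be asymptotically equivalent to adding a ridge penalty, which gives the implicit regularization a theoretical basis.
Simulations and real-data analyses confirm that, when the predictor space contains high-variance components that predict the response, a positive ridge penalty can fail to improve on the MoNLS estimator.
A positive ridge penalty fails because the low-variance directions already supply implicit ridge regularization to the MoNLS estimator, so penalizing further over-regularizes the fit.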
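The negative-penalty finding can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's experiments: it assumes a covariance of the form $\Sigma = I + \rho \mathbf{1}\mathbf{1}^\top$ with the signal along the all-ones direction, and every parameter value ($n$, $p$, $\rho$, noise level, penalty grid) is an arbitrary choice. It sweeps the penalty over positive and negative values and reports the mean excess risk $(\hat\beta - \beta)^\top \Sigma (\hat\beta - \beta)$.

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Dual-form ridge; lam may be negative while X X^T + lam * I is invertible."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def excess_risk(beta_hat, beta, rho):
    """(beta_hat - beta)^T Sigma (beta_hat - beta) with Sigma = I + rho * 1 1^T."""
    d = beta_hat - beta
    return d @ d + rho * d.sum() ** 2

# All values below are illustrative assumptions.
n, p, rho, sigma, n_reps = 64, 1000, 0.1, 1.0, 20
beta = np.ones(p) / np.sqrt(p)  # signal along the spike direction
lam_grid = p * np.array([-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.3, 1.0])

rng = np.random.default_rng(0)
risks = np.zeros(len(lam_grid))
for _ in range(n_reps):
    # Each row z + sqrt(rho) * g * 1 has covariance I + rho * 1 1^T.
    Z = rng.normal(size=(n, p))
    g = rng.normal(size=(n, 1))
    X = Z + np.sqrt(rho) * g
    y = X @ beta + sigma * rng.normal(size=n)
    for k, lam in enumerate(lam_grid):
        risks[k] += excess_risk(ridge_dual(X, y, lam), beta, rho) / n_reps

for lam, r in zip(lam_grid, risks):
    print(f"lambda = {lam:8.1f}   mean excess risk = {r:.3f}")
print("grid minimizer:", lam_grid[np.argmin(risks)])
```

The grid stops at $-0.4p$ so that $XX^\top + \lambda I$ stays comfortably positive definite for these sizes. Whether the grid minimizer actually lands on a negative value depends on the chosen signal strength, noise level, and $\rho$, so the printed result should be read as exploratory rather than as confirmation of the theorem.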

This review was generated by AI and checked by human editors.