QUICK REVIEW

[Paper Review] Reconciling modern machine learning practice and the bias-variance trade-off

Mikhail Belkin, Daniel Hsu|arXiv (Cornell University)|Dec 28, 2018
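Machine Learning and Data Classification|References: 38|Cited by: 83
One-Sentence Summary

The paper proposes the double descent risk curve, which explains how increasing model capacity beyond the interpolation point can lower test risk, reconciling classical bias-variance theory with the modern interpolating predictors widely used in neural networks, random features, and ensemble methods.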

ABSTRACT

Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This "double descent" curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.

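Motivation and Objectives

  • Explain the apparent mismatch between the bias-variance trade-off and modern interpolating models.
  • Propose and describe the double descent risk curve as a unified framework relating model capacity to generalization.
  • Show experimentally that double descent occurs widely across neural networks, random features, and ensemble methods.
  • Offer insight into the inductive biases and optimization dynamics that drive this behavior.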

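Proposed Method

  • Define the classical bias-variance framework and the interpolation threshold.
  • Introduce Random Fourier Features as a controllable model class for studying capacity.
  • Train models by empirical risk minimization (ERM) with squared loss and compare capacities N (number of features) against the number of training points n, covering both N<n and N≥n.
  • Show that the kernel / minimum-norm interpolant (H_∞) typically generalizes better beyond the interpolation threshold than finite-N classes.
  • Extend the observations to neural networks and ensemble methods (AdaBoost, Random Forests), which show similar double descent curves.
  • Provide the intuition that larger capacity makes it possible to find simpler, smaller-norm interpolating solutions, which in turn generalize better.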
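The random-features setup described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's experimental code: the synthetic target function, the sample sizes, the noise level, and the real-valued cosine features with a random phase are all choices made here for brevity, and the helper names (`rff_features`, `fit_min_norm`, `run_sweep`) are hypothetical.

```python
import numpy as np

def rff_features(X, W, b):
    """Real-valued random Fourier features: sqrt(2/N) * cos(X W + b).
    (Assumption: a cosine variant chosen here for simplicity.)"""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def fit_min_norm(Phi, y):
    """Least-squares coefficients via the pseudoinverse. When several
    coefficient vectors fit equally well (typical for N > n), this
    returns the one with the smallest l2 norm."""
    return np.linalg.pinv(Phi) @ y

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

def run_sweep(n=40, d=5, n_test=500, widths=(5, 10, 20, 40, 80, 400),
              noise=0.5, seed=0):
    """Fit models of increasing width N on one synthetic dataset."""
    rng = np.random.default_rng(seed)
    X_train = rng.standard_normal((n, d))
    X_test = rng.standard_normal((n_test, d))
    # Hypothetical target: a smooth function of the inputs plus label noise.
    y_train = np.sin(X_train.sum(axis=1)) + noise * rng.standard_normal(n)
    y_test = np.sin(X_test.sum(axis=1))
    results = {}
    for N in widths:
        W = rng.standard_normal((d, N))            # random frequencies
        b = rng.uniform(0.0, 2.0 * np.pi, size=N)  # random phases
        a = fit_min_norm(rff_features(X_train, W, b), y_train)
        results[N] = {
            "train_mse": mse(rff_features(X_train, W, b) @ a, y_train),
            "test_mse": mse(rff_features(X_test, W, b) @ a, y_test),
            "coef_norm": float(np.linalg.norm(a)),
        }
    return results

results = run_sweep()
for N, r in results.items():
    print(f"N={N:4d}  train={r['train_mse']:.3e}  "
          f"test={r['test_mse']:.3e}  ||a||={r['coef_norm']:.3f}")
```

With n = 40 training points, widths below 40 leave a nonzero training error, while widths well above 40 interpolate the noisy labels. A single seed gives a noisy picture, so averaging the test error over many seeds would be the natural next step before reading a curve off these numbers.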

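Experimental Results

Research Questions

  • RQ1: When capacity exceeds the interpolation threshold, does a double descent risk curve appear?
  • RQ2: Is double descent widespread across model classes such as neural networks, random features, and tree-based ensembles?
  • RQ3: Which inductive bias or norm (for example, the minimum-norm solution) underlies the better generalization seen beyond interpolation?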
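For reference, the minimum-norm solution mentioned in RQ3 can be written out for a class of N features fitted to n training pairs (x_i, y_i). The symbols a, φ, and v_k are notation assumed here for illustration, and the interpolation constraint is only feasible when the feature matrix has full row rank.

```latex
% Function class with N random features (notation assumed for illustration)
h(x) = \sum_{k=1}^{N} a_k \, \varphi(x; v_k)

% Under-parameterized regime (N < n): ordinary least squares
\hat{a} = \arg\min_{a \in \mathbb{R}^N}
  \frac{1}{n} \sum_{i=1}^{n} \bigl( h(x_i) - y_i \bigr)^2

% Over-parameterized regime (N \ge n): minimum-norm interpolation
\hat{a} = \arg\min_{a \in \mathbb{R}^N} \|a\|_2
  \quad \text{subject to} \quad h(x_i) = y_i, \; i = 1, \dots, n
```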

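Key Findings

  • Double descent generalization curve: test risk follows the classical U-shape, peaks near the interpolation threshold, and then decreases again as capacity grows beyond it.
  • Minimum-norm interpolating solutions (or smoother averaged/interpolating solutions) tend to generalize better beyond interpolation, which explains the second descent.
  • Random Fourier Features experiments show a peak at the interpolation threshold (N=n), with test performance improving for N>n.
  • Neural networks, including two-layer networks and multilayer architectures, show a qualitatively similar double descent pattern; optimization dynamics affect how clearly it can be observed.
  • Ensemble methods such as AdaBoost and Random Forests also exhibit double descent when built from highly interpolating trees; averaging contributes to smoother generalization.
  • The kernel limit (H_∞) serves as a baseline that often outperforms finite-N random feature models, highlighting consistency with minimum-norm interpolation across regimes.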
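The role of the norm in these findings can be checked numerically. The sketch below assumes a generic Gaussian feature matrix rather than any model from the paper, and the helper names are hypothetical. It builds two coefficient vectors that both interpolate the same data and shows that the pseudoinverse solution is the shorter one.

```python
import numpy as np

def min_norm_interpolant(Phi, y):
    """Pseudoinverse solution: the interpolant with the smallest l2 norm."""
    return np.linalg.pinv(Phi) @ y

def other_interpolant(Phi, y, z):
    """Add a component from the null space of Phi. The result still fits
    the data exactly but has a larger coefficient norm."""
    a_min = min_norm_interpolant(Phi, y)
    null_projector = np.eye(Phi.shape[1]) - np.linalg.pinv(Phi) @ Phi
    return a_min + null_projector @ z

rng = np.random.default_rng(0)
n, N = 20, 100                       # over-parameterized: N > n
Phi = rng.standard_normal((n, N))    # stand-in feature matrix (assumption)
y = rng.standard_normal(n)

a_min = min_norm_interpolant(Phi, y)
a_alt = other_interpolant(Phi, y, rng.standard_normal(N))

print("both interpolate:",
      np.allclose(Phi @ a_min, y), np.allclose(Phi @ a_alt, y))
print("norms:", np.linalg.norm(a_min), np.linalg.norm(a_alt))
```

Both vectors reach zero training error, so training error alone cannot distinguish them. The difference lies in the norm, which is the quantity the findings above associate with better generalization.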


This review was generated by AI and checked by human editors.