QUICK REVIEW

[Paper Review] The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

Micah Goldblum, Marc Finzi|arXiv (Cornell University)|Apr 11, 2023

Computability, Logic, AI Algorithms | Cited by 14

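One-Sentence Summary

The paper derives a no free lunch (NFL) theorem based on Kolmogorov complexity, shows that both real-world data and neural networks favor low-complexity solutions, and argues that inductive biases together with cross-domain PAC-Bayes bounds make unified learning possible.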

ABSTRACT

No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets have high complexity, real-world problems disproportionately generate low-complexity data, and we argue that neural network models share this same preference, formalized using Kolmogorov complexity. Notably, we show that architectures designed for a particular domain, such as computer vision, can compress datasets on a variety of seemingly unrelated domains. Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences. Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention such as picking an appropriately sized model when labeled data is scarce or plentiful can be automated into a single learning algorithm. These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.

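Motivation and Goals

Motivate inductive reasoning in ML by contrasting the structure of real-world data with the uniform-noise assumption behind NFL theorems.
Derive an NFL theorem based on Kolmogorov complexity that explains why learning is feasible in practice.
Show that real-world datasets and neural networks exhibit a preference for low complexity across domains.
Show how cross-domain PAC-Bayes bounds explain generalization and support unified learning methods.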

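Proposed Method

Derive a new NFL theorem from incompressibility under Kolmogorov complexity.
Use compression (e.g., bzip2) to bound K(x) and K(Y|X) for datasets.
Express K(Y|X) in terms of negative log-likelihood and model size, showing that compression implies learnability.
Demonstrate the simplicity bias of neural networks by compressing labels in tabular and image domains.
Apply a simple Kolmogorov-style language to measure the complexity of generated sequences (used for GPT-3).
Reshape tabular data into images to test how convolutional neural networks fare under cross-domain generalization bounds.
Give PAC-Bayes-style generalization bounds tied to dataset compressibility and the marginal likelihood.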
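The compression step listed above can be illustrated with a minimal sketch. It assumes that the bzip2-compressed size of a byte string serves as an upper bound on its Kolmogorov complexity, up to an additive constant for the decompressor. The function name `compressed_bits` and the two toy inputs are hypothetical stand-ins, not the paper's datasets or code.

```python
import bz2
import os


def compressed_bits(data: bytes) -> int:
    """Size in bits of the bzip2-compressed input.

    Any lossless compressor gives an upper bound on K(data), up to an
    additive constant for the decompressor itself.
    """
    return 8 * len(bz2.compress(data, compresslevel=9))


N_BYTES = 100_000
raw_bits = 8 * N_BYTES

# Hypothetical stand-in for structured data: a short repeating pattern.
structured = bytes(i % 16 for i in range(N_BYTES))
# Stand-in for uniformly sampled data: bytes from the OS entropy source.
random_bytes = os.urandom(N_BYTES)

structured_bits = compressed_bits(structured)
random_bits = compressed_bits(random_bytes)

print(f"raw size:        {raw_bits} bits")
print(f"structured data: {structured_bits} bits after bzip2")
print(f"random data:     {random_bits} bits after bzip2")
```

The structured input shrinks to a small fraction of its raw size, while the random input does not shrink at all. A compressor can only certify low complexity: failing to compress a string does not prove that its Kolmogorov complexity is high.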

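Experimental Results

Research Questions

RQ1: Are real-world datasets compressible, and does this explain why machine learning generalizes successfully despite the NFL theorems?
RQ2: Do neural networks and large language models prefer solutions with low Kolmogorov complexity across domains?
RQ3: Can cross-domain PAC-Bayes bounds explain generalization when a model is used outside its native domain (e.g., a CNN on tabular data)?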
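The contrast behind RQ1 rests on a standard counting argument, sketched here as background rather than as the paper's own derivation. There are fewer than 2^(n-k) binary programs shorter than n-k bits, so at most that many n-bit strings can be compressed by more than k bits. The symbols n (string length) and k (bits saved) are notation chosen for this sketch.

```latex
\#\{\,p \in \{0,1\}^* : |p| < n-k\,\} \;=\; \sum_{i=0}^{n-k-1} 2^i \;=\; 2^{n-k}-1
\quad\Longrightarrow\quad
\Pr_{x \sim \mathrm{Unif}(\{0,1\}^n)}\bigl[\,K(x) < n-k\,\bigr] \;<\; 2^{-k}.
```

For example, fewer than one in a thousand uniformly sampled strings can be shortened by even 10 bits, which is why a dataset that compresses substantially is very unlikely to have come from a uniform distribution.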

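Key Findings

Real-world datasets are highly compressible, in contrast to uniformly random data, which is incompressible.
Neural networks compress the labeling function, which implies a non-trivial bound on K(Y|X) tied to the model's likelihood.
A Kolmogorov-style NFL theorem holds: learning is possible on compressible data and impossible on incompressible data.
GPT-3 and larger models assign exponentially higher probability to simpler sequences (those with low Kolmogorov complexity).
PAC-Bayes compression bounds show that CNNs trained on artificially encoded tabular data generalize well, driven by a strong simplicity bias.
A single model family can perform well across many problems, consistent with a low-complexity inductive bias, which reduces the need for domain-specific models.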
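The link between compressibility and generalization in these findings can be made concrete with a textbook Occam-style bound. This is a sketch under stated assumptions, not necessarily the exact bound used in the paper: the loss is bounded in [0, 1], the n training samples are i.i.d., and the hypothesis is described by a prefix-free code of `model_bits` bits, which gives it prior mass of at least 2^(-model_bits). The function name `occam_bound` and the numbers in the usage lines are hypothetical.

```python
import math


def occam_bound(train_error: float, model_bits: float, n: int,
                delta: float = 0.05) -> float:
    """Upper bound on test error that holds with probability >= 1 - delta.

    Hoeffding's inequality plus a union bound over hypotheses weighted by
    the prior 2**(-model_bits) gives:
        test_error <= train_error
                      + sqrt((model_bits * ln 2 + ln(1 / delta)) / (2 * n))
    """
    complexity = model_bits * math.log(2) + math.log(1.0 / delta)
    return train_error + math.sqrt(complexity / (2 * n))


# Hypothetical numbers: a model compressed to 1,000 bits vs. 1,000,000 bits,
# both with 10% training error on 100,000 samples.
small_model = occam_bound(train_error=0.10, model_bits=1_000, n=100_000)
large_model = occam_bound(train_error=0.10, model_bits=1_000_000, n=100_000)

print(f"bound with 1e3-bit model: {small_model:.3f}")
print(f"bound with 1e6-bit model: {large_model:.3f}")
```

With the same training error, the bound is informative only for the model that compresses well; for the larger description length it exceeds 1 and says nothing. This is the sense in which a strong simplicity bias can account for generalization outside a model's native domain.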

This review was generated by AI and checked by a human editor.