QUICK REVIEW

[Paper Review] Sparsity and Out-of-Distribution Generalization

Scott Aaronson, Lin Lin Lee|arXiv (Cornell University)|Mar 8, 2026

Machine Learning and Algorithms|Cited by 0

One-Sentence Summary

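Under PAC-style assumptions, the paper gives formal conditions under which sparse hypotheses and subspace juntas achieve OOD generalization. The key condition is that the training and test distributions overlap on the features that are actually relevant or that the learned hypothesis depends on.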

ABSTRACT

Explaining out-of-distribution generalization has been a central problem in epistemology since Goodman's "grue" puzzle in 1946. Today it's a central problem in machine learning, including AI alignment. Here we propose a principled account of OOD generalization with three main ingredients. First, the world is always presented to experience not as an amorphous mass, but via distinguished features (for example, visual and auditory channels). Second, Occam's Razor favors hypotheses that are "sparse," meaning that they depend on as few features as possible. Third, sparse hypotheses will generalize from a training to a test distribution, provided the two distributions sufficiently overlap on their restrictions to the features that are either actually relevant or hypothesized to be. The two distributions could diverge arbitrarily on other features. We prove a simple theorem that formalizes the above intuitions, generalizing the classic sample complexity bound of Blumer et al. to an OOD context. We then generalize sparse classifiers to subspace juntas, where the ground truth classifier depends solely on a low-dimensional linear subspace of the features.
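The three ingredients above can be seen in a toy experiment. This sketch is illustrative only and is not the paper's construction. It assumes n = 10 Boolean features and a hypothetical ground truth equal to the XOR of features 0 and 1. It also assumes a hypothetical brute-force learner, `fit_sparse`, which returns the smallest feature subset on which some lookup table fits the training data. Both learned hypotheses are sparse and consistent with their own training sets. Only the first depends solely on features where D and D′ agree.

```python
import itertools
import random

N_FEATURES = 10  # hypothetical n; only features 0 and 1 are relevant

def f_true(x):
    """Hypothetical ground truth: a 2-sparse function (XOR of features 0 and 1)."""
    return x[0] ^ x[1]

def fit_sparse(X, y, max_k):
    """Occam-style learner: return (S, table) for the smallest feature subset S
    on which a lookup table is consistent with every example, or None."""
    n = len(X[0])
    for k in range(max_k + 1):
        for S in itertools.combinations(range(n), k):
            table = {}
            # setdefault stores the first label seen for each pattern on S;
            # a later conflicting label means no function of S fits the data.
            if all(table.setdefault(tuple(x[i] for i in S), label) == label
                   for x, label in zip(X, y)):
                return S, table
    return None

def predict(model, x):
    S, table = model
    return table.get(tuple(x[i] for i in S), 0)  # unseen patterns default to 0

def error(model, X):
    return sum(predict(model, x) != f_true(x) for x in X) / len(X)

rng = random.Random(0)

def sample_train(spurious=False):
    # Training distribution D: features 2..n-1 frozen at 0.
    x = [rng.randint(0, 1), rng.randint(0, 1)] + [0] * (N_FEATURES - 2)
    if spurious:
        x[2] = f_true(x)  # feature 2 copies the label, but only in training
    return x

def sample_test():
    # Test distribution D': same as D on features 0 and 1, uniform elsewhere.
    return [rng.randint(0, 1) for _ in range(N_FEATURES)]

X_test = [sample_test() for _ in range(2000)]

X_clean = [sample_train() for _ in range(200)]
good = fit_sparse(X_clean, [f_true(x) for x in X_clean], max_k=2)

X_spur = [sample_train(spurious=True) for _ in range(200)]
bad = fit_sparse(X_spur, [f_true(x) for x in X_spur], max_k=2)

print("clean:", good[0], error(good, X_test))    # picks (0, 1); test error 0.0
print("spurious:", bad[0], error(bad, X_test))   # picks (2,); test error near 0.5
```

In the clean run, D and D′ differ arbitrarily on features 2 to 9, but the learned hypothesis never reads them. In the spurious run, the Occam preference selects a 1-sparse shortcut on feature 2, and D′ does not preserve that feature's behavior. This is why the overlap condition must cover hypothesized features as well as actually relevant ones.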

Motivation and Goals

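Frames OOD generalization as a central problem in both epistemology and AI alignment.
Proposes sparsity, meaning dependence on as few of the distinguished features as possible, as a principled formalization of Occam's Razor in learning.
Introduces subspace juntas as a basis-robust generalization of sparse hypotheses.
Gives PAC-style theorems that quantify when OOD transfer succeeds, given overlap on the relevant features or subspaces.
Connects sparsity and subspace juntas to VC-dimension bounds, including finite VC bounds for semialgebraic function classes.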

Proposed Approach

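Models the world as presented through distinguished features and formalizes sparsity as dependence on at most k of them (k-sparse hypotheses).
Defines the class of k-sparse hypotheses and bounds its VC dimension by treating it as a union over feature subsets.
Proves PAC-style theorems (Theorems 3-4) that give OOD transfer when the training and test distributions match on the relevant features.
Generalizes sparsity to subspace juntas, which depend only on a low-dimensional subspace through a linear map W, and derives analogous transfer guarantees (Theorems 5-6).
Discusses finite VC bounds for semialgebraic function classes and gives a counterexample to a naive VC bound for subspace juntas.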
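The summary states the OOD bound only up to Õ, so its exact constants are not available here. As a stand-in, the sketch below computes the classic in-distribution finite-class bound of Blumer et al., which the abstract says the paper generalizes. It assumes Boolean features, and the function `occam_sample_bound` is hypothetical. It counts H as the set of Boolean functions of at most k of n features, so |H| ≤ C(n, k) · 2^(2^k). The ln C(n, k) ≤ k ln n term is the price of not knowing which k features matter. The 2^k ln 2 term plays the role of d, since 2^k is the VC dimension of all Boolean functions on k bits.

```python
import math

def occam_sample_bound(n, k, eps, delta):
    """Sample size from the classic finite-class (Occam) bound of Blumer et al.:
    with m >= (ln|H| + ln(1/delta)) / eps i.i.d. samples, with probability at
    least 1 - delta every hypothesis in H consistent with the sample has
    error <= eps (realizable case, same distribution for training and test).
    Assumed H: Boolean functions of at most k out of n Boolean features,
    counted as |H| <= C(n, k) * 2**(2**k) (choose the subset, then the table)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    ln_subsets = math.log(math.comb(n, k))  # <= k * ln(n): cost of not knowing S
    ln_tables = (2 ** k) * math.log(2)      # 2**k = VC dim of all functions on k bits
    return math.ceil((ln_subsets + ln_tables + math.log(1 / delta)) / eps)

for n in (100, 10_000, 1_000_000):
    print(n, occam_sample_bound(n, k=3, eps=0.01, delta=0.01))
```

Multiplying the number of features by 10,000 less than triples the required sample size. This logarithmic dependence on n is what makes sparsity cheap to learn.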

Results

Research Questions

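RQ1: When the training and test distributions differ on irrelevant features, under what conditions does sparsity yield reliable OOD generalization?
RQ2: How does basis-robust sparsity (subspace juntas) extend sparse hypotheses so that they remain meaningful when the feature representation is related to the world by a change of basis?
RQ3: What are the PAC-style sample-complexity and VC-dimension implications of k-sparse hypotheses and k-subspace juntas in the OOD setting?
RQ4: What role does the distributions' overlap on the relevant features or subspaces play in guaranteeing generalization to the test distribution D′?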

Key Findings

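A PAC-style bound: after m = Õ((d + k log n)/ε) samples, where n is the total number of features, with high probability every k-sparse hypothesis consistent with the training data has error at most ε. This holds on every test distribution D′ that matches the training distribution on the features used by the ground truth or the hypothesis (Theorems 3-4).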
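Basis-robust transfer via subspace juntas: if the ground truth f and the hypothesis h depend only on a shared subspace A, transfer follows once the two distributions agree after projection onto A (Theorems 5-6).
VC-dimension considerations: a corollary bounding the VC dimension of unions of sparse hypothesis classes, plus a discussion of finite VC bounds for semialgebraic classes.
A counterexample showing that some parameterizations of G/H can make the VC dimension of subspace juntas infinite, highlighting the limits of the naive bound.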
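Relation to classic OOD work: the results give sufficient conditions based on overlap on features or subspaces, rather than bounds that rely only on disagreement-based discrepancy measures.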
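The subspace-junta condition can be checked numerically in a small sketch; this is not the paper's proof. The following are all hypothetical choices: the Gaussian D, n = 20, a random 2-dimensional subspace A, the functions `f`, `h`, `h_bad`, and the shift along the direction `c`. D′ keeps each sample's component in A and replaces the rest with a shifted, rescaled component. Because f and h read x only through its projection onto A, their disagreement rate cannot change. A hypothesis that also reads a direction outside A gets no such protection.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, N = 20, 2, 20_000  # ambient dimension, subspace dimension, sample size

# A = column space of Q (equivalently the row space of W = Q.T), dimension r.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
P = Q @ Q.T  # orthogonal projector onto A
I = np.eye(n)

def f(X):
    """Hypothetical ground truth: a subspace junta that reads x only via Q.T @ x."""
    Y = X @ Q
    return Y[:, 0] + 0.5 * Y[:, 1] > 0

def h(X):
    """Imperfect hypothesis that also depends only on A."""
    return (X @ Q)[:, 0] > 0

# A unit direction c orthogonal to A.
g = rng.standard_normal(n)
c = g - P @ g
c /= np.linalg.norm(c)

def h_bad(X):
    """Hypothesis that additionally reads the direction c outside A."""
    return (X @ Q)[:, 0] + X @ c > 0

def err(hyp, X):
    """Fraction of samples where hyp disagrees with the ground truth f."""
    return float(np.mean(hyp(X) != f(X)))

Z = rng.standard_normal((N, n))
X_D = Z  # training distribution D: isotropic Gaussian
# Test distribution D': same component in A, shifted and rescaled outside A.
U = 5.0 * c + 3.0 * rng.standard_normal((N, n))
X_Dp = Z @ P + U @ (I - P)

print(err(h, X_D), err(h, X_Dp))          # match (about 0.15): projections coincide
print(err(h_bad, X_D), err(h_bad, X_Dp))  # differ: D' moved along c
```

The identity behind the first line is (Z P + U (I − P)) Q = Z Q, because P Q = Q and (I − P) Q = 0. The projected samples are therefore identical, not just equal in distribution.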

This review was generated by AI and reviewed by human editors.