QUICK REVIEW

[Paper Review] Learning with invariances in random features and kernel models

Song Mei, Theodor Misiakiewicz, Andrea Montanari | arXiv (Cornell University) | Feb 25, 2021

Stochastic Gradient Optimization Techniques | References: 32 | Cited by: 24

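ONE-SENTENCE SUMMARY

This paper introduces invariant random features and invariant kernel methods to quantify the statistical advantage that invariance brings to machine learning models. For groups with degeneracy parameter $\alpha \leq 1$, exploiting invariance reduces the sample size and number of hidden units needed to reach a given test error by a factor of $d^{\alpha}$, a substantial efficiency gain in high-dimensional settings on the sphere and hypercube with translation-invariant targets.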

ABSTRACT

A number of machine learning tasks entail a high degree of invariance: the data distribution does not change if we act on the data with a certain group of transformations. For instance, labels of images are invariant under translations of the images. Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties. With the objective of quantifying the gain achieved by invariant architectures, we introduce two classes of models: invariant random features and invariant kernel methods. The latter includes, as a special case, the neural tangent kernel for convolutional networks with global average pooling. We consider uniform covariates distributions on the sphere and hypercube and a general invariant target function. We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension, for a class of groups that we call `degeneracy $α$', with $α\leq 1$. We show that exploiting invariance in the architecture saves a $d^α$ factor ($d$ stands for the dimension) in sample size and number of hidden units to achieve the same test error as for unstructured architectures. Finally, we show that output symmetrization of an unstructured kernel estimator does not give a significant statistical improvement; on the other hand, data augmentation with an unstructured kernel estimator is equivalent to an invariant kernel estimator and enjoys the same improvement in statistical efficiency.

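RESEARCH MOTIVATION AND GOALS

Quantify the statistical advantage of architectural invariance (as in convolutional networks) in high-dimensional learning problems.
Formalize and analyze invariant random feature and kernel models that respect a group symmetry such as translation.
Characterize the test error of invariant models in the high-dimensional regime where the sample size and number of hidden units grow polynomially in the dimension.
Compare invariant methods with alternatives built on unstructured models, namely output symmetrization and data augmentation.
Show that data augmentation with an unstructured kernel is statistically equivalent to estimation with an invariant kernel, so that both achieve the same efficiency gain.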

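PROPOSED METHOD

The authors define invariant random feature and kernel models by symmetrizing features and kernels over the action of a group $\mathcal{G}_d \subset \mathrm{O}(d)$, which guarantees invariance under transformations such as cyclic shifts.
They analyze these models on the high-dimensional sphere $\mathbb{S}^{d-1}$ and the hypercube $\{-1,1\}^d$ under the uniform measure, focusing on invariant target functions.
The analysis uses orthogonal polynomial expansions (Gegenbauer polynomials on the sphere and their hypercube counterparts) to represent invariant functions and kernels.
Key theoretical tools include hypercontractivity inequalities, which are used to obtain concentration results and to control spectral decay in spaces of invariant functions.
The paper derives bounds on the generalization error in terms of the degeneracy parameter $\alpha$, a group-dependent quantity that controls the rate of eigenvalue decay in the invariant kernel.
The results show that invariant models reach the same test error as unstructured models with a factor of $d^\alpha$ fewer samples and parameters.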
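
The symmetrization step described above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes the group is the cyclic shifts of the coordinates, takes an inner-product kernel $h(\langle x, y\rangle / d)$ with the arbitrary choice $h = \exp$, and places points on the sphere of radius $\sqrt{d}$. The function names (`sample_sphere`, `cyclic_orbit`, `inner_product_kernel`, `invariant_kernel`) are hypothetical.

```python
import numpy as np

def sample_sphere(rng, d):
    """Draw one point uniformly from the sphere of radius sqrt(d) (assumed scaling)."""
    z = rng.standard_normal(d)
    return z * np.sqrt(d) / np.linalg.norm(z)

def cyclic_orbit(x):
    """Return all d cyclic shifts of x as the rows of a (d, d) array."""
    return np.stack([np.roll(x, s) for s in range(x.shape[0])])

def inner_product_kernel(x, y, h=np.exp):
    """Unstructured kernel H(x, y) = h(<x, y> / d)."""
    return float(h(np.dot(x, y) / x.shape[0]))

def invariant_kernel(x, y, h=np.exp):
    """Group-averaged kernel: mean over cyclic shifts g of h(<x, g.y> / d)."""
    d = x.shape[0]
    return float(np.mean(h(cyclic_orbit(y) @ x / d)))

rng = np.random.default_rng(0)
d = 8
x = sample_sphere(rng, d)
y = sample_sphere(rng, d)

# The averaged kernel does not change when either argument is shifted.
k_base = invariant_kernel(x, y)
k_shift_x = invariant_kernel(np.roll(x, 3), y)
k_shift_y = invariant_kernel(x, np.roll(y, 5))

# The unstructured kernel generally does change when only one argument is shifted.
h_base = inner_product_kernel(x, y)
h_shift_x = inner_product_kernel(np.roll(x, 3), y)

print(k_base, k_shift_x, k_shift_y)
print(h_base, h_shift_x)
```

Averaging over the full orbit is what makes the kernel invariant in each argument separately: shifting one input only reorders the terms of the average.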

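RESULTS

RESEARCH QUESTIONS

RQ1: How large a reduction in sample size and model complexity can be obtained by enforcing invariance in random feature and kernel models?
RQ2: What role does the group structure, and in particular the degeneracy parameter $\alpha$, play in determining the statistical efficiency gain of invariant models?
RQ3: Does symmetrizing the output of an unstructured kernel estimator significantly improve generalization over standard kernel methods?
RQ4: Is data augmentation with an unstructured kernel statistically equivalent to using an invariant kernel estimator?
RQ5: How do the spectral properties of invariant kernels on the sphere and hypercube affect the generalization error in high dimension?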
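
To make the role of $\alpha$ in RQ2 more concrete, here is a small counting sketch on the hypercube. It rests on an assumption that this summary does not state: that the degeneracy can be read as the exponent by which the space of degree-$k$ polynomials shrinks once invariance is imposed. On $\{-1,1\}^d$ the degree-$k$ monomials $x_S$ with $|S| = k$ form a basis, and the cyclic-shift-invariant polynomials are spanned by sums over shift orbits, so the invariant dimension is the number of orbits. The helper `count_cyclic_orbits` is hypothetical.

```python
from itertools import combinations
from math import comb

def count_cyclic_orbits(d, k):
    """Count orbits of k-subsets of {0, ..., d-1} under cyclic shifts."""
    seen = set()
    orbits = 0
    for subset in combinations(range(d), k):
        if frozenset(subset) in seen:
            continue
        orbits += 1
        # Mark every cyclic shift of this subset as visited.
        for s in range(d):
            seen.add(frozenset((i + s) % d for i in subset))
    return orbits

d, k = 7, 3
full_dim = comb(d, k)                # number of degree-k monomials
inv_dim = count_cyclic_orbits(d, k)  # number of shift-invariant orbit sums
ratio = full_dim / inv_dim

print(full_dim, inv_dim, ratio)
```

For prime $d$ and $0 < k < d$, every orbit has exactly $d$ elements, so the ratio equals $d$. Under the reading assumed here, that is consistent with a degeneracy of 1 for cyclic shifts.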

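KEY FINDINGS

For groups with degeneracy parameter $\alpha \leq 1$, invariant models reach the same test error as unstructured models with a factor of $d^\alpha$ fewer samples and hidden units.
Data augmentation with an unstructured kernel estimator is statistically equivalent to using an invariant kernel estimator, and both achieve the same $d^\alpha$ efficiency gain.
Symmetrizing the output of an unstructured kernel estimator does not significantly improve generalization and offers no clear advantage over standard kernel methods.
The neural tangent kernel of a convolutional network with global average pooling is a special case of the proposed invariant kernel methods.
Theoretical bounds on the generalization error are derived using hypercontractivity inequalities and Gegenbauer polynomial expansions, showing that invariant models converge faster in the high-dimensional regime.
The degeneracy parameter $\alpha$ describes the rate of spectral decay in the invariant kernel and directly determines the size of the reduction in sample complexity.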
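
The equivalence between data augmentation and invariant kernel estimation can be checked on a toy kernel ridge regression problem. This is a sketch under stated assumptions rather than the paper's setup: the group is the $d$ cyclic shifts, the kernel is $h(\langle x, y\rangle / d)$ with $h = \exp$, the ridge objective is a sum of squared errors plus $\lambda \|f\|^2$, and the labels are random noise used only to exercise the algebra. In this finite setting, fitting the unstructured kernel on the fully augmented data with penalty $\lambda$ gives the same predictor as fitting the group-averaged kernel on the original data with penalty $\lambda / d$; that rescaling comes from a short calculation for this toy case and is not quoted from the paper. All function names are hypothetical.

```python
import numpy as np

def sample_sphere(rng, n, d):
    """Draw n points uniformly from the sphere of radius sqrt(d)."""
    z = rng.standard_normal((n, d))
    return z * np.sqrt(d) / np.linalg.norm(z, axis=1, keepdims=True)

def shifts(X):
    """All cyclic shifts of every row of X; output shape (d, n, d)."""
    return np.stack([np.roll(X, s, axis=1) for s in range(X.shape[1])])

def plain_gram(A, B, h=np.exp):
    """Unstructured Gram matrix with entries h(<a, b> / d)."""
    return h(A @ B.T / A.shape[1])

def invariant_gram(A, B, h=np.exp):
    """Gram matrix of the kernel averaged over all cyclic shifts of B's rows."""
    return np.mean([plain_gram(A, gB, h) for gB in shifts(B)], axis=0)

rng = np.random.default_rng(0)
n, d, lam = 12, 6, 0.5
X = sample_sphere(rng, n, d)
X_test = sample_sphere(rng, 5, d)
y = rng.standard_normal(n)  # placeholder labels

# (a) Unstructured kernel ridge regression on the augmented data set.
X_aug = shifts(X).reshape(-1, d)  # every shift of every training point
y_aug = np.tile(y, d)             # each label repeated once per shift
coef_aug = np.linalg.solve(plain_gram(X_aug, X_aug) + lam * np.eye(n * d), y_aug)
pred_aug = plain_gram(X_test, X_aug) @ coef_aug

# (b) Invariant kernel ridge regression on the original data, penalty lam / d.
coef_inv = np.linalg.solve(invariant_gram(X, X) + (lam / d) * np.eye(n), y)
pred_inv = invariant_gram(X_test, X) @ coef_inv

# (c) Output symmetrization: fit the unstructured kernel on the original data,
# then average the fitted function over shifts of the test input.
coef_plain = np.linalg.solve(plain_gram(X, X) + lam * np.eye(n), y)
pred_sym = invariant_gram(X_test, X) @ coef_plain

print(np.max(np.abs(pred_aug - pred_inv)))
```

Route (c) reuses coefficients fitted without any knowledge of the symmetry, so it is a different estimator from (a) and (b). This sketch does not attempt to reproduce the paper's test-error comparison.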


This review was generated by AI and checked by a human editor.