QUICK REVIEW

[Paper Review] Mixing Complexity and its Applications to Neural Networks

Michal Moshkovitz, Naftali Tishby|arXiv (Cornell University)|Mar 2, 2017

Machine Learning and Algorithms | References: 26 | Cited by: 8

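One-Sentence Summary

This paper introduces mixing complexity as a new measure of how learnable a hypothesis class is under memory constraints, with a focus on neural networks. It shows that classes with high mixing complexity (MC(H) = Ω(√|H|)) cannot be learned by bounded-memory algorithms, which explains why most classes are not learnable by neural networks. The framework also shows that natural, structured classes that admit an r-sufficient partition have low mixing complexity and are therefore learnable, reconciling the theoretical limits with the success neural networks achieve in practice.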

ABSTRACT

A line of recent works showed that for a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples [Raz, 2016; Kol et al., 2017; Raz, 2017; Moshkovitz and Moshkovitz, 2018; Beame et al., 2018; Garg et al., 2018]. For example, any algorithm for learning parities of size n requires either a memory of size Omega(n^{2}) or an exponential number of samples [Raz, 2016]. All these works modeled the learner as a one-pass branching program, allowing only one pass over the stream of samples. In this work, we prove the first memory-samples lower bounds (with a super-linear lower bound on the memory size and super-polynomial lower bound on the number of samples) when the learner is allowed two passes over the stream of samples. For example, we prove that any two-pass algorithm for learning parities of size n requires either a memory of size Omega(n^{1.5}) or at least 2^{Omega(sqrt{n})} samples. More generally, a matrix M: A x X -> {-1,1} corresponds to the following learning problem: An unknown element x in X is chosen uniformly at random. A learner tries to learn x from a stream of samples, (a_1, b_1), (a_2, b_2), ..., where for every i, a_i in A is chosen uniformly at random and b_i = M(a_i, x). Assume that k, l, r are such that any submatrix of M of at least 2^{-k} * |A| rows and at least 2^{-l} * |X| columns has a bias of at most 2^{-r}. We show that any two-pass learning algorithm for the learning problem corresponding to M requires either a memory of size at least Omega(k * min{k, sqrt{l}}), or at least 2^{Omega(min{k, sqrt{l}, r})} samples.
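The learning problem above can be made concrete for parities, where A = X = {0,1}^n and M(a, x) = (-1)^(<a, x> mod 2). The Python sketch below is illustrative only: the function names (`parity_label`, `submatrix_bias`, `sample_stream`, `learn_parity`) are hypothetical, and the learner is the standard online Gaussian-elimination baseline over GF(2), not a method from this paper. It stores at most n equations of n + 1 bits, so it uses O(n^2) bits of memory and on the order of n samples, which is the regime the Omega(n^{2}) one-pass and Omega(n^{1.5}) two-pass lower bounds are measured against. The bias helper shows why the abstract's condition only constrains large submatrices: the full parity matrix has bias 2^{-n}, while the single row a = 0 has bias 1.

```python
import itertools
import random

def parity_label(a, x):
    """M(a, x) for parities, in {-1, 1}: +1 if <a, x> mod 2 is 0, else -1."""
    return 1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 == 0 else -1

def submatrix_bias(rows, cols):
    """Bias of M restricted to rows x cols: the absolute value of its average entry."""
    total = sum(parity_label(a, x) for a in rows for x in cols)
    return abs(total) / (len(rows) * len(cols))

def sample_stream(x, rng):
    """Yield samples (a_i, b_i) with a_i uniform in {0,1}^n and b_i = M(a_i, x)."""
    n = len(x)
    while True:
        a = [rng.randrange(2) for _ in range(n)]
        yield a, parity_label(a, x)

def learn_parity(stream, n, max_samples=10_000):
    """One-pass learner: online Gaussian elimination over GF(2).

    Keeps at most n equations of n + 1 bits each, i.e. O(n^2) bits of memory.
    Returns the secret x, or None if the stream ends before n independent equations.
    """
    basis = {}  # pivot bit index -> (coefficient bitmask, right-hand-side bit)
    for _, (a, b) in zip(range(max_samples), stream):
        row = sum(bit << i for i, bit in enumerate(a))
        rhs = 0 if b == 1 else 1  # map the {-1, 1} label back to <a, x> mod 2
        for p in sorted(basis, reverse=True):  # eliminate stored pivots, high to low
            if row >> p & 1:
                prow, prhs = basis[p]
                row ^= prow
                rhs ^= prhs
        if row:  # independent equation: its highest set bit becomes a new pivot
            basis[row.bit_length() - 1] = (row, rhs)
        if len(basis) == n:
            break
    if len(basis) < n:
        return None
    x = [0] * n
    for p in range(n):  # back-substitute from the lowest pivot upward
        row, rhs = basis[p]
        for j in range(p):
            if row >> j & 1:
                rhs ^= x[j]
        x[p] = rhs
    return x

rng = random.Random(0)
secret = [rng.randrange(2) for _ in range(8)]
print(learn_parity(sample_stream(secret, rng), 8) == secret)
```

Storing equations in echelon form keyed by their highest bit keeps each update to at most n row operations, so memory never exceeds n stored rows regardless of how long the stream is.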

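Motivation and Objectives

Bridge the gap between the theoretical unlearnability of most hypothesis classes and the practical success of neural networks.
Formalize the notion of "structure" in natural data classes through r-sufficient partitions.
Show that mixing complexity is a better complexity measure than VC dimension for explaining the generalization of neural networks.
Show that mixing complexity is robust to small label perturbations.
Reconcile the theoretical unlearnability of neural networks under bounded memory with their practical success.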

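Proposed Approach

Introduce mixing complexity (MC(H)) as a measure of how close a hypothesis class H is to a random class, based on how edges are distributed in the class's bipartite-graph representation.
Use the d-mixing property to model classes whose edges are spread almost uniformly across all pairs of vertex sets.
Apply graph-theoretic tools, including an edge-concentration bound (Lemma 10), to analyze how hypotheses are distributed over samples.
Prove that d-mixing classes have VC dimension Ω(log |H|), showing that they are the hardest to learn when memory is unconstrained.
Prove that mixing complexity is robust to label perturbations: changing at most b labels increases the mixing complexity by at most √b.
Use shell decomposition and hypothesis partitioning to show that mixing classes have large shell sizes, which further strengthens their hardness.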
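This review does not spell out the definition of d-mixing, so the Python sketch below assumes the expander-mixing-lemma form commonly used for this bipartite representation (an assumption, not quoted from the paper): hypotheses H on one side, samples X on the other, an edge (h, x) whenever h(x) = 1, and H is d-mixing if |e(S,T) - |S||T|/2| <= d * sqrt(|S||T|) for all S ⊆ H and T ⊆ X, where e(S,T) counts edges between S and T. It computes the smallest such d and the VC dimension by brute force for two toy classes. All function names are hypothetical, and the search is exponential, so it only works for tiny classes.

```python
import itertools
import math

def subsets(n):
    """All nonempty subsets of range(n), as tuples of indices."""
    return [c for r in range(1, n + 1) for c in itertools.combinations(range(n), r)]

def mixing_parameter(matrix):
    """Smallest d with |e(S,T) - |S||T|/2| <= d * sqrt(|S||T|) for all nonempty S, T.

    Assumed definition (see lead-in). matrix[h][x] == 1 means (h, x) is an edge,
    i.e. h(x) = 1. Brute force over all subset pairs: toy sizes only.
    """
    d = 0.0
    for S in subsets(len(matrix)):
        for T in subsets(len(matrix[0])):
            edges = sum(matrix[h][x] for h in S for x in T)
            size = len(S) * len(T)
            d = max(d, abs(edges - size / 2) / math.sqrt(size))
    return d

def vc_dimension(matrix):
    """Largest |T| such that the rows of matrix realize all 2^|T| labelings of T."""
    best = 0
    for T in subsets(len(matrix[0])):
        labelings = {tuple(row[x] for x in T) for row in matrix}
        if len(labelings) == 2 ** len(T):
            best = max(best, len(T))
    return best

def parity_class(n):
    """Rows h_a(x) = <a, x> mod 2 for a, x in {0,1}^n: a random-looking class."""
    points = list(itertools.product([0, 1], repeat=n))
    return [[sum(ai & xi for ai, xi in zip(a, x)) % 2 for x in points] for a in points]

def threshold_class(m):
    """Rows h_t(x) = 1 iff x >= t, for t, x in range(m): a structured class."""
    return [[int(x >= t) for x in range(m)] for t in range(m)]

for name, H in [("parities, n=2", parity_class(2)), ("thresholds, m=4", threshold_class(4))]:
    print(name, round(mixing_parameter(H), 3), vc_dimension(H))
```

Under this assumed form, parities over {0,1}^2 reach d = 1, matching the bound Lindsey's lemma gives for parity matrices, while the threshold class contains a 2 × 3 all-ones block and therefore needs d >= √6/2 ≈ 1.22. The VC dimensions come out as 2 = log2|H| for parities and 1 for thresholds. The review does not state whether its MC(H) is this d itself or a rescaled version of it.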

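Results

Research Questions

RQ1: Why do neural networks succeed in practice despite the learnability limits that memory constraints impose on them in theory?
RQ2: What structural properties of real-world data classes allow neural networks to learn them despite their high complexity?
RQ3: Can mixing complexity serve as a better complexity measure than VC dimension for explaining the generalization of neural networks?
RQ4: How does mixing complexity behave under small changes to the labels or the data?
RQ5: Do hypothesis classes with r-sufficient partitions (i.e., structured classes) inherently have lower complexity, and are they therefore learnable under memory constraints?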

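Key Findings

Classes with mixing complexity MC(H) = Ω(√|H|) cannot be learned by bounded-memory algorithms, which implies that most hypothesis classes are unlearnable under memory constraints.
Hypothesis classes with high mixing complexity have VC dimension Ω(log |H|), the largest possible value, confirming that they are the hardest to learn when memory is unconstrained.
Natural, structured classes, formalized through r-sufficient partitions, have low mixing complexity and may therefore be learnable with bounded memory.
Mixing complexity is robust: changing the labels of at most b samples increases it by at most √b.
Mixing complexity distinguishes natural image data from randomly labeled data, explaining the generalization gap observed by Zhang et al. (2017).
The framework reconciles the practical success of neural networks with their theoretical unlearnability by showing that real-world data classes, because of their underlying structure, are not mixing classes.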
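Two of the quantitative findings admit short arguments. The first is the standard counting bound that makes Ω(log |H|) the largest possible VC dimension. The second sketches the √b robustness claim, assuming the d-mixing form |e(S,T) - |S||T|/2| <= d * sqrt(|S||T|), where e(S,T) counts the pairs in S × T labeled 1; that definition is an assumption, since this review does not state the paper's.

```latex
% (1) VC dimension is at most log2 |H|: shattering a set T needs 2^{|T|} distinct hypotheses.
\[
  2^{\mathrm{VC}(H)} \le |H| \quad\Longrightarrow\quad \mathrm{VC}(H) \le \log_2 |H| .
\]
% (2) Robustness (assumed d-mixing form). Flipping at most b labels turns e(S,T) into e'(S,T).
% Each flip changes e(S,T) by at most 1, and min{u, v} <= sqrt(uv) for u, v >= 0, so
% for every S \subseteq H and T \subseteq X:
\[
  \bigl| e'(S,T) - e(S,T) \bigr| \le \min\{b,\ |S||T|\} \le \sqrt{b\,|S||T|} ,
\]
\[
  \Bigl| e'(S,T) - \tfrac{|S||T|}{2} \Bigr|
  \le d\sqrt{|S||T|} + \sqrt{b\,|S||T|}
  = \bigl(d + \sqrt{b}\bigr)\sqrt{|S||T|} .
\]
% Hence the perturbed class is (d + \sqrt{b})-mixing: the parameter grows by at most \sqrt{b}.
```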

This review was generated by AI and reviewed by human editors.