QUICK REVIEW

[Paper Review] Random deep neural networks are biased towards simple functions

Giacomo De Palma, Bobak T. Kiani|arXiv (Cornell University)|Dec 25, 2018

Machine Learning and Algorithms | Cited by 30

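One-Sentence Summary

This paper proves that random deep neural networks with ReLU activation are biased towards simple binary classifiers of bit strings, because the functions they generate are highly robust to input perturbations: the average Hamming distance to the nearest input with a different classification is at least √(n/(2π ln n)), and the average number of random bit flips needed to change the classification grows linearly with n. These results lend theoretical support to explanations of why deep learning generalizes well, since random networks favor simple, stable functions.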

ABSTRACT

We prove that the binary classifiers of bit strings generated by random wide deep neural networks with ReLU activation function are biased towards simple functions. The simplicity is captured by the following two properties. For any given input bit string, the average Hamming distance of the closest input bit string with a different classification is at least sqrt(n / (2π log n)), where n is the length of the string. Moreover, if the bits of the initial string are flipped randomly, the average number of flips required to change the classification grows linearly with n. These results are confirmed by numerical experiments on deep neural networks with two hidden layers, and settle the conjecture stating that random deep neural networks are biased towards simple functions. This conjecture was proposed and numerically explored in [Valle Pérez et al., ICLR 2019] to explain the unreasonably good generalization properties of deep learning algorithms. The probability distribution of the functions generated by random deep neural networks is a good choice for the prior probability distribution in the PAC-Bayesian generalization bounds. Our results constitute a fundamental step forward in the characterization of this distribution, therefore contributing to the understanding of the generalization properties of deep learning algorithms.

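Motivation and Objectives

Rigorously establish that random deep neural networks are biased towards simple functions, settling the conjecture proposed and numerically explored by Valle Pérez et al. (ICLR 2019).
Characterize the functional simplicity of random deep networks with geometric and probabilistic measures, namely Hamming distance to the nearest differently classified input and robustness to random bit flips.
Provide theoretical grounds for using the distribution of functions generated by random deep networks as the prior in PAC-Bayesian generalization bounds.
Contribute to explaining why deep networks generalize well despite their high capacity, by showing that random networks inherently favor simple, stable functions.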

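Proposed Method

Analytically derives the expected Hamming distance to the nearest input bit string with a different classification, using a Gaussian process approximation and extreme value statistics.
Models the network output as a Gaussian process whose covariance function is determined by the ReLU activation and the random weight initialization.
Uses the Kullback-Leibler divergence and the PAC-Bayesian framework to formalize the prior distribution over functions generated by random networks.
Applies the Kolmogorov continuity theorem to show that the limiting Gaussian process is continuous, which supports the analysis of its zero-crossing times.
Runs numerical experiments on networks with two hidden layers to check the predicted Hamming distances and bit-flip robustness.
Uses both heuristic and exact search algorithms in the empirical evaluation to find the nearest input with a different classification.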
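The two empirical measurements listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the bias-free architecture, the He-style weight variance, the ±1 encoding of bits, and the choice to flip distinct bits in random order are all hypothetical choices made for the example.

```python
import itertools
import numpy as np


def make_random_relu_net(n, width, rng):
    """Sample a two-hidden-layer ReLU network with Gaussian weights.

    Assumptions (not from the paper): no biases, variance 2/fan_in
    for the hidden layers, and inputs encoded as vectors of +/-1.
    """
    w1 = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, width))
    w2 = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
    w3 = rng.normal(0.0, np.sqrt(1.0 / width), size=width)

    def classify(x):
        # The class is the sign of the scalar network output.
        h = np.maximum(x @ w1, 0.0)
        h = np.maximum(h @ w2, 0.0)
        return bool(h @ w3 > 0.0)

    return classify


def random_flips_to_change(classify, x, rng):
    """Flip distinct bits in random order until the class changes.

    Returns the number of flips, or None if the class never changes.
    """
    start = classify(x)
    y = x.copy()
    for count, i in enumerate(rng.permutation(len(x)), start=1):
        y[i] = -y[i]
        if classify(y) != start:
            return count
    return None


def nearest_differently_classified(classify, x, max_dist):
    """Exact search over Hamming balls of radius 1..max_dist around x.

    Cost grows combinatorially, so this is only practical for small
    n or small max_dist.
    """
    start = classify(x)
    for d in range(1, max_dist + 1):
        for idx in itertools.combinations(range(len(x)), d):
            y = x.copy()
            y[list(idx)] = -y[list(idx)]
            if classify(y) != start:
                return d
    return None
```

A typical use would sample `rng = np.random.default_rng(0)`, build `classify = make_random_relu_net(16, 256, rng)`, draw `x = rng.choice([-1.0, 1.0], size=16)`, and average both measurements over many networks and inputs. The exhaustive search stands in for the exact algorithm; a heuristic such as greedy bit flipping would be needed for larger n.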

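Experimental Results

Research Questions

RQ1: Do random deep neural networks with ReLU activation exhibit the bias towards simple functions conjectured in earlier work?
RQ2: In a random deep network, what is the typical Hamming distance from a random input bit string to the nearest input with a different classification?
RQ3: In a random deep network, how does the average number of random bit flips needed to change the classification scale with the input length n?
RQ4: Can geometric and probabilistic measures, such as robustness to perturbations, quantify the functional simplicity of random deep networks?
RQ5: Is the distribution of functions generated by random deep networks a suitable prior for PAC-Bayesian generalization bounds?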

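Key Findings

For large n, the average Hamming distance to the nearest input bit string with a different classification is at least √(n/(2π ln n)), indicating high robustness to input changes.
The average number of random bit flips needed to change the classification grows linearly with n; simulations suggest it is approximately n/3, noticeably above the n/4 lower bound.
By contrast, for a uniformly random binary classifier the average Hamming distance is about 1 and only about 2 random bit flips are needed to change the classification, which highlights the fundamental difference in complexity.
The theoretical analysis confirms that the functions generated by random deep networks are inherently simple and stable, supporting the conjectured bias towards simplicity.
Numerical experiments on two-hidden-layer networks with ReLU activation agree with the theoretical predictions across input sizes and network instances.
Because of this inherent simplicity and robustness, the distribution of functions generated by random deep networks is a strong candidate for the prior in PAC-Bayesian generalization bounds.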
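To make the contrast in the findings concrete, the sketch below evaluates the lower bound √(n/(2π ln n)) and simulates the uniformly random baseline. It is an illustration only: the function names are hypothetical, and the baseline assumes that every flip reaches a previously unseen input whose label is an independent fair coin, which idealizes a uniformly random classifier.

```python
import math
import random


def hamming_lower_bound(n):
    """Lower bound sqrt(n / (2*pi*ln n)) quoted in the abstract."""
    return math.sqrt(n / (2.0 * math.pi * math.log(n)))


def uniform_flips_to_change(rng):
    """Flips until an idealized uniformly random classifier changes class.

    Each newly visited input is assumed to carry an independent
    fair-coin label, so the count is geometric with success
    probability 1/2 and therefore has mean 2.
    """
    flips = 1
    while rng.random() < 0.5:  # label unchanged with probability 1/2
        flips += 1
    return flips


def mean_uniform_flips(trials, seed=0):
    """Monte Carlo estimate of the mean number of flips for the baseline."""
    rng = random.Random(seed)
    total = sum(uniform_flips_to_change(rng) for _ in range(trials))
    return total / trials
```

Calling `hamming_lower_bound(n)` for increasing n shows the bound growing without limit, while `mean_uniform_flips(20000)` stays close to 2 regardless of n, matching the comparison stated above.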


This review was generated by AI and checked by a human editor.