QUICK REVIEW

[Paper Review] Deep Neural Networks as Gaussian Processes

Jaehoon Lee, Yasaman Bahri | arXiv (Cornell University) | Nov 1, 2017

Gaussian Processes and Bayesian Inference | References: 16 | Cited by: 335

ONE-SENTENCE SUMMARY

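This paper establishes an exact equivalence between infinitely wide deep neural networks and Gaussian processes, and provides a scalable method for computing the corresponding GP kernel, enabling Bayesian inference on MNIST and CIFAR-10 with classification treated as a regression task.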

ABSTRACT

It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.

RESEARCH MOTIVATION AND GOALS

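Establish an exact correspondence between deep, infinitely wide neural networks and Gaussian processes (GPs).
Derive a recursive, deterministic, layer-by-layer computation of the GP kernel for deep networks with a given nonlinearity.
Show that Bayesian inference with the resulting GP can match or exceed finite-width neural networks on standard benchmarks.
Demonstrate practical feasibility by applying the Neural Network GP (NNGP) to MNIST and CIFAR-10 and comparing it with SGD-trained networks.
Relate GP performance to the theory of signal propagation in random networks.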

PROPOSED METHOD

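Derive the NNGP kernel with the central limit theorem, taking the infinite-width limit layer by layer.
Define the recursive kernel update K^l(x, x') = σ_b^2 + σ_w^2 F_φ(K^{l-1}(x, x'), K^{l-1}(x, x), K^{l-1}(x', x')), where σ_w^2 and σ_b^2 are the weight-variance and bias-variance hyperparameters and F_φ is the expectation of φ(z)φ(z') for (z, z') drawn from a zero-mean Gaussian whose covariance is given by the three K^{l-1} entries, so F_φ depends on the nonlinearity φ.
Give closed forms of F_φ for some φ (e.g., the arc-cosine kernel for ReLU) and a numerical scheme for computing F_φ for general φ.
Develop an efficient implementation that computes K^L using a preprocessing step and a bilinear interpolation scheme to reduce computational cost.
Perform GP regression with the derived kernel, giving exact Bayesian inference on the regression targets, including uncertainty quantification.
Relate the kernel's behavior to the theory of deep signal propagation and to the phase diagrams derived for random networks.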
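The kernel recursion and the ReLU closed form described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the base case K^0(x, x') = σ_b^2 + σ_w^2 x·x'/d_in, the function names (`relu_F`, `quad_F`, `nngp_kernel`), the hyperparameter values, and the use of tensor-product Gauss-Hermite quadrature as the numerical scheme for general φ are all choices made for this sketch.

```python
import numpy as np

def relu_F(k12, k11, k22):
    """Closed-form F_phi for ReLU (degree-1 arc-cosine kernel):
    E[relu(u) relu(v)] with (u, v) ~ N(0, [[k11, k12], [k12, k22]])."""
    norm = np.sqrt(k11 * k22)
    cos_t = np.clip(k12 / norm, -1.0, 1.0)  # guard against rounding just past +/-1
    theta = np.arccos(cos_t)
    return norm / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)

def quad_F(phi, k12, k11, k22, n=100):
    """Numerical F_phi for a general scalar nonlinearity phi (scalar inputs only).
    Assumption: Gauss-Hermite quadrature stands in for the paper's own scheme."""
    z, w = np.polynomial.hermite_e.hermegauss(n)  # nodes/weights for exp(-z^2 / 2)
    z1, z2 = np.meshgrid(z, z, indexing="ij")
    w2 = np.outer(w, w) / (2 * np.pi)             # weights now sum to 1 (N(0, I) density)
    c = k12 / np.sqrt(k11 * k22)
    u = np.sqrt(k11) * z1                         # Var(u) = k11
    v = np.sqrt(k22) * (c * z1 + np.sqrt(max(1.0 - c**2, 0.0)) * z2)  # Cov(u, v) = k12
    return np.sum(w2 * phi(u) * phi(v))

def nngp_kernel(X, depth, sigma_w2, sigma_b2, F=relu_F):
    """K^depth for every pair of rows of X (shape [n, d_in]).
    Assumed base case: K^0(x, x') = sigma_b2 + sigma_w2 * x.x' / d_in."""
    d_in = X.shape[1]
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d_in
    for _ in range(depth):
        diag = np.diag(K)                         # K^{l-1}(x, x) for every input
        K = sigma_b2 + sigma_w2 * F(K, diag[:, None], diag[None, :])
    return K

# Usage with hypothetical hyperparameters.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))
K = nngp_kernel(X, depth=3, sigma_w2=1.6, sigma_b2=0.1)
relu = lambda t: np.maximum(t, 0.0)
print(relu_F(0.5, 1.3, 0.7), quad_F(relu, 0.5, 1.3, 0.7))  # should roughly agree
```

The closed form is vectorized over the whole kernel matrix, while the quadrature path is written for scalars only; because ReLU has a kink, quadrature converges slowly for it, which is one reason closed forms are preferred where they exist.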
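The preprocessing-plus-bilinear-interpolation idea in the method list can be illustrated with a precomputed lookup table: evaluate F_φ once on a grid of (variance s, correlation c) values, then answer every kernel entry at every layer by interpolation. This sketch assumes all inputs have the same norm, so that K^l(x, x) = s is shared by every input at each layer; the grid ranges and sizes, the function names (`relu_F_sc`, `build_table`, `nngp_kernel_interp`), and the hyperparameters are hypothetical. The table is filled with the ReLU closed form only so the interpolated result can be checked against it; for a general φ it would be filled numerically.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def relu_F_sc(s, c):
    """ReLU F_phi written in terms of a shared variance s and correlation c."""
    c = np.clip(c, -1.0, 1.0)
    theta = np.arccos(c)
    return s / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * c)

def build_table(F_sc, s_max=10.0, n_s=201, n_c=401):
    """Precompute F on an (s, c) grid once; grid ranges and sizes are hypothetical."""
    s_grid = np.linspace(1e-3, s_max, n_s)
    c_grid = np.linspace(-1.0, 1.0, n_c)
    S, C = np.meshgrid(s_grid, c_grid, indexing="ij")
    # method="linear" on a 2-D regular grid is bilinear interpolation
    return RegularGridInterpolator((s_grid, c_grid), F_sc(S, C), method="linear")

def nngp_kernel_interp(X, depth, sigma_w2, sigma_b2, table):
    """Kernel recursion that looks F up in the table instead of recomputing it.
    Assumes every row of X has the same norm, so K^l(x, x) = s is shared."""
    d_in = X.shape[1]
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d_in
    for _ in range(depth):
        s = np.mean(np.diag(K))                   # shared variance at this layer
        c = np.clip(K / s, -1.0, 1.0)             # pairwise correlations
        pts = np.column_stack([np.full(c.size, s), c.ravel()])
        K = sigma_b2 + sigma_w2 * table(pts).reshape(K.shape)
    return K

# Usage: rows rescaled to norm sqrt(d_in); hyperparameters are hypothetical.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 20))
X *= np.sqrt(20) / np.linalg.norm(X, axis=1, keepdims=True)
table = build_table(relu_F_sc)
K_interp = nngp_kernel_interp(X, depth=3, sigma_w2=1.6, sigma_b2=0.1, table=table)
```

The table cost is paid once, independent of the dataset size, while each layer then needs only cheap interpolation per kernel entry.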

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

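RQ1: Can deep, infinitely wide neural networks be represented exactly as Gaussian processes with a computable covariance kernel?
RQ2: How do the depth L and the choice of nonlinearity φ affect the GP kernel and its predictive performance on image classification tasks?
RQ3: Are GP posterior predictions with the NNGP kernel competitive with, or better than, finite-width neural networks trained with SGD on datasets such as MNIST and CIFAR-10?
RQ4: How does the GP uncertainty relate to the actual prediction error on test data?
RQ5: Is NNGP performance connected to recent theory of signal propagation in random neural networks?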

KEY FINDINGS

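Under comparable settings, the NNGP often outperforms finite-width SGD-trained networks on MNIST and CIFAR-10.
As network width increases, the performance of trained networks approaches that of the NNGP, suggesting a close connection between SGD-trained networks and Bayesian inference in the wide regime.
GP uncertainty estimates are strongly correlated with the actual prediction error on test data.
Performance peaks coincide with the regions predicted by the deep signal-propagation phase diagram (ordered/chaotic phases) for different nonlinearities.
The GP gives an explicit, principled measure of predictive uncertainty, which is difficult to obtain from standard neural networks.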
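As a rough illustration of where the GP's uncertainty estimate comes from, this sketch computes the GP posterior mean and variance at test inputs using a ReLU NNGP kernel; the posterior variance is the per-input uncertainty that the findings compare against prediction error. It is a minimal sketch, not the paper's pipeline: the closed-form ReLU recursion stands in for the lookup-table scheme, and the hyperparameters, the observation-noise variance `noise`, the one-hot targets, the argmax class prediction, and the function names (`relu_nngp`, `nngp_posterior`) are assumptions made here.

```python
import numpy as np

def relu_nngp(X1, X2, depth=3, sw2=1.6, sb2=0.1):
    """ReLU NNGP kernel between rows of X1 and X2 via the arc-cosine recursion.
    Returns (K[X1, X2], diag K[X1, X1], diag K[X2, X2]); hyperparameters are hypothetical."""
    d = X1.shape[1]
    k12 = sb2 + sw2 * (X1 @ X2.T) / d
    k11 = sb2 + sw2 * np.sum(X1**2, axis=1) / d
    k22 = sb2 + sw2 * np.sum(X2**2, axis=1) / d
    for _ in range(depth):
        norm = np.sqrt(np.outer(k11, k22))
        cos_t = np.clip(k12 / norm, -1.0, 1.0)
        theta = np.arccos(cos_t)
        k12 = sb2 + sw2 * norm / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)
        k11 = sb2 + sw2 * k11 / 2            # E[relu(u)^2] = Var(u) / 2
        k22 = sb2 + sw2 * k22 / 2
    return k12, k11, k22

def nngp_posterior(X_train, Y_train, X_test, noise=1e-2):
    """GP posterior mean and variance at X_test; `noise` is an assumed noise variance."""
    K_tt, _, _ = relu_nngp(X_train, X_train)
    K_ts, _, k_ss = relu_nngp(X_train, X_test)           # K_ts: [n_train, n_test]
    A = K_tt + noise * np.eye(len(X_train))
    mean = K_ts.T @ np.linalg.solve(A, Y_train)           # [n_test, n_outputs]
    var = k_ss - np.sum(K_ts * np.linalg.solve(A, K_ts), axis=0)  # shared by all outputs
    return mean, var

# Usage with random data and hypothetical one-hot targets.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((50, 8))
Y_train = np.eye(3)[rng.integers(0, 3, size=50)]
X_test = rng.standard_normal((10, 8))
mean, var = nngp_posterior(X_train, Y_train, X_test)
pred_class = mean.argmax(axis=1)
```

Because all outputs share one kernel, the posterior variance is a single number per test input; inputs with larger `var` are the ones the GP is least sure about.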
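The ordered and chaotic phases mentioned in the signal-propagation finding can be located with two mean-field quantities from the signal-propagation theory the paper builds on: the fixed point q* of the variance map q → σ_b^2 + σ_w^2 E[φ(√q z)^2], and the slope χ_1 = σ_w^2 E[φ'(√q* z)^2] with z ~ N(0, 1). χ_1 < 1 indicates the ordered phase and χ_1 > 1 the chaotic phase. This is a minimal sketch assuming a tanh nonlinearity, Gauss-Hermite quadrature for the Gaussian expectations, and a fixed number of fixed-point iterations; the function names `q_star` and `chi_1` are hypothetical.

```python
import numpy as np

# Gauss-Hermite rule rescaled for expectations over z ~ N(0, 1)
Z, W = np.polynomial.hermite_e.hermegauss(80)
W = W / np.sqrt(2 * np.pi)

def q_star(sw2, sb2, iters=500):
    """Fixed point of q -> sb2 + sw2 * E[tanh(sqrt(q) z)^2], found by iteration."""
    q = 1.0
    for _ in range(iters):
        q = sb2 + sw2 * np.sum(W * np.tanh(np.sqrt(q) * Z) ** 2)
    return q

def chi_1(sw2, sb2):
    """chi_1 = sw2 * E[tanh'(sqrt(q*) z)^2]; < 1 ordered phase, > 1 chaotic phase."""
    q = q_star(sw2, sb2)
    dphi = 1.0 - np.tanh(np.sqrt(q) * Z) ** 2    # derivative of tanh
    return sw2 * np.sum(W * dphi ** 2)

# Scan a few hypothetical weight variances at a fixed bias variance.
for sw2 in (0.5, 1.5, 3.0):
    chi = chi_1(sw2, 0.05)
    print(sw2, round(chi, 3), "ordered" if chi < 1 else "chaotic")
```

Sweeping (σ_w^2, σ_b^2) over a grid and marking where χ_1 crosses 1 traces the order-to-chaos boundary of the phase diagram.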

This summary was generated by AI and reviewed by human editors.