QUICK REVIEW

[Paper Review] Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

Roman Novak, Lechao Xiao|arXiv (Cornell University)|Oct 11, 2018

Gaussian Processes and Bayesian Inference | Cited by 169

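One-Sentence Summary

The paper proves that deep CNNs with many channels converge to Gaussian processes, extends the NN-GP equivalence to CNNs with and without pooling, and introduces a Monte Carlo method to estimate the corresponding kernel when no tractable analytic form is available.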

ABSTRACT

There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing are identical. As a consequence, translation equivariance, beneficial in finite channel CNNs trained with stochastic gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment of the infinite channel limit - a qualitative difference between the two regimes that is not present in the FCN case. We confirm experimentally, that while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages from SGD training compared to fully Bayesian parameter estimation.

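Research Motivation and Goals

Understand the prior over functions that deep CNNs encode in the infinite-channel limit.
Establish, under broad conditions, the theoretical NN-GP equivalence for CNNs with and without pooling.
Quantify the roles of pooling, weight sharing, and translation invariance in the infinite-width limit.
Provide a practical way to compute or approximate the CNN-GP kernel for architectures whose analytic form is infeasible.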

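Proposed Method

Derive that the pre-activations of a CNN, conditioned on the previous layer's activations, are Gaussian, with covariance given by the affine map A(K).
Show that as the number of channels grows, the activation covariance K^l becomes deterministic and is obtained through the map C∘A.
Prove convergence in distribution to a GP with kernel K_∞^L, obtained by iterating (C∘A) on K^0.
Show that in the infinite-channel limit, the NN-GP kernel of a CNN without pooling is identical to that of a locally connected network.
Describe the vectorization and projection readouts, which turn the CNN-GP output into a GP kernel over classes, including the exact form of K_∞^L.
Introduce a Monte Carlo method (the MC-GP kernel estimate) that estimates the NN-GP kernel when the analytic form is intractable.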
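
The iteration of (C∘A) on K^0 described above can be illustrated with a small sketch. It is not the authors' implementation and makes several simplifying assumptions that are not stated in this review: 1D inputs, circular padding, ReLU nonlinearity (for which C has a closed form), no pooling, and a flatten-then-dense readout. Without pooling only the spatial entries with matching positions are needed, so the sketch tracks just those. The names relu_expectation, affine_map, and cnn_gp_kernel, and all default parameter values, are hypothetical.

```python
import math


def relu_expectation(k11, k12, k22):
    """Map C for ReLU: E[relu(u) * relu(v)] with (u, v) ~ N(0, [[k11, k12], [k12, k22]])."""
    norm = math.sqrt(k11 * k22)
    if norm == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, k12 / norm))
    theta = math.acos(cos_theta)
    return norm * (math.sin(theta) + (math.pi - theta) * cos_theta) / (2.0 * math.pi)


def affine_map(diag, sigma_w2, sigma_b2, half_width):
    """Map A restricted to matching spatial positions (assumption: circular padding)."""
    d = len(diag)
    size = 2 * half_width + 1
    return [
        sigma_b2 + sigma_w2 / size
        * sum(diag[(a + b) % d] for b in range(-half_width, half_width + 1))
        for a in range(d)
    ]


def cnn_gp_kernel(x, y, depth, sigma_w2=2.0, sigma_b2=0.0, half_width=1):
    """Kernel value for inputs x, y given as lists of channels, each a list of d pixels."""
    n0, d = len(x), len(x[0])

    def input_cov(u, v):
        # K^0: per-position products averaged over input channels.
        return [sum(u[c][a] * v[c][a] for c in range(n0)) / n0 for a in range(d)]

    kxx, kxy, kyy = input_cov(x, x), input_cov(x, y), input_cov(y, y)
    for _ in range(depth):
        # A: covariance of the pre-activations given the previous layer.
        axx, axy, ayy = (
            affine_map(k, sigma_w2, sigma_b2, half_width) for k in (kxx, kxy, kyy)
        )
        # C: covariance of the activations after the nonlinearity.
        kxy = [relu_expectation(axx[a], axy[a], ayy[a]) for a in range(d)]
        kxx = [relu_expectation(v, v, v) for v in axx]
        kyy = [relu_expectation(v, v, v) for v in ayy]
    # Assumed flatten readout: a dense layer over all positions.
    return sigma_b2 + sigma_w2 * sum(kxy) / d
```

With these defaults an all-ones, single-channel input keeps every tracked entry at 1 through each layer, so cnn_gp_kernel([[1.0] * 4], [[1.0] * 4], depth=3) evaluates to about 2.0. Pooling would require the full set of position pairs rather than only the matching ones, which this sketch deliberately leaves out.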

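Experimental Results

Research Questions

RQ1: Do deep CNNs with many channels correspond to a Gaussian process in the infinite-channel limit?
RQ2: Do CNNs with pooling differ from CNNs without pooling in the infinite-channel limit?
RQ3: How can the GP kernel be computed or approximated for CNN architectures whose analytic form is too complex?
RQ4: What effect do weight sharing and translation equivariance have on the kernel of a Bayesian, infinitely wide CNN?
RQ5: Do the readout strategies (vectorization or projection) yield GP kernels over classes that reflect common CNN classifiers?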

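Key Findings

CNNs with many channels converge to NN-GP behavior, yielding a Gaussian process prior over functions.
Without pooling, the CNN-GP coincides with the GP of the corresponding locally connected network, so weight sharing, and with it translation equivariance, has no effect in the infinite-channel limit in that setting.
Translation equivariance plays no role in the Bayesian treatment of the infinite-channel limit, a qualitative difference between the finite and infinite regimes that is not present in the FCN case.
When the analytic kernel is infeasible, the Monte Carlo method can estimate the CNN-GP kernel, making kernel computation practical for architectures with pooling.
With careful tuning, finite-width CNNs trained with SGD can outperform the corresponding CNN-GP in some settings, suggesting that SGD training offers advantages over fully Bayesian parameter estimation.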
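
The Monte Carlo idea in the findings above can be sketched as follows: draw random finite-width networks from the prior, run the inputs through them, and average products of the resulting activations over channels and over draws. This is a loose illustration and not the paper's estimator. The single convolutional layer, single input channel, circular padding, ReLU, and the averaging over matching spatial positions are all assumptions, and the names conv_relu_features and mc_kernel are hypothetical.

```python
import math
import random


def conv_relu_features(x, weights, biases, half_width):
    """ReLU activations of a 1D circular convolution; returns feats[channel][position]."""
    d = len(x)
    feats = []
    for w, b in zip(weights, biases):
        row = []
        for a in range(d):
            z = b + sum(w[k] * x[(a + k - half_width) % d] for k in range(len(w)))
            row.append(max(z, 0.0))
        feats.append(row)
    return feats


def mc_kernel(inputs, n_channels=200, n_draws=20, half_width=1,
              sigma_w2=2.0, sigma_b2=0.0, seed=0):
    """Monte Carlo estimate of the activation kernel between all pairs of inputs."""
    rng = random.Random(seed)
    size = 2 * half_width + 1
    d = len(inputs[0])
    n = len(inputs)
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_draws):
        # One random network: weight variance is scaled by the fan-in (filter size).
        weights = [
            [rng.gauss(0.0, math.sqrt(sigma_w2 / size)) for _ in range(size)]
            for _ in range(n_channels)
        ]
        biases = [rng.gauss(0.0, math.sqrt(sigma_b2)) for _ in range(n_channels)]
        feats = [conv_relu_features(x, weights, biases, half_width) for x in inputs]
        for i in range(n):
            for j in range(n):
                total[i][j] += sum(
                    feats[i][c][a] * feats[j][c][a]
                    for c in range(n_channels)
                    for a in range(d)
                )
    # Average over draws, channels, and positions.
    scale = n_draws * n_channels * d
    return [[v / scale for v in row] for row in total]
```

Increasing n_channels or n_draws reduces the variance of the estimate at a proportional cost in compute; the sketch exposes both so that trade-off can be explored on small inputs.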

This review was generated by AI and reviewed by a human editor.