QUICK REVIEW

[Paper Review] On the number of response regions of deep feed forward networks with piece-wise linear activations

Razvan Pascanu, Guido Montúfar|arXiv (Cornell University)|Dec 20, 2013

Advanced Memory and Neural Computing | References: 17 | Cited by: 126

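One-Sentence Summary

This paper analyzes the representational power of deep feedforward networks with ReLU activations by counting the linear regions they define in input space (the pieces of the piecewise linear function the network computes). It shows that a deep network can realize a number of linear regions that grows exponentially with depth, far more than a shallow network with the same number of parameters, which gives deep networks a fundamental advantage in modeling complex functions through layered composition.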

ABSTRACT

This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework for comparing deep and shallow models that belong to the family of piecewise linear functions based on computational geometry. We look at a deep rectifier multi-layer perceptron (MLP) with linear output units and compare it with a single layer version of the model. In the asymptotic regime, when the number of inputs stays constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then the number of linear regions is $O(k^{n_0}n^{n_0})$. For a $k$ layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor n/n_0 \right\rfloor^{k-1}n^{n_0})$. The number $\left\lfloor n/n_0 \right\rfloor^{k-1}$ grows faster than $k^{n_0}$ when $n$ tends to infinity or when $k$ tends to infinity and $n \geq 2n_0$. Additionally, even when $k$ is small, if we restrict $n$ to be $2n_0$, we can show that a deep model has considerably more linear regions than a shallow one. We consider this as a first step towards understanding the complexity of these models and specifically towards providing suitable mathematical tools for future analysis.
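The comparison between $\left\lfloor n/n_0 \right\rfloor^{k-1}$ and $k^{n_0}$ in the abstract can be tried out numerically. The sketch below is only an illustration: the function names and the sizes $n_0 = 2$, $n = 4$ are assumptions chosen for the example, and the constants hidden by $O$ and $\Omega$ are ignored, so the printed numbers indicate growth rates rather than actual region counts.

```python
def deep_factor(n: int, n0: int, k: int) -> int:
    """Depth-dependent factor floor(n / n0)^(k - 1) from the deep lower bound."""
    return (n // n0) ** (k - 1)


def shallow_factor(n0: int, k: int) -> int:
    """Factor k^n0 from the shallow upper bound (k * n hidden units)."""
    return k ** n0


# Hypothetical sizes for illustration only: n0 = 2 inputs, n = 4 = 2 * n0.
# Both bounds share the common factor n^n0, so it is left out here.
n0, n = 2, 4
for k in (2, 4, 8, 16):
    print(k, deep_factor(n, n0, k), shallow_factor(n0, k))
```

With these sizes the deep factor is $2^{k-1}$ and the shallow factor is $k^2$, so the first is exponential in $k$ while the second is polynomial. Raising `n` makes the base of the exponential larger.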

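Research Motivation and Objectives

Understand why deep neural networks can represent complex functions more efficiently than shallow ones.
Quantify the representational capacity of deep feedforward networks with piecewise linear activations such as ReLU.
Compare the number of linear regions of deep and shallow architectures under a fixed parameter budget.
Build a geometric framework based on hyperplane arrangements for analyzing the complexity of deep networks.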

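Proposed Method

Model a deep ReLU network as a composition of piecewise linear functions in order to analyze its response regions.
Use computational geometry to count the linear regions that hyperplane arrangements form in input space.
Derive upper and lower bounds on the number of linear regions of shallow and deep architectures through combinatorial sums.
Apply asymptotic analysis (big-O and big-Ω notation) to compare how the number of linear regions grows with depth and width.
Relate the number of linear regions to the number of parameters to assess representational efficiency.
Show that the region count of deep models grows exponentially with depth, whereas that of shallow models grows polynomially.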
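The region counting described above can be sketched for a single hidden layer. Each ReLU unit switches on or off across one hyperplane in input space, and a standard result on hyperplane arrangements (Zaslavsky's theorem) says that $m$ hyperplanes in general position split $\mathbb{R}^d$ into $\sum_{j=0}^{d}\binom{m}{j}$ regions. The code below assumes this is the kind of combinatorial sum the review refers to. The function names, weights, and sampling grid are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import comb


def max_regions(num_hyperplanes: int, dim: int) -> int:
    """Regions cut out by hyperplanes in general position in R^dim."""
    return sum(comb(num_hyperplanes, j) for j in range(dim + 1))


def count_activation_patterns(weights, biases, points) -> int:
    """Count distinct on/off patterns of one ReLU layer over sample points."""
    patterns = set()
    for x in points:
        pattern = tuple(
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b > 0
            for w, b in zip(weights, biases)
        )
        patterns.add(pattern)
    return len(patterns)


# Hypothetical toy layer: 3 hidden units on 2 inputs, i.e. the lines
# x = 0, y = 0 and x + y = 1.125, which are in general position.
weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
biases = [0.0, 0.0, -1.125]

# Grid of odd multiples of 1/8, so no sample lies exactly on a line.
coords = [i / 4 + 1 / 8 for i in range(-12, 12)]
points = list(product(coords, repeat=2))

print(max_regions(3, 2))
print(count_activation_patterns(weights, biases, points))
```

Sampling can only find regions that contain a grid point, so in general the sampled count is a lower estimate of the true number of regions, while `max_regions` is the upper limit for a single layer.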

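Experimental Results

Research Questions

RQ1: With the same number of parameters, how does the number of linear regions of a deep ReLU network vary with depth and width relative to a shallow network?
RQ2: For a fixed number of parameters, can a deep network realize exponentially more linear regions than a shallow one?
RQ3: For a fixed input dimension, how does the number of linear regions of a ReLU network relate to its depth?
RQ4: How does the number of linear regions grow with the number of parameters in deep and in shallow architectures?
RQ5: To what extent does layered composition in deep networks increase representational capacity beyond what shallow networks can reach?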

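Key Findings

For a shallow network with $ kn $ hidden units and $ n_0 $ inputs, the number of linear regions is $ O(k^{n_0}n^{n_0}) $ when $ n_0 = O(1) $.
For a deep network with $ k $ layers of $ n $ hidden units each, the number of linear regions is $ \Omega\left(\left\lfloor\frac{n}{n_0}\right\rfloor^{k-1}n^{n_0}\right) $ when $ n_0 = O(1) $.
The deep model's region count grows faster than the shallow model's as $ n \to \infty $, or as $ k \to \infty $ provided $ n \geq 2n_0 $.
When $ n = 2n_0 $, the deep model has considerably more linear regions than the shallow model even for small $ k $.
The ratio of linear regions to parameters grows exponentially with depth $ k $, which indicates that deep models are more representationally efficient.
The deep model has $ O(kn^2) $ parameters versus $ O(kn) $ for the shallow model, yet it still realizes more regions per parameter.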
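The exponential dependence on depth in the findings above can be seen in a one-dimensional toy network. This is an illustrative sketch and not the construction used in the paper: every layer is assumed to have two ReLU units that together compute the fold $h \mapsto 2|h| - 1$ on $(-1, 1)$, and all function names are hypothetical. Stacking such layers folds the input interval onto itself repeatedly, and the number of distinct activation patterns doubles with each layer.

```python
def relu(z: float) -> float:
    return z if z > 0 else 0.0


def fold_layer(h: float):
    """One hidden layer with two ReLU units computing 2 * |h| - 1.

    Returns the layer output and the on/off pattern of its two units.
    """
    pos, neg = relu(h), relu(-h)
    return 2.0 * (pos + neg) - 1.0, (pos > 0, neg > 0)


def count_regions(depth: int) -> int:
    """Count distinct activation patterns of `depth` stacked fold layers."""
    # Sample odd multiples of 2^-(depth + 1) in (-1, 1): fine enough to hit
    # every linear piece, and never exactly on a breakpoint of any layer.
    step = 2.0 ** -(depth + 1)
    patterns = set()
    for i in range(-(2 ** depth), 2 ** depth):
        h = (2 * i + 1) * step
        pattern = []
        for _ in range(depth):
            h, units = fold_layer(h)
            pattern.append(units)
        patterns.add(tuple(pattern))
    return len(patterns)


for depth in (1, 2, 3, 4):
    print(depth, count_regions(depth))
```

A depth-$k$ stack of this kind uses $2k$ hidden units in total. A single hidden layer with $2k$ units on one input has at most $2k$ breakpoints and therefore at most $2k + 1$ linear regions, so the gap between the two grows quickly with $k$.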

This review was generated by AI and checked by a human editor.