QUICK REVIEW

[Paper Review] The Expressive Power of Neural Networks: A View from the Width

Lu Zhou, Hongming Pu|arXiv (Cornell University)|Sep 8, 2017
Advanced Memory and Neural Computing | References: 7 | Cited by: 125
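One-Sentence Summary

Width-bounded ReLU networks of width n+4 are universal approximators for any Lebesgue-integrable function, whereas networks of width n are not, which reveals a width-based phase transition; the paper also gives a polynomial lower bound for width efficiency and provides supporting experiments.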

ABSTRACT

The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-$2$) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal approximators. Moreover, except for a measure zero set, all functions cannot be approximated by width-$n$ ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth-efficiency of neural networks. That is, there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an exponential bound. Here we pose the dual question on the width-efficiency of ReLU networks: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceed the polynomial bound by a constant factor can approximate wide and shallow network with high accuracy. Our results provide more comprehensive evidence that depth is more effective than width for the expressiveness of ReLU networks.

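Motivation and Objectives

  • Study how the width of ReLU networks affects expressive power, going beyond the widely studied depth dimension.
  • Prove a universal approximation theorem for width-bounded networks, showing that width n+4 suffices for L1 approximation on R^n while width n does not.
  • Examine width efficiency by establishing a polynomial lower bound for approximating wide networks with narrow ones.
  • Provide experimental evidence on the practical width-depth trade-off and its implications for network design.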

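Proposed Method

  • Construct a fully connected width-(n+4) ReLU network that approximates any Lebesgue-integrable function to within any L1 error ε.
  • Decompose the target function into a finite sum of indicator functions on axis-aligned cubes, and approximate these indicator functions with ReLU-based modules.
  • Introduce a block-wise network architecture that stores and sums the approximations across cubes to build the global approximation.
  • Prove a width-bounded universal approximation theorem (Theorem 1) through a constructive network design, and compare it with classical depth-bounded universal approximation.
  • Analyze width efficiency by deriving a polynomial lower bound for approximating wide networks with narrower ones (Theorem 4), and discuss the experimental validation.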
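
The decomposition idea in the list above can be illustrated with a minimal one-dimensional sketch. It is not the paper's width-(n+4) construction: it uses a shallow, wide sum of ReLU units rather than a narrow, deep network, and the names `soft_indicator`, `step_approximation`, `l1_error`, and `gaussian`, as well as the parameter `delta_frac`, are hypothetical choices made for this example. It only shows how ReLU units can form an approximate indicator of an interval, and how a weighted sum of such indicators approximates an integrable function in L1.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def soft_indicator(x, a, b, delta):
    """Trapezoid from four ReLU units: 0 outside [a, b], 1 on [a+delta, b-delta]."""
    return (relu(x - a) - relu(x - a - delta)
            - relu(x - b + delta) + relu(x - b)) / delta

def step_approximation(f, x, lo, hi, num_cells, delta_frac=0.05):
    """Weighted sum of soft indicators over a uniform grid of cells on [lo, hi].

    Each cell is weighted by the value of f at the cell midpoint.
    delta_frac (an assumption of this sketch) sets the ramp width per cell.
    """
    edges = np.linspace(lo, hi, num_cells + 1)
    cell_width = edges[1] - edges[0]
    out = np.zeros_like(x)
    for a, b in zip(edges[:-1], edges[1:]):
        out += f(0.5 * (a + b)) * soft_indicator(x, a, b, delta_frac * cell_width)
    return out

def l1_error(f, lo=-4.0, hi=4.0, num_cells=16, samples=200_001):
    """Riemann-sum estimate of the L1 distance between f and its approximation on [lo, hi]."""
    x = np.linspace(lo, hi, samples)
    g = step_approximation(f, x, lo, hi, num_cells)
    return float(np.mean(np.abs(f(x) - g)) * (hi - lo))

def gaussian(x):
    return np.exp(-x ** 2)

if __name__ == "__main__":
    for cells in (8, 16, 32, 64):
        print(cells, l1_error(gaussian, num_cells=cells))
```

Refining the grid shrinks the step-function part of the error. The ramps contribute an error proportional to `delta_frac`, so the ramp width has to shrink as well for the total L1 error to go to zero.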

Experimental Results

Research Questions

  • RQ1: Are width-(n+4) ReLU networks universal approximators of Lebesgue-integrable functions on R^n under the L1 distance?
  • RQ2: Is there a phase transition in expressiveness as the width increases from n to n+4?
  • RQ3: Do there exist wide networks that cannot be approximated by narrow networks unless the latter are polynomially larger in size?
  • RQ4: Do the experimental results support a polynomial (rather than exponential) trade-off between width and the network size required for approximation?

Key Findings

  • Width-(n+4) ReLU networks can approximate any Lebesgue-integrable function on R^n to arbitrary L1 accuracy.
  • Except for a measure-zero set, functions cannot be approximated by width-n ReLU networks on R^n in L1, indicating a phase transition.
  • There exist width-O(k^2) depth-3 networks that cannot be approximated by width-O(k^1.5) depth-k networks, demonstrating a polynomial lower bound for width efficiency (Theorem 4).
  • Experiments show that narrow networks with size modestly larger than the polynomial lower bound can approximate wide shallow networks with high accuracy.
  • Overall, the results provide evidence that depth may be more effective than width for the expressiveness of ReLU networks.
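
For intuition about the width-n impossibility result listed above, consider the special case n = 1. A width-1 ReLU network is a composition of maps h -> relu(w*h + b), each of which is monotone, so the whole network is a monotone function of its scalar input and cannot follow a target that rises and then falls. The sketch below checks this numerically on random networks. It is an illustration for n = 1 only, not the paper's proof, and the names `narrow_relu_net`, `is_monotone`, and `check_random_networks` are hypothetical.

```python
import numpy as np

def narrow_relu_net(x, weights, biases, out_w, out_b):
    """Width-1 ReLU network on a scalar input: repeated relu(w*h + b), then an affine readout."""
    h = x
    for w, b in zip(weights, biases):
        h = np.maximum(w * h + b, 0.0)
    return out_w * h + out_b

def is_monotone(y):
    """True if the sampled outputs are entirely non-decreasing or entirely non-increasing."""
    d = np.diff(y)
    return bool(np.all(d >= 0.0) or np.all(d <= 0.0))

def check_random_networks(num_trials=1000, max_depth=5, seed=0):
    """Sample random width-1 networks and report whether all of them are monotone."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-5.0, 5.0, 2001)
    for _ in range(num_trials):
        depth = int(rng.integers(1, max_depth + 1))
        weights = rng.normal(size=depth)
        biases = rng.normal(size=depth)
        y = narrow_relu_net(x, weights, biases, rng.normal(), rng.normal())
        if not is_monotone(y):
            return False
    return True

if __name__ == "__main__":
    print(check_random_networks())
```

The check relies only on the fact that compositions of monotone maps are monotone. For n > 1 the paper's argument is different and is not reproduced here.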

This review was generated by AI and reviewed by a human editor.