QUICK REVIEW

[Paper Review] Approximating Continuous Functions by ReLU Nets of Minimal Width

Boris Hanin, Mark Sellke|arXiv (Cornell University)|Oct 31, 2017

Neural Networks and Applications | References: 18 | Cited by: 143

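One-Sentence Summary

This paper determines the minimal hidden-layer width that ReLU networks need in order to approximate arbitrary continuous functions of d_in variables: the threshold is d_in+1 for real-valued functions, and for functions with d_out outputs the paper gives the upper bound d_in+d_out.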

ABSTRACT

This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well? It turns out that this minimal width is exactly equal to $d_{in}+1.$ That is, if all the hidden layer widths are bounded by $d_{in}$, then even in the infinite depth limit, ReLU nets can only express a very limited class of functions, and, on the other hand, any continuous function on the $d_{in}$-dimensional unit cube can be approximated to arbitrary precision by ReLU nets in which all hidden layers have width exactly $d_{in}+1.$ Our construction in fact shows that any continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ can be approximated by a net of width $d_{in}+d_{out}$. We obtain quantitative depth estimates for such an approximation in terms of the modulus of continuity of $f$.

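Research Motivation and Goals

Determine the minimal hidden-layer width w_min(d_in, d_out) that ReLU networks need in order to approximate any continuous function f: [0,1]^{d_in} -> R^{d_out}.
Show that if every hidden layer has width at most d_in, the expressive power of the network stays limited no matter how deep it is.
Give a construction that attains the width upper bound d_in+d_out and can approximate any continuous function.
For the width-(d_in+d_out) construction, quantify the required depth in terms of the modulus of continuity ω_f of f.
Establish a matching lower bound: universal approximation requires width at least d_in+1.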
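
For reference, the two quantities named in these goals can be written out as follows. This is a sketch in the abstract's notation; measuring error in the sup norm and taking the generalized inverse of the modulus of continuity are standard conventions assumed here, and the symbol $\mathcal N$ for a network is introduced only for illustration.

$$
w_{\min}(d_{in}, d_{out}) = \min\Big\{ w : \forall f \in C\big([0,1]^{d_{in}}, \mathbb R^{d_{out}}\big),\ \forall \varepsilon > 0,\ \exists\ \text{ReLU net } \mathcal N \text{ with hidden widths} \le w \text{ and } \sup_{x} \|f(x) - \mathcal N(x)\| \le \varepsilon \Big\}
$$

$$
\omega_f(\delta) = \sup\big\{ \|f(x) - f(y)\| : \|x - y\| \le \delta \big\}, \qquad \omega_f^{-1}(\varepsilon) = \sup\big\{ \delta : \omega_f(\delta) \le \varepsilon \big\}
$$

In this notation the paper's claim reads $d_{in}+1 \le w_{\min}(d_{in}, d_{out}) \le d_{in}+d_{out}$, with equality on both sides when $d_{out}=1$.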

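Proposed Method

Introduces the threshold w_min(d_in, d_out) for ReLU networks without skip connections.
Proves the upper bound w_min(d_in, d_out) ≤ d_in+d_out through a max-min string construction that can approximate any continuous function on a compact set.
Shows that any continuous function f on a compact set K can be approximated by a ReLU network of width d_in+d_out whose depth depends on the modulus of continuity ω_f.
Relies on two propositions: (i) every max-min string can be realized by a ReLU network of width d_in+d_out (Proposition 2), and (ii) approximation is achieved by max-min strings of controlled length L = (O(diam(K))/ω_f^{-1}(ε))^{d_in+1} (Proposition 3).
Develops a geometric "corner-cutting" argument (Lemma 5) to extend an ε-approximation to a larger domain and to derive the depth bound.
Obtains the lower bound w_min(d_in, ·) ≥ d_in+1 by constructing a function whose level sets have a particular geometry, so that no network of width d_in can achieve universal approximation.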
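
The realization step (a max-min string computed by a net of width d_in+d_out) can be sketched in code. The sketch assumes a max-min string means a left fold of coordinate-wise max or min over affine maps ℓ_1, ..., ℓ_L, and that inputs lie in [0,1]^{d_in} so that a ReLU neuron can copy each coordinate unchanged. The function names `eval_string`, `build_relu_net` and `forward` are hypothetical, and the weights follow the identities max(g, ℓ) = ℓ + ReLU(g − ℓ) and min(g, ℓ) = ℓ − ReLU(ℓ − g); it is an illustration, not the authors' exact construction.

```python
def affine(A, b, x):
    """Return A x + b, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]

def eval_string(x, affines, ops):
    """Evaluate the max-min string directly by folding over the affine maps."""
    g = affine(affines[0][0], affines[0][1], x)
    for (A, b), op in zip(affines[1:], ops):
        pick = max if op == "max" else min
        g = [pick(gi, li) for gi, li in zip(g, affine(A, b, x))]
    return g

def build_relu_net(affines, ops, d_in):
    """Build hidden layers (W, c) of width d_in + d_out and an affine output layer.

    Hidden state is (x, r): d_in neurons copy x (valid because x >= 0 on the cube)
    and d_out neurons hold r = relu(s * (running - l(x))), with s = +1 for max and
    s = -1 for min. Requires at least two affine maps.
    """
    d_out = len(affines[0][1])
    G, g0 = affines[0]                        # running value = G x + g0 + S r
    S = [[0.0] * d_out for _ in range(d_out)]
    layers = []
    for (A, b), op in zip(affines[1:], ops):
        s = 1.0 if op == "max" else -1.0
        # Rows that copy x through the ReLU.
        W = [[1.0 if i == j else 0.0 for j in range(d_in)] + [0.0] * d_out
             for i in range(d_in)]
        # Rows that compute s * (running - l(x)) before the ReLU.
        W += [[s * (G[k][j] - A[k][j]) for j in range(d_in)]
              + [s * S[k][m] for m in range(d_out)] for k in range(d_out)]
        c = [0.0] * d_in + [s * (g0[k] - b[k]) for k in range(d_out)]
        if not layers:                        # the first layer only sees x
            W = [row[:d_in] for row in W]
        layers.append((W, c))
        # New running value = l(x) + s * r, again affine in the hidden state.
        G, g0 = A, b
        S = [[s if k == m else 0.0 for m in range(d_out)] for k in range(d_out)]
    W_out = [list(G[k]) + S[k] for k in range(d_out)]
    return layers, (W_out, g0)

def forward(x, layers, out):
    """Run the ReLU net: ReLU after every hidden layer, affine output."""
    h = list(x)
    for W, c in layers:
        h = [max(z, 0.0) for z in affine(W, c, h)]
    return affine(out[0], out[1], h)
```

A string built from L affine maps yields L − 1 hidden layers here, each with exactly d_in + d_out neurons, and `forward` should agree with `eval_string` on the cube up to floating-point rounding.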

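Results

Research Questions

RQ1: What is the minimal hidden-layer width w_min(d_in, d_out) that allows ε-approximation of every continuous function f: [0,1]^{d_in} → R^{d_out}?
RQ2: With the hidden-layer width held fixed, is d_in+1 a sharp lower bound for universal approximation by ReLU networks?
RQ3: With width exactly d_in+d_out, can every continuous function on [0,1]^{d_in} be approximated by a ReLU network, and what depth is required?
RQ4: When the width is restricted to d_in+d_out, how does the modulus of continuity of f affect the depth needed for approximation?
RQ5: In the setting without skip connections, what prevents widths below d_in+1 from achieving universal approximation?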
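
For intuition on RQ5, the simplest case d_in = 1 can be checked numerically. A net whose hidden layers all have width 1 is a composition of monotone scalar maps, so its output is monotone in x and cannot get within 1/4 of the target |x − 1/2| in sup norm. The sketch below samples random width-1 nets to illustrate this; it covers only d_in = 1 and is not the paper's level-set argument for general d_in. The names `width_one_net`, `is_monotone` and `best_sup_error` are hypothetical.

```python
import random

def width_one_net(x, weights, biases, w_out, b_out):
    """Scalar ReLU net: every hidden layer has a single neuron."""
    h = x
    for w, b in zip(weights, biases):
        h = max(w * h + b, 0.0)
    return w_out * h + b_out

def is_monotone(ys):
    """True if the sequence is nondecreasing or nonincreasing."""
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def best_sup_error(trials=200, seed=0, grid=200):
    """Smallest sup-norm error to |x - 1/2| over randomly sampled width-1 nets."""
    rng = random.Random(seed)
    xs = [i / grid for i in range(grid + 1)]
    target = [abs(x - 0.5) for x in xs]
    best = float("inf")
    for _ in range(trials):
        depth = rng.randint(1, 8)
        ws = [rng.gauss(0, 1) for _ in range(depth)]
        bs = [rng.gauss(0, 1) for _ in range(depth)]
        w_out, b_out = rng.gauss(0, 1), rng.gauss(0, 1)
        ys = [width_one_net(x, ws, bs, w_out, b_out) for x in xs]
        assert is_monotone(ys)  # holds for every width-1 net, at any depth
        best = min(best, max(abs(y - t) for y, t in zip(ys, target)))
    return best
```

The bound 1/4 follows because a monotone g satisfies g(1/2) ≥ min(g(0), g(1)), while the target equals 1/2 at both endpoints and 0 at the midpoint, so the errors at these three points cannot all be below 1/4.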

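Main Findings

For real-valued functions the minimal width for universal approximation is exactly d_in+1; in general, d_in+1 ≤ w_min(d_in, d_out) ≤ d_in+d_out.
Any continuous function f: [0,1]^{d_in} → R^{d_out} can be ε-approximated by a ReLU network whose hidden-layer widths are at most d_in+d_out, with a depth that depends on the modulus of continuity ω_f.
The upper-bound construction rests on the max-min string representation, and the resulting depth bound scales as (diam(K)/ω_f^{-1}(ε))^{d_in+1}.
Width d_in is not enough for universal approximation: there is a continuous function on [0,1]^{d_in} and a positive η such that every ReLU network of width d_in stays at sup-norm distance at least η from it.
Skip connections are not allowed in the paper's setting; if they were, the width bound would become trivial, since a sufficiently deep network of width 1 could then approximate any continuous function.
The lower bound is established by constructing a function whose level sets have a particular geometry; the argument uses structural properties of ReLU networks, namely that they compute piecewise affine functions on convex pieces.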
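
To get a feel for the depth scaling, the bound can be evaluated for an L-Lipschitz function on the unit cube. For such f, ω_f(δ) ≤ Lδ, so ω_f^{-1}(ε) ≥ ε/L, and diam([0,1]^{d_in}) = sqrt(d_in). The constant hidden in the O(·) is not given in this summary, so the `const` parameter below is a placeholder assumption, and `depth_scaling` is a hypothetical helper that shows only how the bound grows, not an actual depth.

```python
import math

def depth_scaling(d_in, lipschitz, eps, const=1.0):
    """Evaluate (const * diam(K) / omega_inv)^(d_in + 1) for K = [0,1]^d_in.

    Uses omega_inv = eps / lipschitz, a lower bound on the inverse modulus of
    continuity of an L-Lipschitz function. `const` stands in for the unknown
    constant in the O(.) and is an assumption.
    """
    diam = math.sqrt(d_in)
    omega_inv = eps / lipschitz
    return (const * diam / omega_inv) ** (d_in + 1)

# Halving eps multiplies the bound by 2 ** (d_in + 1).
for d in (1, 2, 4, 8):
    ratio = depth_scaling(d, 1.0, 0.05) / depth_scaling(d, 1.0, 0.1)
    print(d, round(ratio))
```

The exponent d_in+1 means the bound grows quickly with input dimension, as the printed ratios illustrate.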

This review was generated by AI and reviewed by human editors.