QUICK REVIEW

[Paper Review] Benefits of depth in neural networks

Matus Telgarsky|arXiv (Cornell University)|Feb 14, 2016

Machine Learning and Algorithms|References: 13|Cited by: 98

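One-sentence summary

This paper proves a benefit of depth: there exist deep networks of moderate size that shallow networks cannot approximate unless they are exponentially large. The result holds for networks built from semi-algebraic gates, which include ReLU networks.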

ABSTRACT

For any positive integer $k$, there exist neural networks with $Θ(k^3)$ layers, $Θ(1)$ nodes per layer, and $Θ(1)$ distinct parameters which can not be approximated by networks with $\mathcal{O}(k)$ layers unless they are exponentially large --- they must possess $Ω(2^k)$ nodes. This result is proved here for a class of nodes termed "semi-algebraic gates" which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: $Ω(2^{k^3})$ total tree nodes are required).

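Research motivation and goals

Prove that deep networks can express highly oscillating functions that shallow networks struggle to approximate.
Show how an oscillation-counting argument separates deep networks from shallow ones for semi-algebraic gates.
Extend the depth-hierarchy insight to architectures such as convolutional networks, sum-product networks, and boosted decision trees.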

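Proposed method

Construct a target function from ReLU gates that requires many layers to approximate.
Define and analyze semi-algebraic gates, a class that covers common activations (ReLU, max, piecewise polynomials).
Use oscillation (crossing) counts to connect depth, function complexity, and the limits of approximation.
Bound how the number of oscillations grows under composition and addition of layers, which yields the depth separation.
Apply a counting/packing argument to show that shallow networks of limited size cannot approximate the deep target.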
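The oscillation bounds under addition and composition can be illustrated in the simplest univariate, piecewise-affine case. The following is a sketch under stated assumptions, not the paper's general semi-algebraic statement: $s$, $t$, $p$, $w$, and $\ell$ are notation introduced here, and $\mathrm{pieces}(h)$ denotes the number of affine pieces of a univariate function $h$. If $f$ has at most $s$ pieces and $g$ has at most $t$ pieces, then

$$
\mathrm{pieces}(f+g) \le s+t, \qquad \mathrm{pieces}(f\circ g) \le s\,t .
$$

A ReLU gate has 2 pieces. In a layer of width $w$ whose inputs each have at most $p$ pieces, every node applies a ReLU to an affine combination of $w$ such inputs, so each node output has at most $2wp$ pieces. Starting from the identity (1 piece) and applying this $\ell$ times gives

$$
\mathrm{pieces}(\text{node in layer } \ell) \le (2w)^{\ell} .
$$

Oscillations therefore grow at most polynomially in width but can grow exponentially in depth, which is the asymmetry the separation relies on.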

Results

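Research questions

RQ1: Can deep neural networks provably represent functions that shallow networks cannot approximate without exponential size?
RQ2: What role do oscillation growth and the composition and addition of layers play in depth separation across architectures?
RQ3: Does the depth-based separation extend to semi-algebraic networks and to architectures such as convolutional neural networks, sum-product networks, and boosted trees?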

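Key findings

There exist networks with $2k^3+8$ layers, $3k^3+12$ total nodes, and $4+d$ distinct parameters that cannot be approximated to within $L_1$ error $1/64$ by any network with $\mathcal{O}(k)$ layers and subexponentially many nodes.
Deeper networks can produce far more oscillations than shallow ones, which makes highly oscillating target functions resistant to shallow approximation.
The depth separation also holds for networks of semi-algebraic gates, including ReLU-based networks and convolutional networks with max gates, and for boosted decision trees under a stronger size requirement ($Ω(2^{k^3})$ total tree nodes).
A related result gives a VC-dimension bound for semi-algebraic networks, implying that most random labelings cannot be approximated well by deep networks with a limited number of parameters.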
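The claim that depth produces many oscillations can be checked numerically in a toy univariate setting. The sketch below composes a ReLU tent map with itself `k` times and counts how often the result crosses the level 1/2 on [0, 1]. It is an illustration of the general idea only, not the paper's exact network; the function names `relu`, `mirror`, `compose`, and `count_crossings` are hypothetical and chosen for this example.

```python
def relu(z):
    """Standard ReLU gate."""
    return max(z, 0.0)


def mirror(x):
    """Tent map on [0, 1] built from three ReLU gates.

    Equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    """
    return relu(2.0 * relu(x) - 4.0 * relu(x - 0.5))


def compose(x, k):
    """Apply the tent map k times (a constant-width network of depth ~2k)."""
    for _ in range(k):
        x = mirror(x)
    return x


def count_crossings(k, level=0.5):
    """Count crossings of `level` by the k-fold composition on [0, 1].

    Samples at the dyadic points j / 2**k, where the composed function
    alternates exactly between 0 and 1, so no sample lands on the level.
    """
    n = 2 ** k
    above = [compose(j / n, k) > level for j in range(n + 1)]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, count_crossings(k))
```

Each extra composition adds a constant number of gates but doubles the number of crossings, so `count_crossings(k)` equals `2**k`; for example, `count_crossings(3)` returns 8.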

This review was generated by AI and reviewed by human editors.