QUICK REVIEW

[Paper Review] Optimal approximation of continuous functions by very deep ReLU networks

Dmitry Yarotsky|arXiv (Cornell University)|Feb 10, 2018

Neural Networks and Applications | References: 20 | Citations: 121

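One-line summary

This paper characterizes the optimal rates at which continuous functions on [0,1]^ν can be approximated by very deep ReLU networks, identifying two distinct phases and how the achievable rate depends on network depth.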

ABSTRACT

We consider approximations of general continuous functions on finite-dimensional cubes by general deep ReLU neural networks and study the approximation rates with respect to the modulus of continuity of the function and the total number of weights $W$ in the network. We establish the complete phase diagram of feasible approximation rates and show that it includes two distinct phases. One phase corresponds to slower approximations that can be achieved with constant-depth networks and continuous weight assignments. The other phase provides faster approximations at the cost of depths necessarily growing as a power law $L\sim W^{\alpha}$, $0<\alpha\le 1$, and with necessarily discontinuous weight assignments.

Motivation and objectives

Investigate how well general continuous functions on [0,1]^ν can be approximated by deep ReLU networks as a function of the number of weights W and the modulus of continuity ω_f.
Determine whether faster convergence rates require greater depth or discontinuous weight assignments.
Establish a complete phase diagram of feasible approximation rates and identify the conditions for each phase.
Provide architectures that achieve the optimal rates and prove their optimality up to logarithmic factors.
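
For reference, the two quantities these objectives revolve around can be written out as below. This is a sketch in standard notation: the use of the Euclidean distance |x - y| and the generic constants a and c are assumptions here and should be checked against the paper's own definitions.

```latex
\omega_f(r) = \max\bigl\{\, |f(x) - f(y)| \;:\; x, y \in [0,1]^{\nu},\ |x - y| \le r \,\bigr\},
\qquad
\| f - \widetilde{f} \|_{\infty} \;\le\; a\, \omega_f\!\bigl(c\, W^{-p}\bigr).
```

As a worked instance, if f is Lipschitz with constant Λ then ω_f(r) ≤ Λ r, so a bound of this form reduces to an error of at most a c Λ W^{-p}; a larger exponent p therefore means faster convergence in the number of weights W.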

Proposed method

Define the modulus of continuity ω_f and measure the approximation error in the uniform norm ||f - f~||_∞ as a function of W, asking for which exponents p a bound of the form ||f - f~||_∞ ≤ a ω_f(c W^{-p}) holds.
Show that p ≤ 2/ν is feasible and that p > 2/ν is infeasible using VC-dimension bounds.
Prove a phase transition: p = 1/ν attainable with continuous weight assignments and fixed-depth architectures; p ∈ (1/ν, 2/ν] requires deep networks with discontinuous weights.
Construct two-scale architectures: an initial interpolation with O(W^{1/ν}) knots and a second, discrete approximation encoded via special weights to achieve p in (1/ν, 2/ν].
Demonstrate that p = 2/ν is achievable with narrow fully-connected architectures of fixed width and depth L growing with W, and that p ∈ (1/ν, 2/ν) is achievable with stacked architectures.
Utilize the bit-extraction technique and the two-scales expansion to realize the second stage of approximation.
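
The bit-extraction idea mentioned in the last step can be illustrated with a small sketch. This is a simplified binary variant written for illustration, not the paper's exact construction: the helper names `relu_step`, `encode_bits`, and `extract_bits` are hypothetical, and the loop stands in for a constant-width network with one layer per extracted bit. It shows how many discrete values can be packed into a single weight and then decoded using only ReLU operations, at the price of depth that grows with the number of bits.

```python
def relu(z):
    return max(z, 0.0)


def relu_step(z, eps):
    """Two-ReLU ramp: equals 0 for z <= -eps and 1 for z >= 0."""
    return relu(z / eps + 1.0) - relu(z / eps)


def encode_bits(bits):
    """Pack bits b1, b2, ..., bn into the single number 0.b1b2...bn (binary)."""
    return sum(b * 2.0 ** -(k + 1) for k, b in enumerate(bits))


def extract_bits(weight, n):
    """Recover n bits from one weight using only ReLU and affine operations."""
    # Remainders are multiples of 2**-n, so r - 0.5 never falls inside
    # the open interval (-eps, 0) where the ramp is not exactly 0 or 1.
    eps = 2.0 ** -n
    r = weight
    out = []
    for _ in range(n):  # one "layer" per bit: depth grows with n
        b = relu_step(r - 0.5, eps)  # leading binary digit of r
        out.append(int(b))
        r = 2.0 * r - b  # shift the remaining digits left
    return out


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    w = encode_bits(bits)
    print(w, extract_bits(w, len(bits)))
```

The decoded bits change abruptly as the encoded weight varies, which gives some intuition for why the faster rates are tied to discontinuous weight assignments.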

Results

Research questions

RQ1: What is the optimal rate at which continuous functions on [0,1]^ν can be approximated by ReLU networks as a function of the number of weights W?
RQ2: How do the depth L and weight continuity affect achievable rates, and is there a phase transition in the rate with respect to p?
RQ3: What network architectures realize the optimal rates, and what are the necessary depth and width requirements in different regimes?
RQ4: Is there a fundamental bound that separates feasible and infeasible rates, and how do VC-dimension arguments constrain these rates?

Key findings

There exists a complete phase diagram of approximation rates parameterized by the modulus of continuity and the weight count W.
The fastest achievable rate is O(ω_f(O(W^{-2/ν}))), requiring depths growing as a power law L ~ W^{α} with 0 < α ≤ 1 and discontinuous weight assignments.
A slower, yet feasible rate O(ω_f(O(W^{-1/ν}))) can be achieved with constant-depth networks and continuous weight assignments.
Two distinct phases are identified: a slow phase with constant depth and continuous weights, and a fast phase with deep networks and discontinuous weights.
In particular, constant-width fully-connected networks of depth L ~ W achieve the fastest rate, and this rate cannot be attained with shallower networks.
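
The slow phase in the findings above can be illustrated in one dimension (ν = 1) with a minimal sketch, which is an assumption-laden toy rather than the paper's construction. A one-hidden-layer ReLU network with n units reproduces the piecewise-linear interpolant of f on the grid k/n; since the interpolant at x is a weighted average of f at two grid points within distance 1/n of x, its uniform error is at most ω_f(1/n). The weights depend linearly, hence continuously, on the sampled values of f. The function names `shallow_relu_interpolant` and `sup_error` are hypothetical.

```python
import math


def relu(z):
    return max(z, 0.0)


def shallow_relu_interpolant(f, n):
    """One-hidden-layer ReLU net matching f at the knots k/n, k = 0..n."""
    values = [f(k / n) for k in range(n + 1)]
    slopes = [(values[k + 1] - values[k]) * n for k in range(n)]
    # Output weights are the slope changes at each knot.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    knots = [k / n for k in range(n)]

    def net(x):
        return values[0] + sum(c * relu(x - t) for c, t in zip(coeffs, knots))

    return net


def sup_error(f, g, samples=2001):
    """Estimate the uniform error on [0, 1] from an evenly spaced sample."""
    grid = (i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - g(x)) for x in grid)


if __name__ == "__main__":
    f = math.sqrt  # modulus of continuity on [0, 1] is sqrt(r)
    for n in (4, 16, 64):
        net = shallow_relu_interpolant(f, n)
        # Compare the sampled error with the bound omega_f(1/n).
        print(n, sup_error(f, net), math.sqrt(1.0 / n))
```

The number of weights here is proportional to n, so the bound ω_f(1/n) corresponds to the exponent p = 1/ν with ν = 1; reaching exponents up to 2/ν is what the deeper constructions in the paper are for.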

This review was generated by AI and checked by a human editor.