[Paper Review] Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks
The paper proves nearly-tight bounds on the VC-dimension and pseudodimension of deep ReLU (piecewise linear) networks, showing upper and lower bounds that depend on W (weights), L (layers), and U (non-linear units).
We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function. These bounds are tight for almost the entire range of parameters. Letting $W$ be the number of weights and $L$ be the number of layers, we prove that the VC-dimension is $O(W L \log(W))$, and provide examples with VC-dimension $Ω( W L \log(W/L) )$. This improves both the previously known upper bounds and lower bounds. In terms of the number $U$ of non-linear units, we prove a tight bound $Θ(W U)$ on the VC-dimension. All of these bounds generalize to arbitrary piecewise linear activation functions, and also hold for the pseudodimensions of these function classes. Combined with previous results, this gives an intriguing range of dependencies of the VC-dimension on depth for networks with different non-linearities: there is no dependence for piecewise-constant, linear dependence for piecewise-linear, and no more than quadratic dependence for general piecewise-polynomial.
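To get a feel for how close the two bounds in the abstract are, the sketch below evaluates the expressions $W L \log_2 W$ and $W L \log_2(W/L)$ for a few hypothetical network sizes. The constants hidden by $O(\cdot)$ and $\Omega(\cdot)$ are set to 1 purely for illustration, and the base-2 logarithm, the function names, and the sample sizes are assumptions rather than anything taken from the paper, so only the ratio between the two expressions is meaningful.

```python
import math


def upper_expr(W: int, L: int) -> float:
    """Shape of the upper bound, W * L * log2(W), with the hidden constant set to 1."""
    return W * L * math.log2(W)


def lower_expr(W: int, L: int) -> float:
    """Shape of the lower bound, W * L * log2(W / L), with the hidden constant set to 1."""
    return W * L * math.log2(W / L)


def gap_ratio(W: int, L: int) -> float:
    """Ratio of the two expressions; algebraically log(W) / log(W / L)."""
    return upper_expr(W, L) / lower_expr(W, L)


if __name__ == "__main__":
    # Hypothetical (W, L) pairs chosen for illustration only.
    for W, L in [(10_000, 2), (10_000, 10), (10_000, 100), (1_000_000, 100)]:
        print(f"W={W:>9,}  L={L:>4}  upper/lower ratio = {gap_ratio(W, L):.3f}")
```

If $L \le W^{1-\epsilon}$ for some fixed $\epsilon > 0$, then $\log(W/L) \ge \epsilon \log W$ and the ratio is at most $1/\epsilon$. That is one way to read the claim that the bounds are tight for almost the entire range of parameters.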
Motivation & Objective
- Motivate understanding of generalization through VC-dimension and pseudodimension for deep networks with piecewise linear activations.
- Derive nearly-tight upper and lower bounds on VC-dimension in terms of W and L.
- Relate depth and non-linearity to VC-dimension and pseudodimension across activation types.
- Show sharp bounds and their implications for depth vs. width in neural networks.
Proposed method
- Introduce and analyze piecewise linear networks (including ReLU) to study VC-dimension and pseudodimension.
- Prove a new lower bound using an enhanced bit extraction construction yielding VC-dimension ≥ WL log(W/L)/C (Theorem 3).
- Prove a new upper bound for piecewise polynomial activations using growth-function and semi-algebraic set techniques (Theorem 6).
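- Relate VC-dimension to effective depth and parameter distribution via the quantity $\bar{L} = \frac{1}{W}\sum_{i=1}^{L} W_i$, where $W_i$ is the number of parameters in layers 1 through $i$ (Theorem 6).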
- Establish an upper bound in terms of W and U for piecewise polynomial activations (Theorem 8).
- Show implications for depth: piecewise-constant, piecewise-linear, and general piecewise-polynomial activation functions.
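The lower-bound item above relies on bit extraction. As a rough illustration of the basic idea only (this is a plain textbook-style version, not the paper's enhanced construction), the sketch below reads the binary digits of a number $x = 0.b_1 b_2 \ldots b_m$ one at a time, using a difference of two ReLUs as a steep threshold. The function names and the choice of sharpness $2^m$ are assumptions made for this example.

```python
from typing import List


def relu(z: float) -> float:
    return max(0.0, z)


def soft_threshold(x: float, sharpness: float) -> float:
    """Difference of two ReLUs: 1 for x >= 1/2, 0 for x <= 1/2 - 1/sharpness."""
    return relu(sharpness * (x - 0.5) + 1.0) - relu(sharpness * (x - 0.5))


def extract_bits(x: float, m: int) -> List[int]:
    """Recover the m binary digits of x in [0, 1), assuming x is an exact multiple of 2**-m."""
    # Steep enough that no multiple of 2**-m lands on the linear ramp of the threshold.
    sharpness = float(2 ** m)
    bits = []
    for _ in range(m):
        b = soft_threshold(x, sharpness)  # one "layer": read the leading bit
        bits.append(int(b))
        x = 2.0 * x - b                   # shift the remaining bits up one place
    return bits


if __name__ == "__main__":
    # 0.8125 = 13/16 = 0.1101 in binary
    print(extract_bits(0.8125, 4))
```

Each loop iteration uses only ReLUs and affine maps, so it corresponds to a constant number of layers. Intuitively, a single real-valued weight can then store many labels that deeper layers read out, which is why depth can multiply the number of points a network shatters.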
Experimental results
Research questions
- RQ1: What are the tight (up to constants) VC-dimension and pseudodimension bounds for deep networks with piecewise linear activations?
- RQ2: How do the number of parameters W, layers L, and nonlinear units U influence the VC-dimension and pseudodimension?
- RQ3: Does depth have a different impact on VC-dimension for piecewise-constant, piecewise-linear, and piecewise-polynomial activations?
- RQ4: Can upper bounds be unified and tightened across activation families, including ReLU?
Key findings
- VC-dimension is O(WL log(W)) for piecewise linear networks under the stated architecture.
- There exist networks with VC-dimension at least Ω(WL log(W/L)), improving previous Ω(WL) and Ω(W log W) bounds.
- In terms of the number of nonlinear units U, the VC-dimension is Θ(WU).
- For piecewise-polynomial activations with degree at most d and p pieces, the upper bound in terms of units is O(WU) when d and p are treated as constants (Theorem 8); the Ω(WL log(W/L)) lower bound applies to the piecewise-linear case.
- Depth dependence varies with the activation family: none for piecewise-constant activations, linear for piecewise-linear activations, and at most quadratic for general piecewise-polynomial activations.
- An upper bound result (Theorem 6) shows the VC-dimension scaling as O(WL log W) for d = 1, and clarifies dependence on effective depth and activation structure.
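To make the effective depth in the last finding concrete, here is a minimal sketch that computes it for a fully connected architecture. It assumes the definition $\bar{L} = \frac{1}{W}\sum_{i=1}^{L} W_i$, with $W_i$ the number of parameters (weights and biases) in layers 1 through $i$; the helper names and the example layer widths are hypothetical.

```python
from typing import List, Sequence


def layer_param_counts(widths: Sequence[int]) -> List[int]:
    """Weights plus biases per layer of a fully connected net.

    widths = [input_dim, h1, ..., output_dim]; layer i maps widths[i-1] -> widths[i].
    """
    return [(n_in + 1) * n_out for n_in, n_out in zip(widths[:-1], widths[1:])]


def effective_depth(widths: Sequence[int]) -> float:
    """L_bar = (1/W) * sum_i W_i, with W_i the parameters in layers 1..i (assumed definition)."""
    per_layer = layer_param_counts(widths)
    W = sum(per_layer)
    cumulative = 0  # running W_i
    total = 0       # running sum of W_1 + ... + W_i
    for p in per_layer:
        cumulative += p
        total += cumulative
    return total / W


if __name__ == "__main__":
    # Parameters concentrated near the input: effective depth close to L = 3.
    print(effective_depth([10, 10, 10, 1]))
    # Parameters concentrated near the output: effective depth close to 1.
    print(effective_depth([1, 1, 100, 100]))
```

Under this definition $1 \le \bar{L} \le L$ always holds, since $W_L = W$ and every $W_i \le W$. Two networks with the same $W$ and $L$ can therefore have different effective depths depending on where their parameters sit.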
This review was created by AI and reviewed by human editors.