[Paper Review] Complexity of Linear Regions in Deep Networks
The paper develops a mathematical framework to count linear regions in piecewise linear networks (like ReLU), showing that at initialization the average number of regions along a 1D subspace scales linearly with the total number of neurons and the average distance to a region boundary scales as 1/number of neurons; experiments indicate training does not reach exponential region counts.
It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions. In the case of networks that compute piecewise linear functions, such as those with ReLU activation, the number of distinct linear regions is a natural measure of expressivity. It is possible to construct networks with merely a single region, or for which the number of linear regions grows exponentially with depth; it is not clear where within this range most networks fall in practice, either before or after training. In this paper, we provide a mathematical framework to count the number of linear regions of a piecewise linear network and measure the volume of the boundaries between these regions. In particular, we prove that for networks at initialization, the average number of regions along any one-dimensional subspace grows linearly in the total number of neurons, far below the exponential upper bound. We also find that the average distance to the nearest region boundary at initialization scales like the inverse of the number of neurons. Our theory suggests that, even after training, the number of linear regions is far below exponential, an intuition that matches our empirical observations. We conclude that the practical expressivity of neural networks is likely far below that of the theoretical maximum, and that this gap can be quantified.
Motivation & Objective
- Motivate a rigorous measure of expressivity for piecewise linear networks via linear regions and region boundaries.
- Develop mathematical tools to count linear regions and quantify boundary volume at initialization and during training.
- Show that the average region count along a 1D line scales with the total number of neurons rather than exponentially with depth, and that the average distance to the nearest region boundary scales as 1/(number of neurons).
- Empirically verify theoretical results on MNIST and observe stability of region counts through training.
Proposed method
- Model networks with piecewise linear activations and partition input space into linear regions.
- Define the boundary set B_N, where the network's gradient is discontinuous, and decompose it into codimension-k components B_N,k.
- Prove that the expected (n_in - k)-dimensional volume of B_N,k inside a bounded K scales with number of neurons (Theorem 3).
- Derive corollaries giving explicit bounds on region count along 1D lines and distance to region boundaries (Corollaries 4-5).
- Perform experiments with He-normal initialization and MNIST data to count regions along lines and measure distance to boundaries.
- Use co-area and Jacobian-based calculations to relate region boundaries to neuron gradients and biases.
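The line-counting experiment described above can be sketched as follows. This is a minimal illustration, not the paper's code: it samples a dense grid along a segment and counts changes in the ReLU activation pattern, which gives a lower bound on the number of regions crossed. The helper names, the bias scale, the layer widths, and the grid resolution are all assumptions made for this example.

```python
import numpy as np

def he_normal_relu_net(widths, rng, bias_std=0.1):
    """Sample one (W, b) pair per layer with He-normal weights.

    bias_std is an assumed value, not taken from the paper.
    """
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
        b = rng.normal(0.0, bias_std, size=fan_out)
        params.append((W, b))
    return params

def activation_patterns(params, X):
    """Return a boolean array (n_points, n_hidden_neurons) of on/off states."""
    h = X
    patterns = []
    for W, b in params[:-1]:  # hidden layers only; the last layer is linear
        pre = h @ W.T + b
        patterns.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(patterns, axis=1)

def count_regions_on_segment(params, x0, x1, n_samples=20000):
    """Approximate the number of linear regions met between x0 and x1.

    Counts how often the activation pattern changes between consecutive
    grid points. Regions narrower than the grid spacing can be missed,
    so the result is a lower bound.
    """
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    X = (1.0 - t) * x0 + t * x1
    P = activation_patterns(params, X)
    changes = np.any(P[1:] != P[:-1], axis=1)
    return int(changes.sum()) + 1

# Small demo with hypothetical sizes: 16 inputs, two hidden layers of 32.
rng = np.random.default_rng(0)
net = he_normal_relu_net([16, 32, 32, 10], rng)
x_start, x_end = rng.normal(size=16), rng.normal(size=16)
n_regions = count_regions_on_segment(net, x_start, x_end)
```

Repeating the demo over several random networks while varying the hidden widths is one way to check whether the count tracks the total number of hidden neurons, as the review's summary states.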
Experimental results
Research questions
- RQ1: How many linear regions does a ReLU network have on average at initialization along a 1D input line?
- RQ2: How does the boundary volume between linear regions scale with network size and depth?
- RQ3: What is the typical distance from a random input to the nearest region boundary, and how does it scale with the number of neurons?
- RQ4: How do these regional properties evolve during training on real data (e.g., MNIST)?
Key findings
- Along any 1D line through input space, the average number of linear regions at initialization is proportional to the total number of neurons; it depends on the neuron count rather than on depth.
- The average distance to the nearest region boundary at initialization scales as a constant divided by the number of neurons.
- Within a bounded input region, the boundary volume per unit of input volume is proportional to the number of neurons times the number of breakpoints in the nonlinearity.
- Experiments show the number of regions and the distance to boundaries stay roughly constant during training and remain far from exponential maxima.
- Empirical visualizations on MNIST confirm regions expand and then contract during training, with region counts remaining near the initialization scale.
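The distance-to-boundary finding can be probed numerically with a short sketch. For a ReLU network, each hidden pre-activation z is affine inside the current linear region, so the distance from x to the nearest boundary is the minimum over hidden neurons of |z(x)| / ||∇z(x)||. Whether this matches the paper's exact experimental procedure is an assumption; the function name, bias scale, and layer widths are also illustrative choices.

```python
import numpy as np

def distance_to_nearest_boundary(params, x):
    """Minimum over hidden neurons of |z(x)| / ||grad_x z(x)||.

    params is a list of (W, b) pairs; the last pair is the linear
    output layer and contributes no boundaries.
    """
    h = x
    J = np.eye(x.shape[0])  # Jacobian of the current hidden vector w.r.t. x
    best = np.inf
    for W, b in params[:-1]:
        pre = W @ h + b            # pre-activations of this layer
        J_pre = W @ J              # their gradients w.r.t. the input
        grad_norm = np.linalg.norm(J_pre, axis=1)
        alive = grad_norm > 0      # skip neurons that are locally constant
        if np.any(alive):
            dist = np.abs(pre[alive]) / grad_norm[alive]
            best = min(best, float(dist.min()))
        mask = pre > 0
        h = np.where(mask, pre, 0.0)   # ReLU
        J = J_pre * mask[:, None]      # zero the rows of inactive neurons
    return best

# Demo with hypothetical sizes and an assumed bias scale of 0.1.
rng = np.random.default_rng(0)
widths = [16, 32, 32, 10]
net = [
    (rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m)), rng.normal(0.0, 0.1, size=n))
    for m, n in zip(widths[:-1], widths[1:])
]
d = distance_to_nearest_boundary(net, rng.normal(size=16))
```

Averaging this quantity over many random inputs and networks of different sizes is one way to compare against the reported 1/(number of neurons) scaling.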
This review was created by AI and reviewed by human editors.