QUICK REVIEW

[Paper Review] Complexity of Linear Regions in Deep Networks

Boris Hanin, David Rolnick | arXiv (Cornell University) | Jan 25, 2019

Neural Networks and Applications | 54 citations

TL;DR

The paper develops a mathematical framework for counting linear regions in piecewise linear networks (such as ReLU networks). It shows that, at initialization, the average number of regions along a 1D subspace scales linearly with the total number of neurons, and the average distance to the nearest region boundary scales inversely with the number of neurons. Experiments indicate that training does not produce exponentially many regions.

ABSTRACT

It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions. In the case of networks that compute piecewise linear functions, such as those with ReLU activation, the number of distinct linear regions is a natural measure of expressivity. It is possible to construct networks with merely a single region, or for which the number of linear regions grows exponentially with depth; it is not clear where within this range most networks fall in practice, either before or after training. In this paper, we provide a mathematical framework to count the number of linear regions of a piecewise linear network and measure the volume of the boundaries between these regions. In particular, we prove that for networks at initialization, the average number of regions along any one-dimensional subspace grows linearly in the total number of neurons, far below the exponential upper bound. We also find that the average distance to the nearest region boundary at initialization scales like the inverse of the number of neurons. Our theory suggests that, even after training, the number of linear regions is far below exponential, an intuition that matches our empirical observations. We conclude that the practical expressivity of neural networks is likely far below that of the theoretical maximum, and that this gap can be quantified.

Motivation & Objective

Motivate a rigorous measure of expressivity for piecewise linear networks via linear regions and region boundaries.
Develop mathematical tools to count linear regions and quantify boundary volume at initialization and during training.
Show that the average region count along a 1D line scales with the total number of neurons rather than with depth, and that the average distance to the nearest region boundary scales like the inverse of the number of neurons.
Empirically verify theoretical results on MNIST and observe stability of region counts through training.

Proposed method

Model networks with piecewise linear activations and partition input space into linear regions.
Define the boundary set B_N, where the network's gradient is discontinuous, and decompose it into codimension-k components B_{N,k}.
Prove that the expected (n_in - k)-dimensional volume of B_{N,k} inside a bounded set K scales with the number of neurons (Theorem 3).
Derive corollaries giving explicit bounds on region count along 1D lines and distance to region boundaries (Corollaries 4-5).
Perform experiments with He-normal initialization and MNIST data to count regions along lines and measure distance to boundaries.
Use co-area and Jacobian-based calculations to relate region boundaries to neuron gradients and biases.
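The region-counting experiment along a line can be approximated with a minimal NumPy sketch. It is not the authors' code: the function names, the `(W, b)` parameter format, the bias scale, and the sampling resolution are all assumptions made for illustration. It samples points on a segment and counts how often the hidden-layer on/off pattern changes, which approximates the number of activation regions crossed (it undercounts if two changes fall between consecutive samples).

```python
import numpy as np

def init_relu_net(widths, rng, bias_std=0.1):
    """He-normal weights; the Gaussian bias scale is an assumption."""
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
        b = rng.normal(0.0, bias_std, size=fan_out)
        params.append((W, b))
    return params

def activation_patterns(params, X):
    """On/off pattern of every hidden neuron for each row of X."""
    h, patterns = X, []
    for W, b in params[:-1]:            # the last layer is a linear output
        pre = h @ W.T + b
        patterns.append(pre > 0)
        h = np.maximum(pre, 0.0)        # ReLU
    return np.concatenate(patterns, axis=1)

def count_regions_on_segment(params, x0, x1, n_samples=20001):
    """Approximate number of activation regions crossed between x0 and x1."""
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    X = (1.0 - t) * x0 + t * x1         # points along the segment
    P = activation_patterns(params, X)
    changes = np.any(P[1:] != P[:-1], axis=1)
    return int(changes.sum()) + 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for widths in ([16, 32, 1], [16, 32, 32, 1], [16, 32, 32, 32, 1]):
        net = init_relu_net(widths, rng)
        n = count_regions_on_segment(net, -np.ones(16), np.ones(16))
        print(sum(widths[1:-1]), "hidden neurons:", n, "regions")
```

Averaging this count over many random initializations and comparing it with the number of hidden neurons is one way to probe the linear scaling described above.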

Experimental results

Research questions

RQ1: How many linear regions does a ReLU network have on average at initialization along a 1D input line?
RQ2: How does the boundary volume between linear regions scale with network size and depth?
RQ3: What is the typical distance from a random input to the nearest region boundary, and how does it scale with the number of neurons?
RQ4: How do these regional properties evolve during training on real data (e.g., MNIST)?

Key findings

At initialization, the average number of linear regions along a 1D line through input space is proportional to the total number of neurons, independent of depth.
The average distance to the nearest region boundary at initialization scales as a constant divided by the number of neurons.
The density of boundary volume within a bounded input region is proportional to the number of neurons times the number of breakpoints in the nonlinearity.
Experiments show the number of regions and the distance to boundaries stay roughly constant during training and remain far from exponential maxima.
Empirical visualizations on MNIST confirm regions expand and then contract during training, with region counts remaining near the initialization scale.
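The distance finding can be illustrated with a short NumPy sketch, under stated assumptions. Within the linear region containing x, every hidden preactivation is an affine function of x, so the distance from x to the hyperplane where a given neuron would switch is |preactivation| / ||gradient||. Taking the minimum over all hidden neurons gives the distance to the nearest change in activation pattern, which is a lower bound on the distance to the nearest point where the network's gradient actually jumps. The function name and the `(W, b)` parameter format are hypothetical, not taken from the paper.

```python
import numpy as np

def distance_to_nearest_boundary(params, x):
    """min over hidden neurons of |pre(x)| / ||grad_x pre(x)|| for a ReLU net.

    params: list of (W, b) pairs; the last pair is a linear output layer.
    """
    h = np.asarray(x, dtype=float)
    J = np.eye(h.size)                  # Jacobian of the current features w.r.t. x
    best = np.inf
    for W, b in params[:-1]:
        pre = W @ h + b
        G = W @ J                       # row i: gradient of neuron i's preactivation
        norms = np.linalg.norm(G, axis=1)
        ok = norms > 0                  # zero gradient: neuron is locally constant
        if np.any(ok):
            best = min(best, float(np.min(np.abs(pre[ok]) / norms[ok])))
        mask = (pre > 0).astype(float)  # ReLU on/off pattern at x
        h = mask * pre
        J = mask[:, None] * G
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for width in (16, 64, 256):
        # He-normal weights with small Gaussian biases (the bias scale is an assumption)
        sizes = [8, width, width, 1]
        net = [(rng.normal(0, np.sqrt(2.0 / m), (n, m)), rng.normal(0, 0.1, n))
               for m, n in zip(sizes[:-1], sizes[1:])]
        d = np.mean([distance_to_nearest_boundary(net, rng.normal(size=8))
                     for _ in range(200)])
        print(2 * width, "hidden neurons: mean distance", d)
```

Running the loop for increasing widths gives a rough sense of how the mean distance shrinks as neurons are added; it is an illustration at one random initialization per width, not a reproduction of the paper's experiments.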

This review was created by AI and reviewed by human editors.