QUICK REVIEW

[Paper Review] A Signal Propagation Perspective for Pruning Neural Networks at Initialization

Namhoon Lee, Thalaiyasingam Ajanthan|arXiv (Cornell University)|Jun 14, 2019
Advanced Neural Network Applications|15 references|19 citations
TL;DR

This paper uses a signal propagation perspective to explain and improve the pruning of neural networks at initialization. It shows that layerwise dynamical isometry, where the singular values of each layer's Jacobian are concentrated near 1, is a sufficient condition for reliable connection sensitivity measurements. A data-free method then enforces approximate orthogonality in the pruned network, which improves trainability and generalization; in the architecture sculpting experiments, pruned networks outperform a dense baseline with the same number of parameters.

ABSTRACT

Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pruning starts by training a model and then removing redundant parameters while minimizing the impact on what is learned. Alternatively, a recent approach shows that pruning can be done at initialization prior to training, based on a saliency criterion called connection sensitivity. However, it remains unclear exactly why pruning an untrained, randomly initialized neural network is effective. In this work, by noting connection sensitivity as a form of gradient, we formally characterize initialization conditions to ensure reliable connection sensitivity measurements, which in turn yields effective pruning results. Moreover, we analyze the signal propagation properties of the resulting pruned networks and introduce a simple, data-free method to improve their trainability. Our modifications to the existing pruning at initialization method lead to improved results on all tested network models for image classification tasks. Furthermore, we empirically study the effect of supervision for pruning and demonstrate that our signal propagation perspective, combined with unsupervised pruning, can be useful in various scenarios where pruning is applied to non-standard arbitrarily-designed architectures.

Motivation & Objective

  • To understand why pruning neural networks at initialization is effective despite random weights.
  • To formalize the theoretical conditions under which connection sensitivity—used as a pruning criterion—can be reliably measured.
  • To improve the trainability of pruned sparse networks by analyzing and restoring signal propagation properties.
  • To investigate whether pruning can be performed without supervision using unsupervised surrogate losses.
  • To explore whether neural architecture sculpting—discovering better sparse architectures than standard ones—can be achieved via pruning at initialization.

Proposed method

  • Formalize connection sensitivity as a gradient-based measure and show that its reliability depends on how faithfully signals propagate through the network.
  • Introduce layerwise dynamical isometry as a sufficient condition for faithful connection sensitivity, defined by singular values of layer Jacobians being concentrated near 1.
  • Propose a two-stage method: first prune using connection sensitivity, then apply a data-free step that enforces approximate layerwise orthogonality in the pruned network to restore signal propagation.
  • Apply the method to various architectures (e.g., ResNet, wide residual networks) and evaluate performance on image classification tasks.
  • Use unsupervised surrogate losses (e.g., autoencoder loss) to compute connection sensitivity without labels, enabling unsupervised pruning.
  • Conduct neural architecture sculpting by pruning larger, arbitrarily-designed networks to match the parameter count of a base dense model, then compare test accuracy.
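
The two stages above can be sketched for a single linear layer. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy squared loss, the layer size, the 50% sparsity, the step size, and all function names are hypothetical. Connection sensitivity is taken here as the normalized magnitude of gradient times weight, and orthogonality is approximated by gradient descent on the squared Frobenius distance between the masked weight's Gram matrix and the identity; both choices are assumptions consistent with the description above rather than details given in this review.

```python
import numpy as np

def connection_sensitivity(weights, grads):
    """Normalized saliency |g * w| / sum |g * w| for every connection."""
    raw = np.abs(grads * weights)
    return raw / raw.sum()

def top_k_mask(scores, sparsity):
    """Binary mask keeping the (1 - sparsity) fraction of highest scores."""
    n_keep = max(1, int(round(scores.size * (1.0 - sparsity))))
    keep_idx = np.argsort(scores, axis=None)[-n_keep:]
    mask = np.zeros(scores.size)
    mask[keep_idx] = 1.0
    return mask.reshape(scores.shape)

def isometry_error(w):
    """Frobenius distance between w^T w and the identity."""
    return float(np.linalg.norm(w.T @ w - np.eye(w.shape[1])))

def enforce_approx_isometry(weights, mask, steps=2000, lr=0.01):
    """Data-free step: minimize ||(M*W)^T (M*W) - I||_F^2 over kept weights."""
    w = weights * mask
    eye = np.eye(w.shape[1])
    for _ in range(steps):
        residual = w.T @ w - eye
        grad = 4.0 * w @ residual   # gradient of the squared Frobenius norm
        w = w - lr * (grad * mask)  # pruned entries stay exactly zero
    return w

# Toy setup (assumed): one linear layer y = x @ w with a squared loss.
rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal((64, d))
t = rng.standard_normal((64, d))
w0, _ = np.linalg.qr(rng.standard_normal((d, d)))  # orthogonal initialization

# Stage 1: prune with connection sensitivity measured on one mini-batch.
grads = x.T @ (x @ w0 - t) / x.shape[0]
scores = connection_sensitivity(w0, grads)
mask = top_k_mask(scores, sparsity=0.5)
w_pruned = w0 * mask

# Stage 2: restore approximate orthogonality without touching any data.
w_fixed = enforce_approx_isometry(w0, mask)

print("isometry error after pruning:", isometry_error(w_pruned))
print("isometry error after repair: ", isometry_error(w_fixed))
```

Note that only the first stage uses data; the repair step depends on the mask and the weights alone, which is what makes it data-free. In a multi-layer network the same repair would be applied to each layer separately.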

Experimental results

Research questions

  • RQ1: Why is pruning at initialization effective despite random initialization, and what conditions ensure reliable connection sensitivity?
  • RQ2: How does signal propagation in pruned networks affect their trainability, and can this be restored after pruning?
  • RQ3: Can effective pruning be achieved without supervision using unsupervised surrogate losses?
  • RQ4: Can pruning at initialization discover sparse architectures that outperform standard dense models with the same number of parameters?
  • RQ5: To what extent does maintaining dynamical isometry during pruning improve generalization and training stability?

Key findings

  • Layerwise dynamical isometry—where all singular values of layer Jacobians are near 1—is a sufficient condition for reliable connection sensitivity measurements during pruning at initialization.
  • Pruning breaks dynamical isometry, degrading signal propagation and reducing trainability in sparse networks, which explains poor performance in unmodified pruning methods.
  • The proposed data-free method to recover layerwise orthogonality significantly improves training performance and generalization of pruned networks.
  • On CIFAR-10, pruned sparse networks with the same number of parameters as a dense ResNet20 base model achieved lower generalization errors (e.g., 4.8% vs. 5.2%), demonstrating superior performance.
  • Unsupervised pruning using surrogate losses (e.g., autoencoder loss) achieved accuracy competitive with supervised pruning, even at extreme sparsity (e.g., 98.4% pruned).
  • Neural architecture sculpting via pruning at initialization discovered sparse architectures that outperformed the original dense ResNet20, especially when starting from wider networks.
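
The first two findings can be checked numerically on a toy layer. The sketch below is an assumption-laden illustration rather than an experiment from the paper: it uses a linear layer, whose Jacobian is just its weight matrix, and a random mask keeping about 10% of the weights instead of a sensitivity-based one. The helper name `singular_value_range` is hypothetical.

```python
import numpy as np

def singular_value_range(jacobian):
    """Return the (min, max) singular values of a layer Jacobian."""
    s = np.linalg.svd(jacobian, compute_uv=False)
    return float(s.min()), float(s.max())

rng = np.random.default_rng(0)
d = 32

# Orthogonal initialization: every singular value equals 1 (isometry).
w_orth, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Random mask keeping roughly 10% of the connections (assumed for the demo).
mask = (rng.random((d, d)) > 0.9).astype(float)

dense_min, dense_max = singular_value_range(w_orth)
sparse_min, sparse_max = singular_value_range(w_orth * mask)

print("dense  singular values in", (dense_min, dense_max))
print("sparse singular values in", (sparse_min, sparse_max))
```

The dense layer's singular values sit at 1, while the masked layer's spread out and shrink, which is the loss of dynamical isometry that the orthogonality step is meant to repair.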

This review was created by AI and reviewed by human editors.