QUICK REVIEW

[Paper Review] An exploration of parameter redundancy in deep networks with circulant projections

Yu Cheng, Felix X. Yu | arXiv (Cornell University) | Feb 11, 2015

Advanced Neural Network Applications | 38 references | 48 citations

TL;DR

This paper proposes replacing unstructured fully-connected layers in deep neural networks with circulant projections to reduce memory and computation costs. Using the Fast Fourier Transform (FFT), the method reduces the time complexity of a layer with d inputs and d outputs from O(d²) to O(d log d) and its space complexity from O(d²) to O(d). It reports near state-of-the-art accuracy on standard datasets with only a small increase in error, while enabling faster training and scaling to larger models.

ABSTRACT

We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection. The circulant structure substantially reduces memory footprint and enables the use of the Fast Fourier Transform to speed up the computation. Considering a fully-connected neural network layer with d input nodes and d output nodes, this method improves the time complexity from O(d²) to O(d log d) and space complexity from O(d²) to O(d). The space savings are particularly important for modern deep convolutional neural network architectures, where fully-connected layers typically contain more than 90% of the network parameters. We further show that the gradient computation and optimization of the circulant projections can be performed very efficiently. Our experiments on three standard datasets show that the proposed approach achieves this significant gain in storage and efficiency with minimal increase in error rate compared to neural networks with unstructured projections.
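
For reference, the circulant structure described in the abstract can be written out explicitly. This is the standard textbook definition; the symbols r (the defining vector), x (the layer input), and F (the discrete Fourier transform) are notation chosen here and may not match the paper's own symbols.

```latex
R = \operatorname{circ}(\mathbf{r}) =
\begin{bmatrix}
r_0     & r_{d-1} & \cdots & r_1 \\
r_1     & r_0     & \cdots & r_2 \\
\vdots  & \vdots  & \ddots & \vdots \\
r_{d-1} & r_{d-2} & \cdots & r_0
\end{bmatrix},
\qquad
R\mathbf{x} = \mathbf{r} \circledast \mathbf{x}
= \mathcal{F}^{-1}\!\bigl(\mathcal{F}(\mathbf{r}) \odot \mathcal{F}(\mathbf{x})\bigr)
```

Only the d entries of r need to be stored, and the product reduces to three FFTs plus an element-wise multiplication, which is where the O(d) space and O(d log d) time figures come from.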

Motivation & Objective

To address the high memory and computational cost of fully-connected layers in deep neural networks, which often account for over 90% of parameters in modern architectures.
To explore parameter redundancy in fully-connected layers and exploit structural constraints to reduce model size without significant performance loss.
To develop an efficient optimization method for training neural networks with circulant projection matrices while preserving model capacity.
To demonstrate that circulant projections can achieve competitive accuracy with significantly reduced storage and inference time.
To enable the training of deeper and larger fully-connected networks under fixed computational and memory budgets.

Proposed method

Replace standard dense weight matrices with circulant matrices, which are defined by a single vector and cyclic shifts, reducing parameter count from O(d²) to O(d).
Use the Fast Fourier Transform (FFT) to compute matrix-vector products in O(d log d) time instead of O(d²), enabling faster inference and training.
Introduce a diagonal sign-flipping matrix D (entries of +1 or -1) that is applied to the input before the circulant projection, to improve representational capacity and prevent collapse to low-rank projections.
Formulate the optimization of circulant matrices using backpropagation, where gradients are efficiently computed via FFT-based operations.
Apply the circulant projection in fully-connected layers, particularly in the final layers of CNNs, to replace standard dense layers.
Use randomized initialization of circulant matrices and fine-tune them end-to-end with standard backpropagation, maintaining compatibility with standard deep learning frameworks.
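
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`circulant_dense`, `circulant_matvec`, `forward`, `backward`) are hypothetical, the nonlinearity is omitted so that only the linear part is shown, and the layer is square (d inputs, d outputs). The dense matrix is built only to check the FFT results against an explicit O(d²) computation.

```python
import numpy as np

def circulant_dense(r):
    """Explicit d x d circulant matrix with first column r (O(d^2) storage)."""
    d = r.shape[0]
    idx = (np.arange(d)[:, None] - np.arange(d)[None, :]) % d
    return r[idx]

def circulant_matvec(r, u):
    """circ(r) @ u as a circular convolution, computed with the FFT."""
    return np.fft.ifft(np.fft.fft(r) * np.fft.fft(u)).real

def forward(r, s, x):
    """Pre-activation z = circ(r) @ (s * x); s holds the +/-1 diagonal of D."""
    return circulant_matvec(r, s * x)

def backward(r, s, x, grad_z):
    """Gradients of a scalar loss w.r.t. r and x, given grad_z = dL/dz."""
    G = np.fft.fft(grad_z)
    # dL/dr is the circular cross-correlation of grad_z with the flipped input.
    grad_r = np.fft.ifft(G * np.conj(np.fft.fft(s * x))).real
    # dL/dx is circ(r)^T @ grad_z, followed by the same sign flips.
    grad_x = s * np.fft.ifft(G * np.conj(np.fft.fft(r))).real
    return grad_r, grad_x

# Small check against the explicit dense computation (sizes are arbitrary).
rng = np.random.default_rng(0)
d = 8
r = rng.standard_normal(d)
s = rng.choice([-1.0, 1.0], size=d)
x = rng.standard_normal(d)
grad_z = rng.standard_normal(d)

R = circulant_dense(r)
z_fft = forward(r, s, x)
z_dense = R @ (s * x)

grad_r, grad_x = backward(r, s, x, grad_z)
grad_r_dense = circulant_dense(s * x).T @ grad_z
grad_x_dense = s * (R.T @ grad_z)

print(np.allclose(z_fft, z_dense))
print(np.allclose(grad_r, grad_r_dense), np.allclose(grad_x, grad_x_dense))
```

Both the forward product and the two gradients use only FFTs and element-wise products, so each costs O(d log d), and the only learned parameters are the d entries of `r`. The sign vector `s` is treated here as fixed after random initialization, which is an assumption of this sketch.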

Experimental results

Research questions

RQ1: Can circulant projections effectively replace unstructured dense layers in deep networks while maintaining competitive accuracy?
RQ2: To what extent can circulant projections reduce memory and computation costs in fully-connected layers without degrading model performance?
RQ3: How does the inclusion of the sign-flipping matrix D affect the representational capacity and generalization of circulant networks?
RQ4: Can circulant networks be trained efficiently and converge faster than standard networks with comparable parameter counts?
RQ5: To what extent can circulant networks scale to deeper or larger architectures under fixed resource constraints?

Key findings

On MNIST, the circulant network achieved a test error rate of 0.95%, only 0.5% higher than the standard network, despite using 4000x less memory.
On CIFAR-10, the circulant model achieved 16.71% test error, only 1.5% higher than the baseline, with a 4000x reduction in parameter count.
On ImageNet, the circulant model achieved 25.5% top-1 error, comparable to the 25.3% of the standard network, while reducing memory usage by over 99%.
The inclusion of the sign-flipping matrix D was critical: removing it increased error rates by 1.5% on MNIST and 4.6% on CIFAR-10.
The circulant model trained up to 10x deeper than standard networks under the same computational budget, demonstrating scalability.
The method reduced training time per epoch by up to 30% on fully-connected networks due to FFT acceleration, with minimal impact on convergence speed.

This review was created by AI and reviewed by human editors.