QUICK REVIEW

[Paper Review] Comparing Rewinding and Fine-tuning in Neural Network Pruning

Alex Renda, Jonathan Frankle|arXiv (Cornell University)|Mar 5, 2020

Neural Networks and Applications | Cited by 179

One-Sentence Summary

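This paper compares three techniques for retraining after pruning (fine-tuning, weight rewinding, and learning rate rewinding) and shows that the rewinding methods outperform fine-tuning across networks and datasets. Learning rate rewinding often performs best and enables a simple, network-agnostic pruning algorithm.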

ABSTRACT

Many neural network pruning algorithms proceed in three steps: train the network to completion, remove unwanted structure to compress the network, and retrain the remaining structure to recover lost accuracy. The standard retraining technique, fine-tuning, trains the unpruned weights from their final trained values using a small fixed learning rate. In this paper, we compare fine-tuning to alternative retraining techniques. Weight rewinding (as proposed by Frankle et al. (2019)) rewinds unpruned weights to their values from earlier in training and retrains them from there using the original training schedule. Learning rate rewinding (which we propose) trains the unpruned weights from their final values using the same learning rate schedule as weight rewinding. Both rewinding techniques outperform fine-tuning, forming the basis of a network-agnostic pruning algorithm that matches the accuracy and compression ratios of several more network-specific state-of-the-art techniques.

Motivation and Objectives

Motivate pruning methods that remove parameters and then retrain the remaining network to recover lost accuracy.
Evaluate three retraining techniques: fine-tuning, weight rewinding, and learning rate rewinding.
Determine which retraining approach yields the best accuracy given compression and search cost.
Propose a simple, network-agnostic pruning algorithm achieving state-of-the-art tradeoffs between accuracy and parameter count.

Proposed Method

Define fine-tuning: retrain the unpruned weights from their final trained values with a small fixed learning rate.
Define weight rewinding: reset the unpruned weights to their values from an earlier point in training and retrain from there with the original learning rate schedule.
Define learning rate rewinding: keep the final weights and retrain with the learning rate schedule from the last t epochs of training.
Use magnitude-based pruning (global for unstructured, per-layer L1 for structured) to obtain sparsity.
Evaluate one-shot and iterative pruning across multiple networks and datasets (CIFAR-10, ImageNet, WMT16 EN-DE).
Compare accuracy, parameter-efficiency, and search cost across retraining methods.
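The three retraining techniques differ only in which weights retraining starts from and which learning rates it uses. The sketch below makes that difference explicit. It is an illustration, not the authors' code: `RetrainPlan`, `plan_retraining`, and the step schedule in `step_schedule` are hypothetical names and values, and using the last learning rate of the original schedule as the "small fixed" fine-tuning rate is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetrainPlan:
    # Epoch whose checkpoint supplies the starting weights.
    # A value equal to the total epoch count means "final trained weights".
    start_weights_epoch: int
    # Learning rate to use in each retraining epoch.
    learning_rates: List[float]


def step_schedule(epoch: int) -> float:
    """Hypothetical original schedule: 0.1, divided by 10 at epochs 91 and 136."""
    if epoch < 91:
        return 0.1
    if epoch < 136:
        return 0.01
    return 0.001


def plan_retraining(
    method: str,
    total_epochs: int,
    retrain_epochs: int,
    schedule: Callable[[int], float],
) -> RetrainPlan:
    """Describe how to retrain for `retrain_epochs` epochs (t in the text)."""
    if not 0 < retrain_epochs <= total_epochs:
        raise ValueError("retrain_epochs must be in (0, total_epochs]")

    rewind_point = total_epochs - retrain_epochs
    # Learning rates from the last t epochs of the original schedule.
    rewound_rates = [schedule(e) for e in range(rewind_point, total_epochs)]

    if method == "fine_tuning":
        # Final weights, one fixed small rate (assumed: the schedule's last rate).
        final_rate = schedule(total_epochs - 1)
        return RetrainPlan(total_epochs, [final_rate] * retrain_epochs)
    if method == "weight_rewinding":
        # Weights AND learning rates both rewound to epoch T - t.
        return RetrainPlan(rewind_point, rewound_rates)
    if method == "lr_rewinding":
        # Final weights, but learning rates rewound to epoch T - t.
        return RetrainPlan(total_epochs, rewound_rates)
    raise ValueError(f"unknown method: {method}")
```

Calling `plan_retraining("lr_rewinding", 182, 60, step_schedule)` and `plan_retraining("weight_rewinding", 182, 60, step_schedule)` yields the same list of learning rates. Only the starting checkpoint differs, which is the distinction the method definitions above draw.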

Experimental Results

Research Questions

RQ1: Do weight rewinding and learning rate rewinding outperform fine-tuning as retraining methods after pruning?
RQ2: How do the retraining methods compare in terms of accuracy and parameter-efficiency across networks and datasets?
RQ3: Can a network-agnostic pruning algorithm based on rewinding achieve state-of-the-art tradeoffs without extensive hyperparameter search?
RQ4: What is the impact of iterative versus one-shot pruning on the effectiveness of rewinding techniques?

Key Findings

Weight rewinding outperforms fine-tuning across networks and datasets.
Learning rate rewinding matches or surpasses weight rewinding in all scenarios.
Learning rate rewinding achieves state-of-the-art Accuracy versus Parameter-Efficiency tradeoffs with iterative unstructured pruning.
The proposed pruning algorithm with learning rate rewinding matches state-of-the-art tradeoffs without per-compression-ratio hyperparameters.
Weight rewinding nearly matches the state-of-the-art results, indicating lottery-ticket subnetworks are competitive with general pruned networks.
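The iterative unstructured pruning loop behind these findings can be sketched as alternating global magnitude pruning with retraining. The code below is a minimal illustration under stated assumptions, not the paper's implementation. The function names, the per-round pruning fraction of 0.2, and the `retrain` callback (a stand-in for retraining the surviving weights, for example with learning rate rewinding) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, List[float]]  # layer name -> flattened weights
Masks = Dict[str, List[bool]]     # layer name -> True where a weight survives


def global_magnitude_prune(weights: Weights, masks: Masks, fraction: float) -> Masks:
    """Mask out the `fraction` of surviving weights with the smallest
    magnitude, ranked across all layers at once (global pruning)."""
    surviving = [
        (abs(w), name, i)
        for name, layer in weights.items()
        for i, w in enumerate(layer)
        if masks[name][i]
    ]
    surviving.sort(key=lambda item: item[0])
    n_prune = int(len(surviving) * fraction)

    new_masks = {name: list(mask) for name, mask in masks.items()}
    for _, name, i in surviving[:n_prune]:
        new_masks[name][i] = False
    return new_masks


def iterative_prune(
    weights: Weights,
    retrain: Callable[[Weights, Masks], Weights],
    fraction: float = 0.2,  # assumed per-round pruning fraction
    rounds: int = 3,
) -> Tuple[Weights, Masks]:
    """Alternate pruning and retraining for a fixed number of rounds."""
    masks: Masks = {name: [True] * len(layer) for name, layer in weights.items()}
    for _ in range(rounds):
        masks = global_magnitude_prune(weights, masks, fraction)
        # Zero out pruned weights before handing the network to retraining.
        weights = {
            name: [w if keep else 0.0 for w, keep in zip(layer, masks[name])]
            for name, layer in weights.items()
        }
        # Hypothetical callback: retrains surviving weights and returns them.
        weights = retrain(weights, masks)
    return weights, masks
```

Because the same pruning fraction and retraining schedule are reused every round, the only knob that changes with the target compression ratio in this sketch is the number of rounds. That is one way to read the claim that the algorithm needs no per-compression-ratio hyperparameters.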

This review was generated by AI and reviewed by human editors.