QUICK REVIEW

[Paper Review] Learning Sparse Networks Using Targeted Dropout

Aidan N. Gomez, Chunshun Zhang|arXiv (Cornell University)|May 31, 2019

Machine Learning and ELM|References: 42|Cited by: 76

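One-Sentence Summary

Targeted dropout makes a network robust to pruning by selectively dropping its lowest-magnitude weights during training, which yields very high sparsity across a range of architectures and datasets with almost no loss in accuracy.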

ABSTRACT

Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away connections or hidden units. But standard training does not necessarily encourage nets to be amenable to pruning. We introduce targeted dropout, a method for training a neural network so that it is robust to subsequent pruning. Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights. The resulting network is robust to post hoc pruning of weights or units that frequently occur in the dropped sets. The method improves upon more complicated sparsifying regularisers while being simple to implement and easy to tune.

Motivation and Objectives

Motivate sparsification of neural networks to reduce computation and storage without large accuracy penalties.
Propose a training procedure that makes networks robust to post hoc pruning by targeting dropout to unimportant weights.
Show that targeted dropout improves sparsity-accuracy trade-offs compared to standard regularizers and pruning heuristics.
Demonstrate the method across multiple architectures (ResNet, Wide ResNet, Transformer) and datasets (CIFAR-10, ImageNet, WMT EN-DE).
Provide practical guidance and comparisons to existing sparsification techniques for practitioners.

Proposed Method

Rank weights or units by a fast importance measure (e.g., magnitude).
Define a targeting proportion γ and a drop rate α: the bottom γ|θ| weights by importance become dropout candidates, and each candidate is dropped with probability α.
Apply dropout to the selected unimportant elements during each gradient computation step to encourage robustness to pruning.
Train the network so that the important subnetwork (top-k weights by magnitude) becomes less dependent on the unimportant subnetwork.
Compare targeted dropout to L1 and L0 regularization, variational dropout, and Smallify, using greedy magnitude-based pruning after training.
Evaluate on architectures such as ResNet, Wide ResNet, and Transformer across CIFAR-10, ImageNet, and WMT EN-DE.
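
The ranking, targeting, and dropping steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `targeted_weight_dropout` and `magnitude_prune`, the choice to rank weights within each column of a 2-D weight matrix, and the rounding of γ times the column length are all assumptions made for this sketch.

```python
import numpy as np


def targeted_weight_dropout(w, gamma, alpha, rng):
    """Return a copy of `w` with targeted dropout applied per column.

    w:     2-D weight matrix of shape (n_in, n_out)
    gamma: targeting proportion (fraction of each column that is a candidate)
    alpha: drop rate applied independently to every candidate
    rng:   a numpy.random.Generator
    """
    n_in = w.shape[0]
    # Assumption of this sketch: round gamma * n_in to the nearest integer.
    n_candidates = int(round(gamma * n_in))
    if n_candidates == 0:
        return w.copy()
    # Double argsort gives each weight its magnitude rank within its column
    # (rank 0 = smallest magnitude); ranks are unique even with ties.
    ranks = np.argsort(np.argsort(np.abs(w), axis=0), axis=0)
    candidates = ranks < n_candidates
    # Drop each candidate independently with probability alpha.
    drop = candidates & (rng.random(w.shape) < alpha)
    return np.where(drop, 0.0, w)


def magnitude_prune(w, gamma):
    """Post hoc pruning: zero the bottom `gamma` fraction of every column.

    Equivalent to targeted dropout with alpha = 1, where every candidate
    is dropped deterministically.
    """
    return targeted_weight_dropout(w, gamma, alpha=1.0,
                                   rng=np.random.default_rng(0))


# Hypothetical usage on a small random layer.
rng = np.random.default_rng(42)
weights = rng.normal(size=(8, 3))
noisy = targeted_weight_dropout(weights, gamma=0.5, alpha=0.5, rng=rng)
sparse = magnitude_prune(weights, gamma=0.5)
```

In a training loop, a fresh mask of this kind would be drawn at every gradient step, so only the retained weights contribute to that step's forward pass and gradients. The weights above the targeting threshold are never touched, which is what pushes the network to concentrate its function in the high-magnitude subnetwork.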

Experimental Results

Research Questions

RQ1: Does targeted dropout improve robustness of networks to post hoc pruning compared to standard dropout and sparsity-inducing regularizers?
RQ2: Can targeted dropout achieve high sparsity (e.g., 90–99%) with minimal loss in task performance across diverse architectures and datasets?
RQ3: How does the dependency between the important and unimportant subnetworks change under targeted dropout, and how does this relate to pruning outcomes?
RQ4: What are the practical benefits and limitations of targeted dropout relative to existing pruning approaches (L1, L0, variational dropout, Smallify) in real-world models?
RQ5: Are ramping or fixed-pattern variants of targeted dropout effective across different architectures?

Key Findings

Targeted dropout yields strong sparsification while preserving accuracy: e.g., 99% sparsity on ResNet-32 for less than 4% drop in CIFAR-10 accuracy.
Networks trained with targeted dropout show much reduced dependence of the important subnetwork on the unimportant one, leading to a smaller change in error (ΔE) upon pruning.
Compared to standard dropout, L1, and L0 regularization, targeted dropout achieves better sparsity-accuracy trade-offs across multiple architectures (ResNet, Wide ResNet, Transformer) and datasets.
In Transformer experiments, targeted dropout improves BLEU scores at high sparsity (e.g., up to +15 BLEU at 70% sparsity on EN-DE).
Ramping variants of targeted dropout (ramping TD) can reach very high sparsity (around 99%) with competitive accuracy, sometimes outperforming alternative sparse training methods such as Smallify in certain regimes.
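
The ΔE finding above can be made concrete with a standard second-order Taylor expansion. This is a sketch under stated assumptions: E is the training loss, θ the trained weights, d a vector that equals θ on the pruned coordinates and is zero elsewhere, and H the Hessian of E at θ. The symbols d and H are introduced here for illustration and do not appear in this summary.

```latex
\Delta E \;=\; \bigl| E(\theta - d) - E(\theta) \bigr|
\;\approx\; \Bigl| -\nabla_{\theta} E(\theta)^{\top} d \;+\; \tfrac{1}{2}\, d^{\top} H\, d \Bigr|
```

Near a critical point the gradient term is close to zero, so the quadratic term dominates and ΔE stays small when the pruned weights are small in magnitude and the loss has low curvature along d.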


This review was generated by AI and checked by human editors.