QUICK REVIEW
[Paper Review] FreezeOut: Accelerate Training by Progressively Freezing Layers
Andrew Brock, Theodore Lim|arXiv (Cornell University)|Jun 15, 2017
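Advanced Neural Network Applications | References: 11 | Cited by: 75
One-sentence summary
FreezeOut speeds up neural network training by progressively freezing hidden layers and excluding them from the backward pass, achieving up to a 20% wall-clock speedup on some architectures with little loss in accuracy.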
ABSTRACT
The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut
Research motivation and goals
- Reduce training time by exploiting layers that converge early and have fewer parameters.
- Propose a layer-wise learning-rate schedule that freezes layers progressively during training.
- Evaluate the method across DenseNets, Wide ResNets, and VGG to identify where it yields speedups and where it does not.
- Provide practical defaults and guidelines for applying FreezeOut in common CNN architectures.
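Proposed method
- Apply cosine-annealed learning rates without restarts on a per-layer schedule.
- Freeze the first layer at time t_0, then freeze each subsequent layer i at a later time t_i.
- Optionally scale the initial learning rate of each layer, and cube the t_i values to bias training toward later layers.
- Compute per-layer learning-rate schedules: a_i(t) = 0.5 * a_i(0) * (1 + cos(pi * t / t_i)).
- Once a layer's learning rate reaches zero, exclude that layer from the backward pass, which lowers the cost of every subsequent iteration.
- Offer four schedule variants (linear or cubic progression of t_i, combined with unscaled or scaled learning rates), plus a recommended default configuration.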
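The schedule described in the list above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, training time is assumed to be normalized to the range [0, 1], the t_i values are assumed to be linearly spaced between t_0 and 1 before the optional cubing, and the scaled variant is assumed to set a_i(0) = base_lr / t_i. None of those details are spelled out in this summary, so check them against the paper or the released code before relying on them.

```python
import math


def freeze_times(num_layers, t0, cubic=False):
    """Return the freeze time t_i of every layer as a fraction of training.

    Assumption: t_i is linearly spaced from t0 (first layer) to 1.0
    (last layer); with cubic=True each value is then cubed.
    """
    if num_layers < 2:
        raise ValueError("need at least two layers to build a schedule")
    times = [t0 + (1.0 - t0) * i / (num_layers - 1) for i in range(num_layers)]
    if cubic:
        times = [t ** 3 for t in times]
    return times


def layer_lr(t, t_i, base_lr, scaled=False):
    """Cosine-annealed rate a_i(t) = 0.5 * a_i(0) * (1 + cos(pi * t / t_i)).

    The rate is held at zero once t reaches t_i (the layer is frozen).
    Assumption: the scaled variant uses a_i(0) = base_lr / t_i.
    """
    if t >= t_i:
        return 0.0
    a0 = base_lr / t_i if scaled else base_lr
    return 0.5 * a0 * (1.0 + math.cos(math.pi * t / t_i))


def active_layers(t, times):
    """Indices of layers that still need gradients at training fraction t."""
    return [i for i, t_i in enumerate(times) if t < t_i]


if __name__ == "__main__":
    times = freeze_times(num_layers=4, t0=0.5)
    # At 70% of training, the first two layers are already frozen.
    print(active_layers(0.7, times))  # [2, 3]
    for i, t_i in enumerate(times):
        print(i, round(layer_lr(0.25, t_i, base_lr=0.1), 4))
```

In a real training loop, the layers missing from `active_layers` would be the ones skipped during backpropagation. Because layers freeze from the input side first, the backward pass can stop at the earliest layer that is still active.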
Experimental results
Research questions
- RQ1: Does progressively freezing layers reduce training time without a prohibitive loss in accuracy across common CNN architectures?
- RQ2: Which scheduling variant (linear/cubic, scaled/unscaled) provides the best trade-off between speedup and accuracy?
- RQ3: How does FreezeOut perform on architectures with and without skip connections (DenseNets/ResNets vs. VGG)?
Key findings
- Up to 20% wall-clock speedup during training on DenseNets and ResNets.
- DenseNets show up to ~3% increase in test error with FreezeOut under some configurations.
- ResNets achieve about 20% speedup with no loss in accuracy in some settings.
- VGG networks show no improvement from FreezeOut.
- Cubic scheduling with learning-rate scaling is recommended for maximizing speedup while staying within ~3% accuracy loss.
This review was generated by AI and checked by a human editor.