QUICK REVIEW

[Paper Review] FreezeOut: Accelerate Training by Progressively Freezing Layers

Andrew Brock, Theodore Lim|arXiv (Cornell University)|Jun 15, 2017
Advanced Neural Network Applications | References: 11 | Cited by: 75
One-Sentence Summary

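FreezeOut accelerates neural network training by progressively freezing hidden layers and excluding them from the backward pass, achieving up to a 20% wall-clock speedup on some architectures with minimal loss in accuracy.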

ABSTRACT

The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut

Motivation and Objectives

  • Motivate reducing training time by leveraging layers that converge early and have fewer parameters.
  • Propose a layer-wise learning-rate schedule that freezes layers progressively during training.
  • Evaluate the method across DenseNets, Wide ResNets, and VGG to identify where it yields speedups and where it does not.
  • Provide practical defaults and guidelines for applying FreezeOut in common CNN architectures.

Proposed Method

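  • Apply cosine learning-rate annealing without restarts on a per-layer schedule.
  • Freeze the first layer at time $t_0$, then progressively freeze each subsequent layer at a later time $t_i$.
  • Optionally scale the initial learning rate of each layer, and cube the $t_i$ values to give later layers more training time.
  • Compute the per-layer learning-rate schedule: $\alpha_i(t) = 0.5\,\alpha_i(0)\,\bigl(1 + \cos(\pi t / t_i)\bigr)$.
  • Once a layer's learning rate reaches zero, exclude that layer from the backward pass, which lowers the cost of each iteration.
  • Provide four schedule variants (linear or cubic progression of $t_i$, combined with unscaled or scaled learning rates), along with a recommended default configuration.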
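The schedule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`freeze_times`, `layer_lr`, `first_active_layer`) are hypothetical, time `t` is assumed to be the fraction of training completed (0 to 1), the freeze times are assumed to be linearly interpolated from `t0` to 1 before optional cubing, and the scaled variant is assumed to divide the base learning rate by `t_i`. None of these details are stated in this summary.

```python
import math


def freeze_times(num_layers, t0=0.5, cubic=False):
    """Return one freeze time t_i per layer, as a fraction of total training.

    Assumption: t_i is linearly interpolated from t0 (first layer) to 1.0
    (last layer); the cubic variant cubes every interpolated value.
    """
    if num_layers == 1:
        return [1.0]
    times = [t0 + (1.0 - t0) * i / (num_layers - 1) for i in range(num_layers)]
    if cubic:
        times = [t ** 3 for t in times]
    return times


def layer_lr(t, t_i, base_lr, scaled=False):
    """Cosine-annealed learning rate for one layer at time t (no restarts).

    Implements alpha_i(t) = 0.5 * alpha_i(0) * (1 + cos(pi * t / t_i)),
    clamped to zero once t >= t_i.
    Assumption: the scaled variant uses alpha_i(0) = base_lr / t_i.
    """
    if t >= t_i:
        return 0.0  # the layer is frozen from here on
    lr0 = base_lr / t_i if scaled else base_lr
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * t / t_i))


def first_active_layer(t, times):
    """Index of the earliest layer still training at time t.

    Because layers freeze in order, the backward pass can stop at this
    index; it returns len(times) when every layer is frozen.
    """
    for i, t_i in enumerate(times):
        if t < t_i:
            return i
    return len(times)


if __name__ == "__main__":
    times = freeze_times(num_layers=3, t0=0.5)
    for t in (0.0, 0.25, 0.6, 0.9):
        rates = [round(layer_lr(t, t_i, base_lr=0.1), 4) for t_i in times]
        print(t, rates, "backprop stops at layer", first_active_layer(t, times))
```

In a real training loop, the value from `first_active_layer` would decide which layers still need gradients; how that is wired into a particular framework is left out here.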

Experimental Results

Research Questions

  • RQ1: Does progressively freezing layers reduce training time without a prohibitive loss in accuracy across common CNN architectures?
  • RQ2: Which scheduling variant (linear/cubic, scaled/unscaled) provides the best trade-off between speedup and accuracy?
  • RQ3: How does FreezeOut perform on architectures with and without skip connections (DenseNets/ResNets vs. VGG)?

Key Findings

  • Up to 20% wall-clock speedup during training on DenseNets and ResNets.
  • DenseNets show up to ~3% increase in test error with FreezeOut under some configurations.
  • ResNets achieve about 20% speedup with no loss in accuracy in some settings.
  • VGG networks show no improvement from FreezeOut.
  • A cubic schedule with learning-rate scaling is recommended for maximizing speedup while staying within ~3% accuracy loss.


This review was generated by AI and reviewed by human editors.