QUICK REVIEW

[Paper Review] FreezeOut: Accelerate Training by Progressively Freezing Layers

Andrew Brock, Theodore Lim | arXiv (Cornell University) | Jun 15, 2017

Advanced Neural Network Applications | References: 11 | Cited by: 75

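One-Sentence Summary

FreezeOut accelerates neural network training by progressively freezing hidden layers and excluding them from the backward pass, achieving up to a 20% wall-clock speedup on some architectures with minimal loss of accuracy.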

ABSTRACT

The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut

Motivation and Objectives

Reduce training time by exploiting the observation that early layers have the fewest parameters yet account for the most computation.
Propose a layer-wise learning-rate schedule that freezes layers progressively during training.
Evaluate the method across DenseNets, Wide ResNets, and VGG to identify where it yields speedups and where it does not.
Provide practical defaults and guidelines for applying FreezeOut in common CNN architectures.

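Proposed Method

Apply cosine-annealed learning rates, without restarts, on a layer-wise schedule.
Freeze the first layer at time $t_0$, then freeze each subsequent layer $i$ at a later time $t_i$.
Optionally scale each layer's initial learning rate, and optionally cube the $t_i$ values so that training time is weighted toward later layers.
Compute per-layer learning-rate schedules: $\alpha_i(t) = 0.5\,\alpha_i(0)\,(1 + \cos(\pi t / t_i))$.
Once a layer's learning rate reaches zero, exclude that layer from the backward pass, which lowers the cost of every subsequent iteration.
Provide four schedule variants (linear or cubic progression of $t_i$, each with unscaled or scaled learning rates), along with a recommended default configuration.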
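
The schedule described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, time is expressed as a fraction of the training run, freeze times are assumed to be linearly spaced from t0 (first layer) to 1.0 (last layer) before optional cubing, and the scaled variant is assumed to use base_lr / t_i, a rule chosen here because it gives every layer the same area under its learning-rate curve. The summary itself does not specify the spacing or the scaling rule.

```python
import math


def freeze_times(num_layers, t0, cubic=False):
    """Per-layer freeze times as fractions of the run (1.0 = end of training).

    Assumption: times are linearly spaced from t0 (first layer) to 1.0
    (last layer); with cubic=True each value is then cubed.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    times = []
    for i in range(num_layers):
        frac = i / (num_layers - 1) if num_layers > 1 else 1.0
        times.append(t0 * (1.0 - frac) + frac)
    return [t ** 3 for t in times] if cubic else times


def initial_lr(base_lr, t_i, scaled=False):
    """Initial rate for one layer.

    Assumed scaling rule: base_lr / t_i, so that the area under each
    layer's cosine curve (0.5 * lr0 * t_i) is the same for every layer.
    """
    return base_lr / t_i if scaled else base_lr


def layer_lr(t, t_i, lr0):
    """Cosine-annealed rate at time t; zero once the layer is frozen."""
    if t >= t_i:
        return 0.0
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * t / t_i))


# Example: three layers, cubic schedule, scaled learning rates.
times = freeze_times(num_layers=3, t0=0.5, cubic=True)
for i, t_i in enumerate(times):
    lr0 = initial_lr(0.1, t_i, scaled=True)
    frozen = layer_lr(0.2, t_i, lr0) == 0.0
    print(f"layer {i}: t_i={t_i:.4f} lr0={lr0:.4f} frozen_at_t=0.2: {frozen}")
```

In a real training loop, a layer whose rate has reached zero would also be removed from gradient computation (for example by disabling gradients on its parameters), since the time savings come from skipping that part of the backward pass rather than from the zero learning rate alone.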

Experimental Results

Research Questions

RQ1: Does progressively freezing layers reduce training time without a prohibitive loss in accuracy across common CNN architectures?
RQ2: Which scheduling variant (linear/cubic, scaled/unscaled) provides the best trade-off between speedup and accuracy?
RQ3: How does FreezeOut perform on architectures with and without skip connections (DenseNets/ResNets vs. VGG)?

Key Findings

Up to 20% wall-clock speedup during training on DenseNets and ResNets.
DenseNets show up to ~3% increase in test error with FreezeOut under some configurations.
ResNets achieve about 20% speedup with no loss in accuracy in some settings.
VGG networks show no improvement from FreezeOut.
Cubic scheduling with learning-rate scaling is recommended for maximizing speedup while keeping the accuracy loss within ~3%.
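
A back-of-the-envelope model may help relate these speedups to the freeze schedule. The sketch below is a toy estimate under loud assumptions, not a measurement and not part of the paper: the function name is hypothetical, layers are assumed to freeze in order from the input side (so frozen layers form a prefix that the backward pass can skip entirely), the backward pass is assumed to take a fixed share of each iteration (2/3 by default, a common rule of thumb), and all layers are assumed to cost the same unless per-layer costs are supplied.

```python
def estimated_time_saved(freeze_times, layer_costs=None, backward_share=2 / 3):
    """Rough fraction of total training time saved by skipping frozen layers.

    freeze_times: per-layer freeze times as fractions of the run (0..1).
    layer_costs: optional relative backward cost per layer (assumed equal).
    backward_share: assumed share of iteration time spent in the backward pass.
    """
    if layer_costs is None:
        layer_costs = [1.0] * len(freeze_times)
    if len(layer_costs) != len(freeze_times):
        raise ValueError("layer_costs and freeze_times must have equal length")
    total_cost = sum(layer_costs)
    # A layer frozen at t_i skips its backward work for the remaining 1 - t_i.
    skipped = sum(c * (1.0 - t_i) for c, t_i in zip(layer_costs, freeze_times))
    return backward_share * skipped / total_cost


# Example: three equal-cost layers frozen at 50%, 75%, and 100% of the run.
print(estimated_time_saved([0.5, 0.75, 1.0]))
```

Under these assumptions the example yields about 17% of training time saved, which illustrates why savings from freezing alone are bounded: the forward pass still runs through every layer, and the last layers train until the end of the run.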

This review was generated by AI and reviewed by human editors.