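[Paper Explained] Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks
The paper shows that Early-Bird (EB) tickets can be identified very early in training using low-cost training schemes and a mask distance metric, and it introduces EB Train, which delivers substantial energy savings at equal or higher accuracy.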
(Frankle & Carbin, 2019) shows that there exist winning tickets (small but critical subnetworks) for dense, randomly initialized networks, that can be trained alone to achieve comparable accuracies to the latter in a similar number of iterations. However, the identification of these winning tickets still requires the costly train-prune-retrain process, limiting their practical benefits. In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term as early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early. Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low computational overhead, without needing to know the true winning tickets that emerge after the full training. Finally, we leverage the existence of EB tickets and the proposed mask distance to develop efficient training methods, which are achieved by first identifying EB tickets via low-cost schemes, and then continuing to train merely the EB tickets towards the target accuracy. Experiments based on various deep networks and datasets validate: 1) the existence of EB tickets, and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 4.7x energy savings while maintaining comparable or even better accuracy, demonstrating a promising and easily adopted method for tackling cost-prohibitive deep network training. Code available at https://github.com/RICE-EIC/Early-Bird-Tickets.
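Motivation and Objectives
- Demonstrate that Early-Bird (EB) tickets emerge early in training across different models and datasets.
- Show that EB tickets can be identified with low-cost training schemes and a practical mask distance metric.
- Develop EB Train, a training framework that uses EB tickets to reduce training energy and FLOPs while preserving accuracy.
- Evaluate EB Train against state-of-the-art pruning-based training methods on CIFAR and ImageNet.
- Provide insight into how large learning rates and low-precision training affect the emergence of EB tickets.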
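Proposed Method
- Define EB tickets as subnetworks whose pruning mask is drawn at an early iteration t << i, where i is the iteration at which the dense model reaches its minimum validation loss, and whose accuracy after training matches or exceeds that of the dense model.
- Prune the dense network early in training, using channel-level pruning based on batch normalization (BN) scaling factors and a binary mask m.
- Introduce a mask distance metric, specifically the Hamming distance between ticket masks, to detect when an EB ticket has emerged.
- Keep the mask distances in a first-in-first-out (FIFO) queue and declare an EB ticket once the recent distances fall below a threshold ε (e.g., 0.1).
- Implement EB Train in two steps: (a) search for EB tickets with low-cost training, including a large learning rate and 8-bit precision during the search stage; (b) retrain only the EB tickets to reach the target accuracy.
- Compare EB Train variants: FF (full-precision search and retraining), re-init (the ticket is re-initialized before retraining), LF (low-precision search, full-precision retraining), and LL (low-precision search and retraining).
- Demonstrate energy and FLOPs savings on CIFAR-10/100 and ImageNet using PreResNet101, VGG16, and ResNet18/50.
- Inherit the weights of the EB tickets instead of rewinding to the initialization, following the observed benefit of doing so.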
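The detection steps listed above (BN-based channel mask, Hamming distance, FIFO queue, threshold ε) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation, which is available in the repository linked in the abstract. The names `draw_channel_mask`, `mask_distance`, `EarlyBirdDetector`, `prune_ratio`, and `queue_len` are hypothetical. The sketch also assumes that BN scaling factors are pooled across all layers, that the Hamming distance is normalized to [0, 1], and that the queue holds five entries.

```python
from collections import deque


def draw_channel_mask(bn_scales, prune_ratio):
    """Return a 0/1 mask keeping the channels with the largest |scaling factor|.

    bn_scales: flat list of BN scaling factors, one per channel
               (assumption: pooled over all layers).
    prune_ratio: fraction of channels to prune, in [0, 1).
    """
    n_prune = int(len(bn_scales) * prune_ratio)
    # Rank channel indices by magnitude; the n_prune smallest are pruned.
    order = sorted(range(len(bn_scales)), key=lambda i: abs(bn_scales[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(bn_scales))]


def mask_distance(mask_a, mask_b):
    """Hamming distance between two binary masks, normalized to [0, 1]."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    differing = sum(a != b for a, b in zip(mask_a, mask_b))
    return differing / len(mask_a)


class EarlyBirdDetector:
    """Keeps distances between consecutive masks in a fixed-length FIFO queue."""

    def __init__(self, prune_ratio, epsilon=0.1, queue_len=5):
        self.prune_ratio = prune_ratio
        self.epsilon = epsilon
        # deque(maxlen=...) drops the oldest distance automatically.
        self.distances = deque(maxlen=queue_len)
        self.prev_mask = None

    def update(self, bn_scales):
        """Call once per epoch; returns True once an EB ticket is declared."""
        mask = draw_channel_mask(bn_scales, self.prune_ratio)
        if self.prev_mask is not None:
            self.distances.append(mask_distance(mask, self.prev_mask))
        self.prev_mask = mask
        # Require a full window so a single quiet epoch cannot trigger detection.
        full = len(self.distances) == self.distances.maxlen
        return full and max(self.distances) < self.epsilon
```

In a training loop, `update` would be called at the end of each epoch with the current BN scaling factors. Once it returns `True`, the search stage stops and only the masked subnetwork is retrained.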
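Experimental Results
Research Questions
- RQ1: Do EB tickets exist broadly across mainstream models and datasets?
- RQ2: Can EB tickets be identified reliably, without full training, using low-cost training and the mask distance metric?
- RQ3: Does training only the EB tickets (EB Train) match or exceed the accuracy of conventional pruning and retraining while delivering substantial energy and FLOPs savings?
- RQ4: How do large learning rates and low-precision training affect the emergence of EB tickets and their usefulness?
- RQ5: How does EB Train compare with state-of-the-art baselines on a larger dataset (ImageNet) and on other architectures (ResNet variants)?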
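Key Findings
- EB tickets often emerge very early in training (as early as epoch 20 of 160) and can outperform tickets drawn after full training.
- The mask distance between tickets from adjacent epochs stabilizes early; an EB ticket is identified once the maximum distance within a window falls below the threshold ε (0.1).
- On CIFAR, EB Train reduces FLOPs by 2.2–2.4x relative to the baselines for PreResNet101 and VGG16, with comparable or even higher accuracy.
- With low-precision (FP8/8-bit) search and retraining, EB Train achieves 5.8–24.6x energy savings and 1.1–5.0x FLOPs savings while maintaining or improving accuracy in many settings.
- On ImageNet (ResNet18/50), EB Train reduces training FLOPs by 51.5–74.0% and training energy by 46.5–70.9%, with accuracy gains of up to +2.34% in some configurations.
- Inheriting the EB ticket weights, rather than randomly re-initializing them, benefits retraining performance.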
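The findings above report some savings as factors (e.g., 2.2–2.4x) and others as percentage reductions (e.g., 51.5–74.0%). The short sketch below converts a percentage reduction into the equivalent factor so the two can be compared. It assumes both kinds of figures are measured against the same dense-training baseline, which this summary does not state explicitly. The function name `reduction_to_factor` is hypothetical.

```python
def reduction_to_factor(reduction_pct):
    """Convert a cost reduction in percent into an equivalent 'x-fold' saving.

    A reduction of r percent leaves (1 - r/100) of the original cost,
    so the saving factor is 1 / (1 - r/100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# ImageNet figures quoted above: FLOPs reduced by 51.5-74.0%, energy by 46.5-70.9%.
flops_low = round(reduction_to_factor(51.5), 2)    # about 2.06x
flops_high = round(reduction_to_factor(74.0), 2)   # about 3.85x
energy_low = round(reduction_to_factor(46.5), 2)   # about 1.87x
energy_high = round(reduction_to_factor(70.9), 2)  # about 3.44x
```

Under that assumption, the ImageNet FLOPs reductions correspond to roughly 2.1–3.8x, the same order as the 2.2–2.4x reported on CIFAR.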
This explainer was generated by AI and reviewed by a human editor.