QUICK REVIEW

[Paper Review] FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

Yonggan Fu, Haoran You|arXiv (Cornell University)|Dec 24, 2020

Advanced Neural Network Applications | Cited by 30

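One-Sentence Summary

FracTrain combines progressive fractional quantization with input-adaptive dynamic fractional quantization to reduce DNN training cost while maintaining accuracy, and it is validated across multiple models and datasets.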

ABSTRACT

Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs. As reducing precision is one of the most effective knobs for boosting training time/energy efficiency, there has been a growing interest in low-precision DNN training. In this paper, we explore from an orthogonal direction: how to fractionally squeeze out more training cost savings from the most redundant bit level, progressively along the training trajectory and dynamically per input. Specifically, we propose FracTrain that integrates (i) progressive fractional quantization which gradually increases the precision of activations, weights, and gradients that will not reach the precision of SOTA static quantized DNN training until the final training stage, and (ii) dynamic fractional quantization which assigns precisions to both the activations and gradients of each layer in an input-adaptive manner, for only "fractionally" updating layer parameters. Extensive simulations and ablation studies (six models, four datasets, and three training settings including standard, adaptation, and fine-tuning) validate the effectiveness of FracTrain in reducing computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%~+1.87%) accuracy. For example, when training ResNet-74 on CIFAR-10, FracTrain achieves 77.6% and 53.5% computational cost and training latency savings, respectively, compared with the best SOTA baseline, while achieving a comparable (-0.07%) accuracy. Our codes are available at: https://github.com/RICE-EIC/FracTrain.
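
To make the "reducing precision" knob from the abstract concrete, here is a minimal sketch of uniform symmetric quantization in plain Python. It is not taken from the FracTrain codebase: the function name `fake_quantize` and the per-tensor max-abs scaling are assumptions chosen for illustration, and the paper's actual quantizer may differ.

```python
def fake_quantize(values, num_bits):
    """Snap floats onto a signed num_bits grid and return them as floats.

    Hypothetical helper for illustration only: per-tensor symmetric
    quantization with a max-abs scale (an assumption, not the paper's scheme).
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for a signed grid")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    q_max = 2 ** (num_bits - 1) - 1   # largest integer level, e.g. 127 for 8 bits
    scale = max_abs / q_max           # float step between adjacent levels
    # Round to the nearest integer level, then map back to floats.
    return [round(v / scale) * scale for v in values]


if __name__ == "__main__":
    activations = [0.9, -1.0, 0.26, 0.0, 0.4]
    for bits in (8, 4, 3):
        print(bits, fake_quantize(activations, bits))
```

Fewer bits give a coarser grid and a larger rounding error (at most half a step per value). That accuracy-for-cost trade is what lets low-precision training save time and energy.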

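Motivation and Objectives

Motivate efficient DNN training on-device or at the edge under limited resources.
Develop quantization strategies that are not static during training and that adapt along the training trajectory and to each input.
Propose Progressive Fractional Quantization (PFQ), which gradually increases precision during training.
Propose Dynamic Fractional Quantization (DFQ), which uses lightweight gates to adapt each layer's precision per input.
Integrate PFQ and DFQ into a unified FracTrain framework and evaluate its training cost savings and accuracy.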

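Proposed Method

Introduces PFQ, which raises precision progressively using a four-stage precision schedule and a loss-change indicator based on the difference between epochs.
Introduces DFQ, which attaches a gating network to each layer, selects among bit precisions through a soft (relaxed) intermediate variant of the gate, and trains with a cost-aware objective.
Defines the FracTrain objective by combining PFQ and DFQ: PFQ controls how precision progresses over time (the temporal dimension), while DFQ handles input-adaptive precision across layers (the spatial dimension).
Models each layer's computation as a gated sum of low-bit convolutions plus a skip connection, which enables fractional updates.
Uses a cost-aware loss term cp(W_base, W_G) and adjusts the sign of its weighting coefficient so that the training cost approaches the target cp.
Evaluates six models (ResNet-18/34/38/74, MobileNetV2, Transformer-base) on CIFAR-10/100, ImageNet, and WikiText-103.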
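
The three mechanisms listed above can be sketched in a few lines of plain Python. This is an illustrative reading of the summary, not the authors' implementation. The class and function names, the `patience` parameter, the example bit widths, and the exact switching rule (advance one stage when the epoch-to-epoch loss drop falls below a threshold) are all assumptions.

```python
class ProgressivePrecisionSchedule:
    """PFQ-style controller (hypothetical): start at low precision and move to
    the next stage when the loss improvement between epochs becomes small."""

    def __init__(self, stages, threshold, patience=1):
        self.stages = list(stages)    # e.g. four (forward_bits, backward_bits) pairs
        self.threshold = threshold    # minimum loss drop that counts as progress
        self.patience = patience      # flat epochs tolerated before switching
        self.stage = 0
        self._prev_loss = None
        self._flat_epochs = 0

    @property
    def precision(self):
        return self.stages[self.stage]

    def step(self, epoch_loss):
        """Record one epoch's loss and return the precision for the next epoch."""
        if self._prev_loss is not None:
            improvement = self._prev_loss - epoch_loss
            if improvement < self.threshold:
                self._flat_epochs += 1
            else:
                self._flat_epochs = 0
        self._prev_loss = epoch_loss
        last_stage = len(self.stages) - 1
        if self._flat_epochs >= self.patience and self.stage < last_stage:
            self.stage += 1
            self._flat_epochs = 0
        return self.precision


def gated_layer_output(x, candidate_ops, gates):
    """DFQ-style layer (hypothetical): skip connection plus a gated sum of
    candidate branches. Each callable in candidate_ops stands in for a
    convolution run at one bit width; a gate of 0.0 skips that branch."""
    out = list(x)                     # skip connection
    for op, gate in zip(candidate_ops, gates):
        if gate == 0.0:
            continue                  # branch not executed, so its cost is saved
        branch = op(x)
        out = [o + gate * b for o, b in zip(out, branch)]
    return out


def signed_cost_weight(alpha, current_cost, target_cost):
    """Sign flip for the cost term (hypothetical): penalize cost when above the
    target, reward it when below, so training cost drifts toward the target."""
    return alpha if current_cost > target_cost else -alpha


if __name__ == "__main__":
    schedule = ProgressivePrecisionSchedule(
        stages=[(3, 6), (4, 6), (6, 8), (8, 8)],  # illustrative bit widths
        threshold=0.01,
    )
    for loss in [2.0, 1.5, 1.495, 1.2, 1.199, 1.1985, 1.198]:
        print(loss, schedule.step(loss))
```

With one-hot gates exactly one precision runs per layer and input. Fractional gate values correspond to the soft variant that keeps the choice differentiable during training.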

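Experimental Results

Research Questions

RQ1: Can progressively increasing precision during training (PFQ) lower training cost without sacrificing accuracy?
RQ2: Can input-adaptive, per-layer precision selection (DFQ) further reduce training cost on top of PFQ?
RQ3: What are the combined benefits of spatio-temporal fractional quantization (FracTrain) across different models, datasets, and tasks?
RQ4: How does FracTrain compare with state-of-the-art static low-precision training baselines in accuracy and training cost?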

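Key Findings

FracTrain achieves substantial training cost savings, generally with comparable or better accuracy across multiple models and datasets.
PFQ consistently reduces training cost while maintaining or slightly improving accuracy relative to the SBM baseline on ResNet-38/74 with CIFAR-10/100.
DFQ reduces computational cost while maintaining or improving accuracy relative to SBM, and it outperforms selective layer update methods.
FracTrain (PFQ+DFQ) achieves large reductions in MACs (up to tens of percentage points) and notable improvements in hardware metrics such as energy and latency, while maintaining comparable accuracy. For ResNet-74 on CIFAR-10, the abstract reports 77.6% computational cost savings and 53.5% training latency savings at a -0.07% accuracy difference.
On ImageNet and WikiText-103, PFQ reduces cost by about 21% and 44%, respectively, while maintaining or improving accuracy/perplexity.
In the adaptation and fine-tuning settings on CIFAR-100, FracTrain maintains or slightly improves accuracy while substantially reducing MACs.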
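
To show how a MACs-based saving like those above can be computed, the sketch below uses a common convention: scale each MAC by the bit widths of its two operands relative to 32-bit. That cost model, the simplification that both operands in a pass share one bit width, and every number in the example are assumptions for illustration. None of them is a result from the paper.

```python
def effective_macs(macs, bits_a, bits_b, ref_bits=32):
    """Bit-scaled MAC count (assumed cost model): a MAC on bits_a x bits_b
    operands costs (bits_a / ref_bits) * (bits_b / ref_bits) of a full one."""
    return macs * (bits_a / ref_bits) * (bits_b / ref_bits)


def schedule_cost(stages, fw_macs, bw_macs):
    """Total cost of a precision schedule.

    stages: list of (epochs, forward_bits, backward_bits) tuples.
    fw_macs / bw_macs: MACs per epoch in the forward / backward pass.
    Simplification: both operands of a pass use that pass's bit width.
    """
    total = 0.0
    for epochs, fw_bits, bw_bits in stages:
        per_epoch = (effective_macs(fw_macs, fw_bits, fw_bits)
                     + effective_macs(bw_macs, bw_bits, bw_bits))
        total += epochs * per_epoch
    return total


def relative_saving(baseline_cost, method_cost):
    """Fraction of the baseline cost that the method avoids."""
    return 1.0 - method_cost / baseline_cost


if __name__ == "__main__":
    # Hypothetical workload: 1024 forward and 2048 backward MACs per epoch.
    static_8bit = [(100, 8, 8)]
    progressive = [(50, 4, 8), (50, 8, 8)]   # low-precision forward pass first
    base = schedule_cost(static_8bit, fw_macs=1024, bw_macs=2048)
    prog = schedule_cost(progressive, fw_macs=1024, bw_macs=2048)
    print(f"relative saving: {relative_saving(base, prog):.1%}")
```

Because cost scales with the product of operand bit widths, halving the forward precision for part of training cuts that portion of the forward cost to a quarter.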

This review was generated by AI and reviewed by a human editor.