QUICK REVIEW

[Paper Review] TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning

Han Cai, Chuang Gan|arXiv (Cornell University)|Jul 22, 2020

Advanced Neural Network Applications|References: 61|Cited by: 33

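One-Sentence Summary

TinyTL freezes the feature extractor's weights, trains only the biases, and refines the features with lite residual modules. This cuts training memory substantially (up to 12.9× when combined with feature extractor adaptation) while matching or exceeding the accuracy of full fine-tuning.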

ABSTRACT

On-device learning enables edge devices to continually adapt the AI models to new data, which requires a small memory footprint to fit the tight memory constraint of edge devices. Existing work solves this problem by reducing the number of trainable parameters. However, this doesn't directly translate to memory saving since the major bottleneck is the activations, not parameters. In this work, we present Tiny-Transfer-Learning (TinyTL) for memory-efficient on-device learning. TinyTL freezes the weights while only learns the bias modules, thus no need to store the intermediate activations. To maintain the adaptation capacity, we introduce a new memory-efficient bias module, the lite residual module, to refine the feature extractor by learning small residual feature maps adding only 3.8% memory overhead. Extensive experiments show that TinyTL significantly saves the memory (up to 6.5x) with little accuracy loss compared to fine-tuning the full network. Compared to fine-tuning the last layer, TinyTL provides significant accuracy improvements (up to 34.1%) with little memory overhead. Furthermore, combined with feature extractor adaptation, TinyTL provides 7.3-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.

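Research Motivation and Objectives

Enable memory-efficient on-device learning within the tight memory and energy budgets of edge devices.
Identify activations, rather than trainable parameters, as the bottleneck of training memory.
Propose TinyTL: freeze the weights and train only the biases, with lite residual modules to preserve adaptation capacity.
Evaluate the memory-accuracy trade-off across multiple datasets and backbones, with and without feature extractor adaptation.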

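Proposed Method

Analyze the memory cost of backpropagation and show that activations, not weights, dominate training memory.
Freeze the feature extractor's weights and train only the biases, so that intermediate activations need not be stored for the backward pass.
Introduce the lite residual module, which refines intermediate feature maps at a small memory overhead (about 3.8%).
Use group convolutions and reduced resolution and width inside the lite residual module to minimize its activation size.
Replace batch normalization (BN) with group normalization (GN) to suit small-batch on-device training.
Add feature extractor adaptation: use a Once-For-All network to select a task-specific backbone.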
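
The reason bias-only training needs no stored activations can be seen with a single linear layer. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names (`linear_forward`, `grad_bias`, `grad_weight`, `grad_input`) and all numeric values are hypothetical and chosen only for illustration. For y = W x + b, the bias gradient equals the upstream gradient, the weight gradient needs the layer input x, and the gradient passed to earlier layers needs only the frozen weights.

```python
# Manual backward pass for one linear layer y = W x + b (no framework).
# All names and numbers here are illustrative assumptions.

def linear_forward(W, b, x):
    """Return y = W x + b for weight matrix W (rows x cols), bias b, input x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def grad_bias(grad_y):
    """dL/db equals the upstream gradient; the layer input x is not used."""
    return list(grad_y)

def grad_weight(grad_y, x):
    """dL/dW is the outer product of grad_y and x, so x must be kept in memory."""
    return [[g_i * x_j for x_j in x] for g_i in grad_y]

def grad_input(W, grad_y):
    """dL/dx = W^T grad_y; uses the frozen weights but not the layer input x."""
    return [sum(W[i][j] * grad_y[i] for i in range(len(W)))
            for j in range(len(W[0]))]

W = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 0.5]]      # frozen weights (hypothetical values)
b = [0.1, -0.2]             # trainable bias
x = [1.0, 0.0, 2.0]         # layer input, i.e. the activation of the previous layer

y = linear_forward(W, b, x)
grad_y = [1.0, -1.0]        # hypothetical upstream gradient dL/dy

db = grad_bias(grad_y)      # computed without x
dx = grad_input(W, grad_y)  # computed without x, passed on to earlier layers
dW = grad_weight(grad_y, x) # only needed if W were trainable; requires x
```

In this sketch, `db` and `dx` are computed without ever touching `x`, so a layer whose weights are frozen could discard its input after the forward pass. Only `grad_weight`, which TinyTL avoids for the feature extractor, forces `x` to stay in memory.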

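Experimental Results

Research Questions

RQ1: Can freezing the weights and training only the biases substantially reduce on-device training memory without a significant loss in accuracy?
RQ2: Does the lite residual module provide enough capacity to compensate for the frozen weights across different datasets?
RQ3: How does TinyTL perform across multiple backbones, with and without feature extractor adaptation?
RQ4: What is the memory-accuracy trade-off when TinyTL is applied at different input resolutions and batch sizes?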

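Key Findings

Compared with fine-tuning the full network, the memory footprint drops by up to 12.9× (with feature extractor adaptation, relative to fine-tuning the full Inception-V3).
The TinyTL variant with lite residual modules (L+B) is more accurate than the baselines that fine-tune only the biases or only the normalization layers.
At a higher input resolution (320), TinyTL-L+B matches the accuracy of full fine-tuning while saving about 6× memory.
Combined with feature extractor adaptation (Once-For-All), TinyTL achieves 7.3-12.9× memory savings with accuracy comparable to fine-tuning the full Inception-V3.
Training with a batch size of 1 lowers memory further, to about 16 MB, which makes SRAM-based training feasible.
The lite residual module is essential for preserving adaptation capacity in the bias-only approach.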

This review was generated by AI and reviewed by a human editor.