QUICK REVIEW

[Paper Review] TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning

Han Cai, Chuang Gan|arXiv (Cornell University)|Jan 1, 2020

Machine Learning and ELM | Cited by 43

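One-Sentence Summary

TinyTL is a memory-efficient on-device learning method that freezes the network weights and trains only the bias modules, which removes the need to store intermediate activations. By adding lite residual modules that incur only 3.8% memory overhead, TinyTL reduces memory by up to 6.5x relative to fine-tuning the full network, improves accuracy by up to 33.8% over fine-tuning the last layer, and matches the accuracy of fine-tuning the full Inception-V3.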

ABSTRACT

On-device learning enables edge devices to continually adapt the AI models to new data, which requires a small memory footprint to fit the tight memory constraint of edge devices. Existing work solves this problem by reducing the number of trainable parameters. However, this doesn't directly translate to memory saving since the major bottleneck is the activations, not parameters. In this work, we present Tiny-Transfer-Learning (TinyTL) for memory-efficient on-device learning. TinyTL freezes the weights while only learns the bias modules, thus no need to store the intermediate activations. To maintain the adaptation capacity, we introduce a new memory-efficient bias module, the lite residual module, to refine the feature extractor by learning small residual feature maps adding only 3.8% memory overhead. Extensive experiments show that TinyTL significantly saves the memory (up to 6.5x) with little accuracy loss compared to fine-tuning the full network. Compared to fine-tuning the last layer, TinyTL provides significant accuracy improvements (up to 33.8%) with little memory overhead. Furthermore, combined with feature extractor adaptation, TinyTL provides 7.5-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.

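Motivation and Objectives

Address the memory bottleneck in on-device continual learning, where activations rather than parameters dominate memory usage.
Reduce the memory footprint on resource-constrained edge devices without sacrificing the model's adaptation capacity.
Develop a method that minimizes both trainable parameters and activation storage while maintaining high accuracy.
Enable efficient fine-tuning of deep networks on edge devices by focusing on bias learning and lite residual adaptation.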
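
To make the activation bottleneck concrete, the following sketch counts the parameters and the output activations of a single convolution. The layer shape and batch size are hypothetical values chosen for illustration and are not taken from the paper; the helper names are likewise assumptions.

```python
# Hypothetical layer shape and batch size, chosen only for illustration.

def conv_param_count(in_ch, out_ch, kernel):
    """Number of weights plus biases in a standard 2D convolution."""
    return in_ch * out_ch * kernel * kernel + out_ch


def conv_output_activation_count(batch, out_ch, height, width):
    """Number of output values the layer produces for one batch."""
    return batch * out_ch * height * width


params = conv_param_count(in_ch=64, out_ch=64, kernel=3)
activations = conv_output_activation_count(batch=8, out_ch=64, height=56, width=56)

print(f"parameters:  {params}")
print(f"activations: {activations}")
print(f"ratio:       {activations / params:.1f}x")
```

For this hypothetical layer the activations outnumber the parameters by roughly 43 to 1, which is why shrinking the set of trainable parameters alone leaves most of the training memory untouched.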

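Proposed Method

Freeze all network weights and train only the bias modules, which removes the need to store intermediate activations.
Introduce a lite residual module that learns small residual feature maps with only 3.8% memory overhead.
Apply the lite residual modules to refine the feature extractor in a memory-efficient way.
Combine bias-only training with feature extractor adaptation to maintain model performance.
Use a fine-tuning strategy that avoids storing the activations backpropagation would otherwise need to compute weight gradients.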
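Design the method to build on existing pre-trained models: the pre-trained weights stay frozen and unchanged, and the lite residual modules are added alongside them.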
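
A minimal sketch of why bias-only training needs no stored activations, assuming a toy network of purely linear layers (y = Wx + b) and a squared-error loss. For such a layer the weight gradient is the outer product of the upstream gradient and the layer input x, so x would have to be kept; the bias gradient is just the upstream gradient, which can be propagated backward through the frozen W alone. The function names, toy weights, and learning rate below are illustrative assumptions, not the authors' implementation, and the lite residual module is not modeled.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matvec_transposed(W, g):
    """Multiply the transpose of W by vector g."""
    return [sum(W[i][j] * g[i] for i in range(len(W))) for j in range(len(W[0]))]


def forward(x, layers):
    """Run the frozen-weight network; intermediate activations are discarded."""
    for W, b in layers:
        x = [y + bi for y, bi in zip(matvec(W, x), b)]
    return x


def loss(output, target):
    """Squared-error loss 0.5 * ||output - target||^2."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


def bias_gradients(output, target, layers):
    """Gradient of the loss with respect to every bias vector.

    Only the frozen weights and the final output are read; no
    intermediate activation is needed.
    """
    grad = [o - t for o, t in zip(output, target)]  # dL/d(output)
    grads = []
    for W, _ in reversed(layers):
        grads.append(grad)                 # dL/db equals dL/d(layer output)
        grad = matvec_transposed(W, grad)  # propagate through the frozen W
    return list(reversed(grads))


# Toy two-layer network with hypothetical frozen weights and zero biases.
layers = [([[1.0, 2.0], [0.0, 1.0]], [0.0, 0.0]),
          ([[1.0, -1.0]], [0.0])]
x, target = [1.0, 1.0], [0.0]

output = forward(x, layers)
loss_before = loss(output, target)
grads = bias_gradients(output, target, layers)

# One gradient-descent step that updates the biases only.
lr = 0.1
updated = [(W, [bi - lr * gi for bi, gi in zip(b, g)])
           for (W, b), g in zip(layers, grads)]
loss_after = loss(forward(x, updated), target)
print(loss_before, loss_after)
```

In this toy run a single gradient step on the biases lowers the loss from 2.0 to about 0.98, while `forward` never retains an intermediate activation.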

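Experimental Results

Research Questions

RQ1: Can avoiding activation storage significantly improve the memory efficiency of on-device learning?
RQ2: Can training only the bias modules together with lite residual modules maintain or improve accuracy compared to fine-tuning the full network?
RQ3: How does bias-only training compare with fine-tuning the last layer in memory efficiency and accuracy?
RQ4: Can combining bias learning with feature extractor adaptation achieve both high memory savings and high accuracy?
RQ5: What is the trade-off between memory reduction and accuracy loss when applying parameter-efficient adaptation on edge devices?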

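Key Findings

Compared to fine-tuning the full network, TinyTL reduces memory by up to 6.5x with little accuracy loss.
Compared to fine-tuning the last layer, TinyTL improves accuracy by up to 33.8% with little memory overhead (the lite residual modules add only 3.8%).
Combined with feature extractor adaptation, TinyTL saves 7.5-12.9x memory compared to fine-tuning the full Inception-V3, without sacrificing accuracy.
The lite residual module adds only 3.8% memory overhead yet effectively refines the features.
By freezing the weights and training only the biases, TinyTL eliminates activation storage while maintaining high accuracy on edge devices.
Extensive experiments confirm that TinyTL combines memory efficiency with high accuracy across multiple benchmarks.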

This review was generated by AI and reviewed by human editors.