QUICK REVIEW

[Paper Digest] Tiny Transfer Learning: Towards Memory-Efficient On-Device Learning

Han Cai, Chuang Gan|arXiv (Cornell University)|Jul 22, 2020

Machine Learning and ELM | Cited by 14

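One-Sentence Summary

TinyTL is a memory-efficient on-device transfer learning method that freezes the weights of the feature extractor and trains only the biases, so intermediate activations do not need to be stored. It combines lite residual modules with a weight-shared super-net and discrete sub-net selection, cutting training memory cost by up to 13.3x without losing accuracy relative to fine-tuning the full network.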

ABSTRACT

We present Tiny-Transfer-Learning (TinyTL), an efficient on-device learning method to adapt pre-trained models to newly collected data on edge devices. Different from conventional transfer learning methods that fine-tune the full network or the last layer, TinyTL freezes the weights of the feature extractor while only learning the biases, thus doesn't require storing the intermediate activations, which is the major memory bottleneck for on-device learning. To maintain the adaptation capacity without updating the weights, TinyTL introduces memory-efficient lite residual modules to refine the feature extractor by learning small residual feature maps in the middle. Besides, instead of using the same feature extractor, TinyTL adapts the architecture of the feature extractor to fit different target datasets while fixing the weights: TinyTL pre-trains a large super-net that contains many weight-shared sub-nets that can individually operate; different target dataset selects the sub-net that best match the dataset. This backpropagation-free discrete sub-net selection incurs no memory overhead. Extensive experiments show that TinyTL can reduce the training memory cost by order of magnitude (up to 13.3x) without sacrificing accuracy compared to fine-tuning the full network.

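Research Motivation and Goals

Address the high memory cost of on-device fine-tuning, in particular the storage of intermediate activations.
Enable efficient adaptation of pre-trained models on resource-constrained edge devices.
Substantially reduce the memory footprint of on-device learning while preserving model accuracy.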
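Develop an adaptation scheme whose backward pass through the feature extractor does not rely on stored activations, removing that memory overhead.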

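Proposed Method

Freeze the weights of all convolutional and fully connected layers in the feature extractor and train only the biases. Bias gradients do not depend on a layer's input, so the activations no longer have to be stored.
Introduce lite residual modules that learn small residual feature maps in the middle of the network, preserving adaptation capacity.
Build a super-net that contains many weight-shared sub-nets, each of which can run inference on its own.
Select the best sub-net for each target dataset through a discrete search that requires no backpropagation.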
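Keep the weights of the selected sub-net fixed. During adaptation, update only the biases and the lite residual modules of the feature extractor, together with the final classifier head.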
Use the same pre-trained super-net for every dataset and pick the sub-net that best matches each target dataset.
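
The bias-only step in the list above rests on a basic property of a linear layer y = Wx + b: the gradient with respect to b equals the upstream gradient, while the gradient with respect to W also needs the layer's input x. The following is a minimal sketch in plain Python to make that concrete. The function names and the toy numbers are illustrative assumptions and are not taken from the paper or its code.

```python
# Minimal sketch (plain Python, no framework): gradients of one linear layer
# y = W x + b. All names and numbers are illustrative assumptions.

def linear_forward(W, b, x):
    """Compute y = W x + b for a weight matrix W (list of rows), bias b, input x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def grad_bias(grad_y):
    """dL/db = dL/dy: needs only the upstream gradient, not the input x."""
    return list(grad_y)

def grad_weight(grad_y, x):
    """dL/dW[i][j] = dL/dy[i] * x[j]: needs the stored input activation x."""
    return [[g_i * x_j for x_j in x] for g_i in grad_y]

def grad_input(W, grad_y):
    """dL/dx = W^T dL/dy: needs the frozen weights, not the input x."""
    return [sum(W[i][j] * grad_y[i] for i in range(len(W)))
            for j in range(len(W[0]))]

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen weights
b = [0.5, -0.5]                # trainable biases
x = [1.0, -1.0]                # input activation
y = linear_forward(W, b, x)
grad_y = [1.0, 2.0]            # pretend upstream gradient dL/dy

print(grad_bias(grad_y))       # computed without x
print(grad_input(W, grad_y))   # passed to the previous layer, also without x
print(grad_weight(grad_y, x))  # the only quantity that requires keeping x
```

Only `grad_weight` takes `x` as an argument, so a layer whose weights are frozen can discard its input after the forward pass. Nonlinear layers between the linear ones may still need a small amount of saved state, for example a sign mask for ReLU.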

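Experimental Results

Research Questions

RQ1: Can avoiding full-network fine-tuning significantly improve the memory efficiency of on-device transfer learning?
RQ2: How can the model retain its adaptation capacity when the weights of the feature extractor are not updated?
RQ3: Can a single pre-trained super-net serve multiple target datasets with very low memory overhead?
RQ4: What is the trade-off between memory reduction and accuracy when the weights of the feature extractor are frozen?
RQ5: Can discrete sub-net selection replace gradient-based adaptation without adding memory overhead?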

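Key Findings

Compared with fine-tuning the full network, TinyTL reduces training memory cost by up to 13.3x with no loss in accuracy.
Freezing the weights of the feature extractor and training only the biases removes the need to store intermediate activations, which addresses the main memory bottleneck.
Lite residual modules enable effective adaptation without updating the weights of the main network.
Discrete, backpropagation-free sub-net selection adds no memory overhead.
Despite the frozen feature extractor, the method maintains competitive accuracy across multiple datasets.
The super-net architecture adapts to different target datasets by selecting the best-matching sub-net for each one.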
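
To illustrate why discrete sub-net selection needs no backpropagation, the sketch below scores each candidate with forward passes only and keeps the best one. It is a hypothetical toy under stated assumptions: the candidate "sub-nets" are plain threshold functions, the data is made up, and exhaustive scoring stands in for whatever search procedure the paper actually uses.

```python
# Hypothetical sketch of backpropagation-free sub-net selection.
# Candidates, data, and names are illustrative assumptions, not the paper's code.
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]        # (features, label)
SubNet = Callable[[Sequence[float]], int]   # forward pass only: features -> label

def accuracy(subnet: SubNet, data: List[Sample]) -> float:
    """Fraction of samples the sub-net classifies correctly (no gradients used)."""
    if not data:
        raise ValueError("data must not be empty")
    correct = sum(1 for features, label in data if subnet(features) == label)
    return correct / len(data)

def select_subnet(candidates: Dict[str, SubNet], data: List[Sample]) -> str:
    """Return the name of the candidate with the highest forward-only accuracy."""
    return max(candidates, key=lambda name: accuracy(candidates[name], data))

# Toy stand-ins for weight-shared sub-nets: two fixed threshold classifiers.
candidates: Dict[str, SubNet] = {
    "narrow": lambda features: int(features[0] > 0.7),
    "wide": lambda features: int(features[0] > 0.5),
}

# Made-up samples from a target dataset.
data: List[Sample] = [([0.2], 0), ([0.8], 1), ([0.6], 1), ([0.4], 0)]

best = select_subnet(candidates, data)
print(best, accuracy(candidates[best], data))
```

Because the selection only compares forward-pass scores, it keeps no activations for a backward pass and updates no weights, which is the property the findings above attribute to this step.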

This digest was generated by AI and reviewed by a human editor.