QUICK REVIEW

[Paper Review] SpotTune: Transfer Learning through Adaptive Fine-tuning

Yunhui Guo, Humphrey Shi|arXiv (Cornell University)|Nov 21, 2018

Domain Adaptation and Few-Shot Learning · References: 43 · Cited by: 42

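One-Sentence Summary

SpotTune learns a per-instance fine-tuning policy for a pre-trained network. It routes each input through either frozen or fine-tuned layers, which improves transfer-learning performance across a wide range of datasets.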

ABSTRACT

Transfer learning, which allows a source task to affect the inductive bias of the target task, is widely used in computer vision. The typical way of conducting transfer learning with deep neural networks is to fine-tune a model pre-trained on the source task using data from the target task. In this paper, we propose an adaptive fine-tuning approach, called SpotTune, which finds the optimal fine-tuning strategy per instance for the target data. In SpotTune, given an image from the target task, a policy network is used to make routing decisions on whether to pass the image through the fine-tuned layers or the pre-trained layers. We conduct extensive experiments to demonstrate the effectiveness of the proposed approach. Our method outperforms the traditional fine-tuning approach on 12 out of 14 standard datasets. We also compare SpotTune with other state-of-the-art fine-tuning strategies, showing superior performance. On the Visual Decathlon datasets, our method achieves the highest score across the board without bells and whistles.

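Motivation and Objectives

Improve on standard fine-tuning by combining per-instance adaptive decisions with transfer learning from a pre-trained network.
Propose a lightweight policy network that decides, for each input, which residual blocks to fine-tune and which to keep frozen.
Introduce a training scheme based on Gumbel-Softmax so that the discrete routing decisions can be learned with gradients.
Explore a global-policy variant that restricts fine-tuning to a fixed set of blocks in order to reduce the parameter count.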
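To make the Gumbel-Softmax step concrete, here is a minimal, framework-free sketch of one block's freeze/fine-tune decision. It assumes the policy network emits two logits per block; the ordering `[freeze, fine_tune]` and the function name `gumbel_softmax_decision` are hypothetical. The hard one-hot vector is what a straight-through estimator uses in the forward pass, and the soft vector is the relaxation whose gradient it borrows in the backward pass.

```python
import math
import random


def gumbel_softmax_decision(logits, tau=1.0, rng=random):
    """Sample a relaxed one-hot vector over {freeze, fine-tune} (hypothetical helper).

    logits: unnormalized scores, assumed ordered as [freeze, fine_tune].
    tau:    temperature; smaller values push the soft sample toward one-hot.
    Returns (hard, soft): `hard` is the one-hot decision for the forward pass,
    `soft` is the differentiable relaxation used for gradients.
    """
    perturbed = []
    for z in logits:
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1) to avoid log(0)
        g = -math.log(-math.log(u))           # Gumbel(0, 1) noise
        perturbed.append((z + g) / tau)

    # Numerically stable softmax over the perturbed logits.
    m = max(perturbed)
    exps = [math.exp(v - m) for v in perturbed]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Hard decision: one-hot at the argmax of the soft sample.
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return hard, soft


if __name__ == "__main__":
    rng = random.Random(0)
    hard, soft = gumbel_softmax_decision([0.0, 2.0], tau=0.5, rng=rng)
    i_l = hard[1]  # I_l(x): 1.0 means "use the fine-tuned copy of block l"
    print(hard, soft, i_l)
```

Plain Python has no autograd, so this only shows the forward computation. In an autograd framework, the straight-through trick is usually written as `hard - soft.detach() + soft`, which is what PyTorch's `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=True)` returns.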

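Proposed Method

Represent each residual block by two versions: a frozen pre-trained block and a trainable copy initialized from the pre-trained weights.
Use the policy network to learn, for each block, a per-instance binary decision I_l(x): freeze or fine-tune.
Sample I_l(x) from a Gumbel-Softmax distribution so that gradients can be backpropagated through the discrete decisions.
Train the policy network jointly with the target network using the standard classification loss, applying a straight-through estimator to the Gumbel-Softmax samples.
Optionally, impose a compact global policy that constrains all inputs to use the same k fine-tuned blocks, with an auxiliary loss that pushes the decisions toward binary values.
Provide a global-k variant that reduces parameters: because every input follows the same policy, each block only needs to keep the copy (frozen or fine-tuned) that the policy selects.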
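The routing described above can be sketched for a single input. The sketch assumes the usual residual form x_l = F_l(x_{l-1}) + x_{l-1}, so a block with decision I_l(x) computes x_l = I_l(x) * F_hat_l(x_{l-1}) + (1 - I_l(x)) * F_l(x_{l-1}) + x_{l-1}, where F_l is the frozen block and F_hat_l is its fine-tuned copy. Vectors are plain Python lists. The function name and the toy blocks in the example are hypothetical stand-ins for convolutional residual functions and policy-network outputs.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def spottune_forward(x: Vector,
                     frozen: Sequence[Block],
                     tuned: Sequence[Block],
                     decisions: Sequence[float]) -> Vector:
    """Route one input through L residual blocks (hypothetical sketch).

    frozen[l]    : pre-trained residual function F_l (weights fixed)
    tuned[l]     : fine-tuned copy F_hat_l, initialized from F_l
    decisions[l] : I_l(x), 1.0 = use the fine-tuned copy, 0.0 = use the frozen block
    """
    if not (len(frozen) == len(tuned) == len(decisions)):
        raise ValueError("need one frozen block, one tuned block and one decision per layer")
    h = list(x)
    for f, f_hat, i_l in zip(frozen, tuned, decisions):
        a = f_hat(h)  # fine-tuned branch
        b = f(h)      # frozen pre-trained branch
        # Mix the two branches by the decision, then add the identity shortcut.
        h = [i_l * ai + (1.0 - i_l) * bi + hi for ai, bi, hi in zip(a, b, h)]
    return h


if __name__ == "__main__":
    # Toy blocks: the "frozen" block scales by 0.1, the "fine-tuned" copy is the identity.
    frozen = [lambda v: [0.1 * t for t in v]] * 2
    tuned = [lambda v: list(v)] * 2
    # Block 1 stays frozen, block 2 is fine-tuned for this particular input.
    print(spottune_forward([1.0, 2.0], frozen, tuned, [0.0, 1.0]))
```

Both branches are evaluated and mixed so that the same code also accepts soft decisions during training. With hard decisions at inference, only the selected branch needs to run, and under a fixed global-k policy the unselected copy of each block is never used.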

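Experimental Results

Research Questions

RQ1: Does routing each input between pre-trained and fine-tuned blocks on a per-instance basis improve transfer-learning performance over a uniform fine-tuning strategy?
RQ2: Can a global policy that fine-tunes a fixed subset of blocks achieve competitive accuracy with fewer parameters?
RQ3: How does SpotTune compare with state-of-the-art fine-tuning methods on diverse datasets and on the Visual Decathlon benchmark?
RQ4: What do visualizations of block usage across tasks reveal about the learned fine-tuning policies?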

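Key Findings

SpotTune outperforms standard fine-tuning on 12 of the 14 datasets tested.
Without any architectural modifications, the method achieves the highest Visual Decathlon score among the compared methods.
The per-instance policy produces dataset-specific and sample-specific routing decisions, enabling better feature reuse and adaptation.
The compact global-k variant reduces parameters while maintaining strong performance, and it outperforms some last-k fine-tuning baselines.
L2-SP improves fine-tuning performance but falls short of SpotTune, and the two approaches are complementary.
Visualizations show diverse fine-tuning policies across datasets, suggesting that non-contiguous, input-dependent fine-tuning is beneficial.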
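For context on the L2-SP comparison, here is a minimal sketch of the L2-SP penalty as it is usually stated (Li et al., 2018): alpha/2 * ||w_S - w_S^0||^2 + beta/2 * ||w_new||^2. Weights inherited from the source model are pulled toward their pre-trained values w_S^0 instead of toward zero, while newly added layers receive ordinary weight decay. The flat-list weights, the function name and the default alpha and beta values are illustrative assumptions, not settings from the SpotTune paper.

```python
from typing import Sequence


def l2_sp_penalty(w_shared: Sequence[float],
                  w0_shared: Sequence[float],
                  w_new: Sequence[float],
                  alpha: float = 0.01,
                  beta: float = 0.01) -> float:
    """L2-SP regularizer (hypothetical flat-list version).

    w_shared  : current values of weights inherited from the source model
    w0_shared : their pre-trained values (the "starting point")
    w_new     : weights of layers added for the target task, e.g. the classifier
    """
    if len(w_shared) != len(w0_shared):
        raise ValueError("w_shared and w0_shared must have the same length")
    # Squared distance of the inherited weights from their pre-trained values.
    drift = sum((w - w0) ** 2 for w, w0 in zip(w_shared, w0_shared))
    # Plain L2 weight decay on the new layers.
    decay = sum(w ** 2 for w in w_new)
    return 0.5 * alpha * drift + 0.5 * beta * decay


if __name__ == "__main__":
    # drift = 1.0 and decay = 9.0, so the penalty is 0.05 + 0.045.
    print(l2_sp_penalty([1.0, 2.0], [1.0, 1.0], [3.0], alpha=0.1, beta=0.01))
```

Because the review reports the two approaches as complementary, one natural combination would be to add this term to the classification loss for SpotTune's fine-tuned block copies. That specific pairing is an assumption made here for illustration.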


This review was generated by AI and checked by human editors.