QUICK REVIEW

[Paper Review] Explicit Inductive Bias for Transfer Learning with Convolutional Networks

Xuhong Li, Yves Grandvalet | arXiv (Cornell University) | Feb 5, 2018

Domain Adaptation and Few-Shot Learning | Cited by 100

ONE-SENTENCE SUMMARY

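The paper proposes regularization schemes that bias fine-tuning toward the pre-trained starting point; L2-SP (and L2-SP-Fisher) outperform standard L2 and other penalties for transfer learning with CNNs.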

ABSTRACT

In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When using fine-tuning, the underlying assumption is that the pre-trained model extracts generic features, which are at least partially relevant for solving the target task, but would be difficult to extract from the limited amount of data available on the target task. However, besides the initialization with the pre-trained model and the early stopping, there is no mechanism in fine-tuning for retaining the features learned on the source task. In this paper, we investigate several regularization schemes that explicitly promote the similarity of the final solution with the initial model. We show the benefit of having an explicit inductive bias towards the initial model, and we eventually recommend a simple $L^2$ penalty with the pre-trained model being a reference as the baseline of penalty for transfer learning tasks.

MOTIVATION AND OBJECTIVES

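Motivation: use an explicit inductive bias during transfer learning to retain the knowledge captured by the pre-trained CNN.
Study several parameter regularizers that use the pre-trained model, rather than the origin (zero), as the reference point.
Evaluate the proposed regularizers with CNNs (ResNet) on several source/target task pairs.
Compare the L2-SP variants with standard L2 and with other sparse and group-sparse penalties.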

PROPOSED METHOD

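Define the regularized objective as $\tilde{J}(w) = J(w) + \Omega(w)$, where $J$ is the negative log-likelihood on the target task.
Propose L2-SP, which uses the pre-trained weights $w^0$ as the reference point: $\Omega(w) = \frac{\alpha}{2}\|w - w^0\|_2^2$.
Extend it to partial sharing, penalizing the part shared with the source network ($\mathcal{S}$) and the new, target-specific part ($\bar{\mathcal{S}}$) separately: $\Omega(w) = \frac{\alpha}{2}\|w_{\mathcal{S}} - w^0_{\mathcal{S}}\|_2^2 + \frac{\beta}{2}\|w_{\bar{\mathcal{S}}}\|_2^2$.
Add L1-SP: $\Omega(w) = \alpha\|w_{\mathcal{S}} - w^0_{\mathcal{S}}\|_1 + \frac{\beta}{2}\|w_{\bar{\mathcal{S}}}\|_2^2$.
Introduce Group-Lasso-SP (GL-SP), whose groups $\mathcal{G}_g$ correspond to convolution kernels: $\Omega(w) = \alpha\sum_{g} s_g\|w_{\mathcal{G}_g} - w^0_{\mathcal{G}_g}\|_2 + \frac{\beta}{2}\|w_{\bar{\mathcal{S}}}\|_2^2$, where $s_g$ is a per-group weight.
Consider the GL-SP-Fisher and L2-SP-Fisher variants, which use the diagonal of the Fisher information matrix to weight each parameter's deviation from $w^0$.
Run experiments with ResNet, using ImageNet or Places365 as the source task and MIT Indoors 67, Stanford Dogs 120, and Caltech 256 (30 or 60 training images per class) as target tasks.
Assess the regularizers' effect on accuracy, on layer-wise activation similarity ($R^2$), and on forgetting in a lifelong-learning-like setting.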
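The penalties listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: weights are flat lists of floats, all function names are hypothetical, the GL-SP group weight is assumed to be $s_g = \sqrt{|\mathcal{G}_g|}$, and for L2-SP-Fisher the diagonal Fisher values are passed in as given rather than estimated, with an assumed form $\frac{\alpha}{2}\sum_j F_{jj}(w_j - w^0_j)^2 + \frac{\beta}{2}\|w_{\bar{\mathcal{S}}}\|_2^2$. The last function shows why L2-SP is easy to adopt: its gradient $\alpha(w_{\mathcal{S}} - w^0_{\mathcal{S}})$ turns ordinary weight decay toward zero into decay toward the pre-trained weights.

```python
import math
from typing import List, Sequence

# Minimal sketch of the regularizers; weights are flat lists of floats.
# Function names are hypothetical, not taken from the paper's code.


def sq_norm(v: Sequence[float]) -> float:
    """Squared Euclidean norm ||v||_2^2."""
    return sum(x * x for x in v)


def diff(w: Sequence[float], w0: Sequence[float]) -> List[float]:
    """Element-wise displacement w - w0 from the pre-trained weights."""
    return [a - b for a, b in zip(w, w0)]


def l2(w: Sequence[float], alpha: float) -> float:
    """Standard L2 baseline: (alpha/2)||w||^2, pulls every weight toward 0."""
    return alpha / 2 * sq_norm(w)


def l2_sp(w_s, w0_s, w_new, alpha, beta):
    """L2-SP: (alpha/2)||w_S - w0_S||^2 + (beta/2)||w_Sbar||^2."""
    return alpha / 2 * sq_norm(diff(w_s, w0_s)) + beta / 2 * sq_norm(w_new)


def l1_sp(w_s, w0_s, w_new, alpha, beta):
    """L1-SP: alpha||w_S - w0_S||_1 + (beta/2)||w_Sbar||^2."""
    return alpha * sum(abs(d) for d in diff(w_s, w0_s)) + beta / 2 * sq_norm(w_new)


def gl_sp(groups_s, groups0_s, w_new, alpha, beta):
    """GL-SP: alpha * sum_g s_g ||w_Gg - w0_Gg||_2 + (beta/2)||w_Sbar||^2.

    Each group holds the flattened weights of one convolution kernel.
    Assumption: s_g = sqrt(|G_g|), a common group-lasso size correction.
    """
    total = sum(
        math.sqrt(len(g)) * math.sqrt(sq_norm(diff(g, g0)))
        for g, g0 in zip(groups_s, groups0_s)
    )
    return alpha * total + beta / 2 * sq_norm(w_new)


def l2_sp_fisher(w_s, w0_s, fisher_diag, w_new, alpha, beta):
    """Assumed form: (alpha/2) sum_j F_jj (w_j - w0_j)^2 + (beta/2)||w_Sbar||^2.

    fisher_diag holds precomputed diagonal Fisher values for the shared weights.
    """
    weighted = sum(f * d * d for f, d in zip(fisher_diag, diff(w_s, w0_s)))
    return alpha / 2 * weighted + beta / 2 * sq_norm(w_new)


def l2_sp_sgd_step(w_s, w0_s, grad_j_s, lr, alpha):
    """One SGD step on the shared weights under L2-SP.

    The penalty's gradient is alpha * (w_S - w0_S), so the step decays the
    weights toward the pre-trained w0_S instead of toward 0.
    """
    return [w - lr * (g + alpha * (w - w0)) for w, w0, g in zip(w_s, w0_s, grad_j_s)]


if __name__ == "__main__":
    w0 = [0.5, -1.0, 2.0]  # toy "pre-trained" shared weights
    w = [0.6, -1.0, 1.8]   # toy fine-tuned shared weights
    new = [0.3, -0.4]      # toy weights of the new classifier layer
    print("L2   :", l2(w + new, alpha=0.1))
    print("L2-SP:", l2_sp(w, w0, new, alpha=0.1, beta=0.01))
    print("L1-SP:", l1_sp(w, w0, new, alpha=0.1, beta=0.01))
    print("GL-SP:", gl_sp([w[:2], w[2:]], [w0[:2], w0[2:]], new, alpha=0.1, beta=0.01))
```

In the toy example the fine-tuned weights sit close to $w^0$, so L2-SP charges only for the small displacement, whereas standard L2 penalizes the full magnitude of every weight.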

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

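RQ1: Does an explicit inductive bias toward the pre-trained initialization, introduced through regularization, improve transfer-learning performance over standard L2 regularization?
RQ2: How do L2-SP, L2-SP-Fisher, L1-SP, GL-SP, and its Fisher variant compare across source/target task pairs?
RQ3: Is the simple L2-SP penalty sufficient as a baseline for transfer learning with CNNs?
RQ4: How does regularization affect the preservation of pre-trained feature representations across layers (layer-wise activation analysis)?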

KEY FINDINGS

Target dataset (accuracy, %)	L2	L2-SP	L2-SP-Fisher
MIT Indoors 67	79.6	84.2	84.0
Stanford Dogs 120	81.4	85.1	85.1
Caltech 256 – 30	81.5	83.5	83.3
Caltech 256 – 60	85.3	86.4	86.0
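The size of each improvement follows from simple subtraction on the table. The short sketch below copies the table's accuracies and computes the gain of each L2-SP variant over L2 in percentage points; the dictionary layout and the function name are illustrative only.

```python
# Accuracies (%) copied from the table: (L2, L2-SP, L2-SP-Fisher).
RESULTS = {
    "MIT Indoors 67": (79.6, 84.2, 84.0),
    "Stanford Dogs 120": (81.4, 85.1, 85.1),
    "Caltech 256 - 30": (81.5, 83.5, 83.3),
    "Caltech 256 - 60": (85.3, 86.4, 86.0),
}


def gains_over_l2(table):
    """Map each dataset to (L2-SP gain, L2-SP-Fisher gain) over L2, in points."""
    return {
        name: (round(sp - base, 1), round(fisher - base, 1))
        for name, (base, sp, fisher) in table.items()
    }


for name, (g_sp, g_fisher) in gains_over_l2(RESULTS).items():
    print(f"{name}: L2-SP {g_sp:+.1f}, L2-SP-Fisher {g_fisher:+.1f}")
```

With these numbers, L2-SP gains between 1.1 and 4.6 points over L2, and its gain on Caltech 256 drops from 2.0 points with 30 training images per class to 1.1 points with 60.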

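L2-SP and L2-SP-Fisher clearly outperform standard L2 fine-tuning on every target dataset; L2-SP gains 1.1 to 4.6 accuracy points over L2.
The improvement is larger when target data are scarce: L2-SP gains 2.0 points on Caltech 256 – 30 but only 1.1 points on Caltech 256 – 60.
In these transfer settings, L1-SP and the group-lasso penalties perform worse than the L2-based penalties.
The Fisher-based variants do not significantly improve on the Euclidean L2-SP for the target tasks, although L2-SP-Fisher can reduce forgetting of the source task in a lifelong-learning-like scenario.
Freezing some layers can help plain L2 fine-tuning, but L2-SP fine-tuning generally outperforms the freezing strategies.
The layer-wise activation analysis shows that L2-SP preserves the role of pre-trained units in the deeper layers of the network better than standard L2 (higher $R^2$).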
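The $R^2$ analysis can be sketched as follows, under explicit assumptions: for each unit, its activations in the fine-tuned network on a set of images are regressed on the same unit's activations in the pre-trained network with a least-squares line, so $R^2$ equals the squared Pearson correlation, and a layer is summarized by the median over its units. The paper's exact protocol may differ, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def unit_r2(pre: Sequence[float], post: Sequence[float]) -> float:
    """R^2 of the least-squares line post ~ a * pre + b for one unit.

    pre:  the unit's activations in the pre-trained network, one per image.
    post: the same unit's activations after fine-tuning, on the same images.
    """
    mx, my = statistics.fmean(pre), statistics.fmean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    syy = sum((y - my) ** 2 for y in post)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for constant activations")
    # For simple linear regression, R^2 is the squared correlation coefficient.
    return sxy * sxy / (sxx * syy)


def layer_r2(pre_units, post_units) -> float:
    """Median per-unit R^2 over a layer (assumed summary statistic)."""
    return statistics.median(unit_r2(p, q) for p, q in zip(pre_units, post_units))


if __name__ == "__main__":
    pre = [[0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 0.0, 4.0]]
    kept = [[1.0, 3.0, 5.0, 7.0], [2.0, 4.0, 0.0, 8.0]]  # affine copies: role preserved
    print("layer R^2:", layer_r2(pre, kept))
```

Under this reading, a unit whose fine-tuned activations are an affine function of its pre-trained activations keeps $R^2 = 1$, so higher layer-level values indicate that units retain their pre-trained role.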

This summary was generated by AI and reviewed by human editors.