QUICK REVIEW

[Paper Digest] Explicit Inductive Bias for Transfer Learning with Convolutional Networks

Xuhong Li, Yves Grandvalet | arXiv (Cornell University) | Feb 5, 2018

Domain Adaptation and Few-Shot Learning | Cited by 149

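One-Sentence Summary

This paper proposes explicit regularization terms that bias fine-tuning toward the pre-trained starting point, and shows that L2-SP (and L2-SP-Fisher) consistently outperform standard L2 fine-tuning across multiple transfer learning tasks.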

ABSTRACT

In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When using fine-tuning, the underlying assumption is that the pre-trained model extracts generic features, which are at least partially relevant for solving the target task, but would be difficult to extract from the limited amount of data available on the target task. However, besides the initialization with the pre-trained model and the early stopping, there is no mechanism in fine-tuning for retaining the features learned on the source task. In this paper, we investigate several regularization schemes that explicitly promote the similarity of the final solution with the initial model. We show the benefit of having an explicit inductive bias towards the initial model, and we eventually recommend a simple $L^2$ penalty with the pre-trained model being a reference as the baseline of penalty for transfer learning tasks.

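Motivation and Objectives

Motivate the use of an explicit inductive bias during fine-tuning so that knowledge from the pre-trained CNN is retained.
Propose and compare several regularizers for transfer learning that use the pre-trained parameters as a reference.
Evaluate these regularizers on several source-target task pairs to measure their benefit over standard fine-tuning.
Give a practical recommendation for a baseline regularizer in inductive transfer learning with CNNs.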

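Proposed Method

Define a regularized objective by adding a penalty term to the standard loss function, with the pre-trained weights w0 as the reference.
Introduce L2-SP, Omega(w) = (alpha/2) * ||w - w0||^2, as the baseline penalty (SP refers to the pre-trained starting point).
When the target architecture differs from the source, extend the penalty to partial sharing, with separate penalties for the shared parameters and the new parameters.
Explore L2-SP-Fisher, a variant of L2-SP weighted by Fisher information, which accounts for how sensitive the source task is to each parameter.
Study L1-SP and Group-Lasso-SP variants, including GL-SP-Fisher, which encourage freezing parameters individually or in groups.
Run experiments with ResNet on multiple source/target pairs (ImageNet or Places 365 as source; Caltech 256, MIT Indoors 67, and Stanford Dogs 120 as targets).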
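The penalties listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. The names `sp_penalty`, `sp_penalty_grad`, and `Params` are hypothetical. The digest only says that new parameters get a "separate penalty", so the `beta`-weighted L2 pull toward zero used here is an assumption, as is the diagonal (per-parameter) form of the Fisher weighting.

```python
from typing import Dict, List, Optional

Params = Dict[str, List[float]]  # parameter name -> flat list of values


def sp_penalty(weights: Params, pretrained: Params, alpha: float,
               beta: float, fisher: Optional[Params] = None) -> float:
    """Starting-point penalty with partial sharing (illustrative sketch).

    Names found in `pretrained` are treated as shared and pulled toward
    their pre-trained values. All other names are treated as new and
    pulled toward zero (assumed form of the separate penalty).
    If `fisher` is given, it must hold one weight per shared parameter.
    """
    total = 0.0
    for name, w in weights.items():
        if name in pretrained:
            w0 = pretrained[name]
            # Plain L2-SP weights every coordinate by 1.0; passing `fisher`
            # gives a diagonal Fisher-weighted variant (assumed form).
            f = fisher[name] if fisher is not None else [1.0] * len(w)
            total += 0.5 * alpha * sum(
                fj * (wj - w0j) ** 2 for fj, wj, w0j in zip(f, w, w0))
        else:
            total += 0.5 * beta * sum(wj ** 2 for wj in w)
    return total


def sp_penalty_grad(weights: Params, pretrained: Params, alpha: float,
                    beta: float, fisher: Optional[Params] = None) -> Params:
    """Gradient of sp_penalty with respect to every parameter."""
    grad: Params = {}
    for name, w in weights.items():
        if name in pretrained:
            w0 = pretrained[name]
            f = fisher[name] if fisher is not None else [1.0] * len(w)
            # Points away from w0, so a descent step moves w back toward w0.
            grad[name] = [alpha * fj * (wj - w0j)
                          for fj, wj, w0j in zip(f, w, w0)]
        else:
            # Ordinary weight decay on the new parameters.
            grad[name] = [beta * wj for wj in w]
    return grad


# Toy example: one shared layer and one new classifier layer.
pretrained = {"conv1": [1.0, -2.0]}
weights = {"conv1": [1.5, -2.0], "fc": [0.5, 0.5]}
print(sp_penalty(weights, pretrained, alpha=0.1, beta=0.01))
print(sp_penalty_grad(weights, pretrained, alpha=0.1, beta=0.01))
```

In training, this penalty would be added to the task loss. The difference from standard weight decay is the reference point: the shared weights are pulled toward w0 rather than toward zero, so the penalty vanishes when the shared weights equal their pre-trained values.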

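Experimental Results

Research Questions

RQ1: Does an explicit inductive bias toward the pre-trained starting point improve transfer learning performance over standard fine-tuning?
RQ2: How do L2-SP, L2-SP-Fisher, and the other SP-based penalties compare in accuracy and stability across target tasks?
RQ3: Is there a practical baseline penalty that consistently outperforms standard weight decay in transfer learning?
RQ4: How do freezing layers and applying SP penalties each affect transfer performance?
RQ5: Do Fisher-weighted penalties offer a substantial advantage in the inductive transfer setting?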

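Key Findings

L2-SP and L2-SP-Fisher significantly outperform standard L2 fine-tuning on all target tasks.
The improvement from L2-SP is more pronounced when target data is scarce.
L1-SP and Group-Lasso-SP variants perform poorly, or at least worse than the L2-based SP methods, in these transfer settings.
Fisher-information-based SP does not significantly improve on plain L2-SP in target-task performance, although it reduces forgetting in lifelong-learning-style scenarios.
Freezing layers is generally less effective than L2-SP fine-tuning at adapting to the target task while retaining pre-trained knowledge.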


This digest was generated by AI and reviewed by a human editor.