QUICK REVIEW

[Paper Review] Explicit Inductive Bias for Transfer Learning with Convolutional Networks

Xuhong Li, Yves Grandvalet|arXiv (Cornell University)|Feb 5, 2018

Domain Adaptation and Few-Shot Learning|Cited by 149

One-Line Summary

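This paper introduces an explicit regularization term that biases fine-tuning toward the pre-trained starting point, and shows that L2-SP (and L2-SP-Fisher) consistently outperform standard L2 fine-tuning across several transfer learning tasks.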

ABSTRACT

In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When using fine-tuning, the underlying assumption is that the pre-trained model extracts generic features, which are at least partially relevant for solving the target task, but would be difficult to extract from the limited amount of data available on the target task. However, besides the initialization with the pre-trained model and the early stopping, there is no mechanism in fine-tuning for retaining the features learned on the source task. In this paper, we investigate several regularization schemes that explicitly promote the similarity of the final solution with the initial model. We show the benefit of having an explicit inductive bias towards the initial model, and we eventually recommend a simple $L^2$ penalty with the pre-trained model being a reference as the baseline of penalty for transfer learning tasks.

Motivation and Objectives

Motivate the use of explicit inductive bias to preserve knowledge from pre-trained CNNs during fine-tuning.
Propose and compare several regularizers that reference the pre-trained parameters during transfer learning.
Evaluate regularizers on several source-target task pairs to assess benefits over standard fine-tuning.
Provide practical recommendations for baseline regularization in inductive transfer learning with CNNs.

Proposed Method

Define regularized objective by adding a penalty term to the standard loss, with the pre-trained weights w0 as reference.
Introduce L2-SP: Omega(w) = (alpha/2) * ||w - w0||^2 as a baseline penalty.
Extend to partial sharing with separate penalties for shared vs. new parameters when target architecture differs from source.
Explore L2-SP-Fisher: a Fisher-information-weighted variant of L2-SP that preserves source-task sensitivity.
Investigate L1-SP and Group-Lasso-SP variants, including GL-SP-Fisher, which encourage individual parameters or groups of parameters to stay frozen at their pre-trained values.
Conduct experiments with ResNet across multiple source/target pairs (ImageNet or Places 365 as source; Caltech 256, MIT Indoors 67, and Stanford Dogs 120 as targets).
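
The penalties listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the flat-list representation of parameters, the use of a diagonal Fisher estimate as per-parameter weights, and the plain L2 penalty with a separate strength beta on newly added parameters are all hypothetical choices made for the example.

```python
# Illustrative sketch of starting-point (SP) penalties.
# All names here are hypothetical; parameters are flat lists of floats.

def l2_sp(w, w0, alpha):
    """L2-SP: (alpha / 2) * ||w - w0||^2, with w0 the pre-trained weights."""
    return 0.5 * alpha * sum((wi - w0i) ** 2 for wi, w0i in zip(w, w0))


def l2_sp_fisher(w, w0, fisher_diag, alpha):
    """Fisher-weighted L2-SP: each squared deviation is scaled by a
    per-parameter weight (assumed here to be a diagonal Fisher estimate)."""
    return 0.5 * alpha * sum(
        f * (wi - w0i) ** 2 for wi, w0i, f in zip(w, w0, fisher_diag)
    )


def l1_sp(w, w0, alpha):
    """L1-SP: alpha * ||w - w0||_1, which can pin weights exactly at w0."""
    return alpha * sum(abs(wi - w0i) for wi, w0i in zip(w, w0))


def partial_sharing_penalty(w_shared, w0_shared, w_new, alpha, beta):
    """Shared weights are pulled toward w0; new weights (e.g. a new
    classifier layer) get an ordinary L2 penalty with strength beta
    (assumption: beta is a separate hyperparameter)."""
    new_term = 0.5 * beta * sum(wi ** 2 for wi in w_new)
    return l2_sp(w_shared, w0_shared, alpha) + new_term


def regularized_loss(data_loss, penalty):
    """Regularized objective: standard loss plus the chosen penalty."""
    return data_loss + penalty


w0 = [1.0, -2.0, 0.5]   # pre-trained weights (toy values)
w = [1.5, -2.0, 0.0]    # current weights during fine-tuning (toy values)
print(l2_sp(w, w0, alpha=0.1))
print(l1_sp(w, w0, alpha=0.1))
print(partial_sharing_penalty(w, w0, [2.0], alpha=0.1, beta=0.01))
```

Setting every Fisher weight to 1 makes l2_sp_fisher coincide with l2_sp, and replacing w0 with zeros recovers ordinary L2 weight decay, which is the standard fine-tuning baseline the review compares against.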

Experimental Results

Research Questions

RQ1: Does an explicit inductive bias toward the pre-trained starting point improve transfer learning performance over standard fine-tuning?
RQ2: How do L2-SP, L2-SP-Fisher, and other SP-based penalties compare in accuracy and stability across different target tasks?
RQ3: Is there a practical baseline penalty that consistently outperforms standard weight decay for transfer learning?
RQ4: What is the impact of freezing layers versus applying SP penalties on transfer performance?
RQ5: Do Fisher-weighted penalties provide a meaningful advantage in inductive transfer scenarios?

Key Findings

Target dataset (accuracy, %)	L2	L2-SP	L2-SP-Fisher
MIT Indoors 67	79.6 ± 0.5	84.2 ± 0.3	84.0 ± 0.4
Stanford Dogs 120	81.4 ± 0.2	85.1 ± 0.2	85.1 ± 0.2
Caltech 256 – 30	81.5 ± 0.2	83.5 ± 0.1	83.3 ± 0.1
Caltech 256 – 60	85.3 ± 0.2	86.4 ± 0.2	86.0 ± 0.1

L2-SP and L2-SP-Fisher consistently improve over standard L2 fine-tuning across all target tasks.
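The improvements from L2-SP are more pronounced when target data are scarce: on Caltech 256, the gain over L2 is 2.0 points with 30 training images per class versus 1.1 points with 60.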
L1-SP and Group-Lasso-SP variants perform poorly or are less favorable than L2-based SP methods in these transfer settings.
Fisher-information based SP does not significantly outperform simple L2-SP in target-task performance, though it reduces forgetting in lifelong-learning-like scenarios.
Freezing layers is often less effective than L2-SP fine-tuning for preserving pre-trained knowledge while adapting to the target task.
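
The size of the improvement on each target task can be read off the table with a small calculation. The sketch below copies the mean accuracies from the table above and subtracts the L2 baseline; the variable and function names are hypothetical, and the ± spreads are left out, so it says nothing about statistical significance.

```python
# Mean accuracies (%) copied from the table above: (L2, L2-SP, L2-SP-Fisher).
results = {
    "MIT Indoors 67": (79.6, 84.2, 84.0),
    "Stanford Dogs 120": (81.4, 85.1, 85.1),
    "Caltech 256 – 30": (81.5, 83.5, 83.3),
    "Caltech 256 – 60": (85.3, 86.4, 86.0),
}


def gain_over_l2(table):
    """Return {dataset: (L2-SP gain, L2-SP-Fisher gain)} in accuracy points,
    rounded to one decimal to hide floating-point noise."""
    return {
        name: (round(sp - l2, 1), round(fisher - l2, 1))
        for name, (l2, sp, fisher) in table.items()
    }


gains = gain_over_l2(results)
for name, (sp_gain, fisher_gain) in gains.items():
    print(f"{name}: L2-SP {sp_gain:+.1f}, L2-SP-Fisher {fisher_gain:+.1f}")
```

In every row both gains are positive, and the L2-SP-Fisher gain is never larger than the L2-SP gain, which matches the findings listed above.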

This review was generated by AI and checked by a human editor.