QUICK REVIEW

[Paper Review] How transferable are features in deep neural networks?

Jason Yosinski, Jeff Clune|arXiv (Cornell University)|Nov 6, 2014
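Advanced Neural Network Applications | References: 15 | Cited by: 3,456
One-Sentence Summary

This paper quantifies, layer by layer, how transferable the features of a deep CNN trained on ImageNet are. It finds that the first layers are general and the last layers are task-specific, and that transfer is also affected by optimization difficulties and co-adaptation.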

ABSTRACT

Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.

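Motivation and Objectives

  • Define a formal measure of how general or specific the features at each layer are, based on how well they transfer from one task to another.
  • Characterize how transferability changes with layer depth in a deep CNN.
  • Identify the factors that degrade transfer performance, including specialization of higher layers and optimization difficulties caused by co-adapted layers.
  • Evaluate how task similarity affects transferability, and compare against random features.
  • Examine whether transferred features still improve generalization after fine-tuning on the target task.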

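Proposed Method

  • Train eight-layer CNN base networks on a pair of tasks, A and B, obtained by randomly splitting ImageNet.
  • Copy the first n layers of a base network into a new network, keep them frozen, and train the remaining layers on the target task.
  • Compare frozen and fine-tuned transferred layers to separate generality effects from specificity effects.
  • Repeat over several A/B splits, and also use a dissimilar split into man-made and natural classes to measure the effect of task distance.
  • Include controls for comparison: frozen base features from the same task (selffer) and random initialization.
  • Analyze transfer from the bottom, middle, and top layers to map generality layer by layer.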
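
The setup above can be sketched in plain Python. This is an illustrative outline only, not the authors' code: `Layer`, `split_classes`, `build_treatment`, and `treatment_name` are hypothetical names, the split into two equal halves is an assumption of the sketch, and layers are represented as simple records rather than real network weights. The names it produces follow the AnB / AnB+ pattern used in the findings, with BnB denoting the selffer control.

```python
import random
from dataclasses import dataclass


@dataclass
class Layer:
    source: str      # "A", "B", or "random": where the initial weights come from
    trainable: bool  # False means the layer stays frozen during target training


def split_classes(class_ids, seed=0):
    """Randomly split the class ids into two disjoint halves, A and B (assumed equal)."""
    ids = list(class_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def build_treatment(n, source, fine_tune, depth=8):
    """Describe a network for target task B whose first n layers are copied.

    source="A" gives a transfer network, source="B" gives the selffer control.
    The remaining depth - n layers are randomly initialized and always trained.
    """
    if not 1 <= n < depth:
        raise ValueError("n must leave at least one layer to retrain")
    copied = [Layer(source, fine_tune) for _ in range(n)]
    fresh = [Layer("random", True) for _ in range(depth - n)]
    return copied + fresh


def treatment_name(n, source, fine_tune):
    """Return a label such as 'A3B' (frozen) or 'A3B+' (fine-tuned)."""
    return f"{source}{n}B" + ("+" if fine_tune else "")


# Enumerate every treatment for n = 1..7 on one A/B split.
task_a, task_b = split_classes(range(1000))
treatments = {
    treatment_name(n, src, ft): build_treatment(n, src, ft)
    for n in range(1, 8)
    for src in ("A", "B")
    for ft in (False, True)
}
```

In a real experiment each `Layer` record would correspond to copying that layer's weights from the trained base network and either excluding it from, or including it in, the optimizer's parameter set.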

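Experimental Results

Research Questions

  • RQ1: How general are the features learned at each layer when transferred to a different target task?
  • RQ2: Where in the network does the transition from general to specific representations occur, and how sharp is it across layers?
  • RQ3: Which mechanism causes transfer performance to drop: co-adaptation or feature specificity?
  • RQ4: How does task similarity or distance affect feature transferability, particularly for higher layers?
  • RQ5: Does transferring features and then fine-tuning generalize better than training on the target task alone?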

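Key Findings

  • Features from the first and second layers transfer almost perfectly between similar tasks, indicating that early layers are general.
  • Middle and upper layers transfer less well because of co-adaptation and increasing task specificity.
  • Transfer performance falls as the distance between the base and target tasks grows, especially for higher layers.
  • Transferring features and then fine-tuning (AnB+) generalizes better than training directly on the target task, and the gain persists even after extensive fine-tuning.
  • Even features transferred from a distant task outperform random features, and the advantage holds across the range of retained layers (1–7).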
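
One way to read the co-adaptation versus specificity distinction is as a two-part split of the accuracy lost at a given layer n. The sketch below assumes the three accuracies have already been measured. The function name and the exact subtraction scheme are illustrative assumptions rather than a formula quoted from the paper, and no measured values are included.

```python
def decompose_drop(base_b, selffer_bnb, transfer_anb):
    """Split the accuracy lost by frozen transfer at one layer n into two parts.

    base_b:       accuracy of the network trained only on task B
    selffer_bnb:  accuracy of the frozen selffer control BnB
    transfer_anb: accuracy of the frozen transfer network AnB
    """
    # Loss from cutting the network between co-adapted layers: the features
    # come from B itself, so specificity to another task cannot be the cause.
    co_adaptation = base_b - selffer_bnb
    # Additional loss when the frozen features come from task A instead of B.
    specificity = selffer_bnb - transfer_anb
    return {"co_adaptation": co_adaptation, "specificity": specificity}
```

Under this reading, a layer where the first term is the larger one is limited mainly by optimization, while a layer where the second term is larger is limited mainly by how specialized its features are to the base task.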

This review was generated by AI and checked by a human editor.