QUICK REVIEW

[Paper Review] Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias

Stéphane d’Ascoli, Levent Sagun|arXiv (Cornell University)|Jun 16, 2019
Generative Adversarial Networks and Image Synthesis | Cited by 24
One-Sentence Summary

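This paper introduces a method for embedding convolutional neural networks (CNNs) into their equivalent fully-connected networks (eFCNs), so that the training dynamics of the two can be compared directly. The authors first train a CNN normally, embed its weights into eFCN space at a chosen "relax time", and then continue training the eFCN without constraints. They show that the resulting eFCN always outperforms a standard FCN of the same architecture and, for some intermediate relax times, also outperforms the original CNN, because it combines the CNN's architectural bias with the greater expressivity of the FCN; this reveals rare, well-generalizing regions of the FCN loss landscape.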

ABSTRACT

Despite the phenomenal success of deep neural networks in a broad range of learning tasks, there is a lack of theory to understand the way they work. In particular, Convolutional Neural Networks (CNNs) are known to perform much better than Fully-Connected Networks (FCNs) on spatially structured data: the architectural structure of CNNs benefits from prior knowledge on the features of the data, for instance their translation invariance. The aim of this work is to understand this fact through the lens of dynamics in the loss landscape. We introduce a method that maps a CNN to its equivalent FCN (denoted as eFCN). Such an embedding enables the comparison of CNN and FCN training dynamics directly in the FCN space. We use this method to test a new training protocol, which consists of training a CNN, embedding it to FCN space at a certain "relax time", then resuming the training in FCN space. We observe that for all relax times, the deviation from the CNN subspace is small, and the final performance reached by the eFCN is higher than that reachable by a standard FCN of the same architecture. More surprisingly, for some intermediate relax times, the eFCN outperforms the CNN it stemmed from, by combining the prior information of the CNN and the expressivity of the FCN in a complementary way. The practical interest of our protocol is limited by the very large size of the highly sparse eFCN. However, it offers interesting insights into the persistence of architectural bias under stochastic gradient dynamics. It shows the existence of some rare basins in the FCN loss landscape associated with very good generalization. These can only be accessed thanks to the CNN prior, which helps navigate the landscape during the early stages of optimization.
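
The relax-time protocol can be illustrated on a deliberately tiny problem. The sketch below is not the paper's setup: it assumes a single-channel 1D linear "CNN" (one convolution, no nonlinearity), a synthetic regression target generated by a random dense matrix, and full-batch gradient descent instead of SGD on CIFAR-10. All names (`embed`, `loss_and_grad`, `train`, `relax_time`) are hypothetical. It shows the two phases: gradient steps on the shared kernel up to the relax time, then an exact embedding into a dense matrix that computes the same function, followed by unconstrained training of every entry.

```python
import random

N, K = 6, 3             # input length and kernel size (illustrative choices)
M = N - K + 1           # output length of a valid, stride-1 convolution
rng = random.Random(0)  # fixed seed so the toy run is reproducible

# Hypothetical data: targets come from a random dense map, which a single
# convolution cannot represent exactly.
A = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(M)]
X = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(20)]
Y = [[sum(a * x for a, x in zip(row, xs)) for row in A] for xs in X]


def embed(w):
    """Copy the shared kernel w into the equivalent dense M x N matrix (zeros off the band)."""
    W = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(K):
            W[i][i + j] = w[j]
    return W


def loss_and_grad(W):
    """Mean (over samples) squared error of y = W x, and its gradient w.r.t. every entry of W."""
    G = [[0.0] * N for _ in range(M)]
    total = 0.0
    for xs, ys in zip(X, Y):
        for i in range(M):
            r = sum(W[i][p] * xs[p] for p in range(N)) - ys[i]
            total += r * r
            for p in range(N):
                G[i][p] += 2 * r * xs[p] / len(X)
    return total / len(X), G


def train(relax_time, total_steps, lr=0.02):
    """Full-batch GD: CNN phase up to relax_time, then unconstrained eFCN phase."""
    w = [rng.uniform(-0.1, 0.1) for _ in range(K)]
    for _ in range(relax_time):
        _, G = loss_and_grad(embed(w))
        # Chain rule through weight sharing: w[j] is copied to every (i, i + j),
        # so its gradient is the sum of the dense gradient along that diagonal.
        w = [w[j] - lr * sum(G[i][i + j] for i in range(M)) for j in range(K)]
    W = embed(w)  # relax time: switch to the eFCN, starting from the same function
    cnn_loss, _ = loss_and_grad(W)
    for _ in range(total_steps - relax_time):
        _, G = loss_and_grad(W)
        W = [[W[i][p] - lr * G[i][p] for p in range(N)] for i in range(M)]
    final_loss, _ = loss_and_grad(W)
    return cnn_loss, final_loss


cnn_loss, efcn_loss = train(relax_time=100, total_steps=400)
print(f"loss at relax time: {cnn_loss:.4f}, after eFCN training: {efcn_loss:.4f}")
```

Because the target here is not convolutional, the extra freedom of the eFCN lowers the training loss after the relax time; this toy says nothing about generalization, which is what the paper actually measures.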

Motivation and Objectives

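  • Disentangle architectural bias from optimization bias in deep learning.
  • Investigate why CNNs generalize better than FCNs on spatially structured data, even though an FCN of the same architecture can represent every function the CNN can.
  • Test whether the inductive bias of CNNs can be exploited to reach better-generalizing regions of the FCN loss landscape.
  • Explore whether relaxing the CNN constraints partway through training can outperform keeping them throughout.
  • Understand the role of early optimization dynamics in navigating a complex loss landscape.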

Proposed Method

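  • The authors define a linear embedding of a CNN into its equivalent fully-connected network (eFCN), which preserves the network's architecture while lifting the weight-sharing and locality constraints.
  • They first train the CNN normally, then at a chosen "relax time" embed its weights into eFCN space and continue training without constraints.
  • The eFCN is initialized with the CNN's weights at the relax time and trained in the full FCN parameter space.
  • This allows the training dynamics of the CNN and the eFCN to be compared directly in the same parameter space.
  • The authors analyze the weight patterns of the eFCN, in particular the emergence of template-matching behavior in its non-local blocks.
  • Experiments on CIFAR-10 apply this protocol at multiple relax times to assess generalization performance.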
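
The CNN-to-eFCN embedding can be made concrete for a 1D convolution. The minimal sketch below assumes a valid, stride-1, multi-channel 1D cross-correlation without bias, padding, or pooling (the paper works with image CNNs, whose details are not reproduced here); `conv1d` and `embed_conv_as_dense` are hypothetical names. Each kernel weight is copied into every output position it touches, so the dense matrix computes exactly the same function, while all other entries are zero.

```python
from typing import List

Matrix = List[List[float]]


def conv1d(x: Matrix, w: List[Matrix]) -> Matrix:
    """Valid, stride-1 cross-correlation. x: [C_in][n], w: [C_out][C_in][k] -> [C_out][n-k+1]."""
    c_in, n, k = len(x), len(x[0]), len(w[0][0])
    m = n - k + 1
    return [
        [sum(w[o][c][j] * x[c][i + j] for c in range(c_in) for j in range(k)) for i in range(m)]
        for o in range(len(w))
    ]


def embed_conv_as_dense(w: List[Matrix], n: int) -> Matrix:
    """Return W of shape [C_out*m][C_in*n] with W @ flatten(x) == flatten(conv1d(x, w))."""
    c_out, c_in, k = len(w), len(w[0]), len(w[0][0])
    m = n - k + 1
    W = [[0.0] * (c_in * n) for _ in range(c_out * m)]
    for o in range(c_out):
        for i in range(m):
            for c in range(c_in):
                for j in range(k):
                    # Weight sharing: the same w[o][c][j] is copied to every output position i.
                    W[o * m + i][c * n + i + j] = w[o][c][j]
    return W


def flatten(a: Matrix) -> List[float]:
    return [v for row in a for v in row]


def matvec(W: Matrix, v: List[float]) -> List[float]:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]


# Usage: 2 input channels of length 5, 1 output channel, kernel size 3.
x = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.5, -1.0, 0.0, 2.0, 1.0]]
w = [[[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]]
W = embed_conv_as_dense(w, n=5)
conv_out = flatten(conv1d(x, w))
dense_out = matvec(W, flatten(x))
print(conv_out, dense_out)  # both equal [-2.0, -4.0, 0.0]
```

The CNN here has 6 free parameters, whereas the dense matrix has 3 x 10 = 30 entries; after the relax time every one of them, including those that start at zero, is trained freely. This growth is the source of the very large, highly sparse eFCN that limits the protocol's practical use.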

Experimental Results

Research Questions

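  • RQ1: Can the architectural inductive bias of CNNs guide optimization toward better-generalizing regions of the FCN loss landscape?
  • RQ2: Does relaxing the CNN constraints at an intermediate stage of training improve performance compared with keeping them throughout?
  • RQ3: What role do early optimization dynamics play in reaching rare, high-performing regions of the FCN parameter space?
  • RQ4: Are there specific regions near the CNN subspace of FCN space that generalize better?
  • RQ5: Does combining the CNN prior with FCN expressivity yield better performance than either architecture alone?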

Key Findings

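  • For all tested relax times, the eFCN stays close to the CNN subspace, indicating that the architectural bias persists under continued stochastic gradient training in FCN space.
  • The eFCN reaches higher test accuracy than a standard FCN of the same architecture, demonstrating the benefit of the CNN prior.
  • For some intermediate relax times, the eFCN outperforms the original CNN, suggesting that combining the CNN's inductive bias with the FCN's expressivity can improve generalization.
  • The eFCN develops clear, image-like contours in its non-local weight blocks, indicating emergent template-matching behavior that does not appear in standard FCNs.
  • This template matching is only effective in combination with convolutional feature learning: template matching on its own fails on complex datasets such as CIFAR-10.
  • Beyond a threshold distance from the CNN subspace, performance degrades to the level of a standard FCN, indicating a narrow but high-performing region around the CNN subspace.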
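
One natural way to quantify "distance from the CNN subspace" is sketched below; it is an assumed metric, not necessarily the one used in the paper. For a single-channel 1D layer (valid, stride-1 convolution, kernel size `k`), the CNN subspace is the set of banded matrices whose diagonals are constant. The orthogonal projection onto it replaces each diagonal of the band by its mean and zeroes everything else, and the relative deviation is the Frobenius norm of the residual divided by that of the weights. `project_to_cnn` and `relative_deviation` are hypothetical names.

```python
import math


def project_to_cnn(W, k):
    """Orthogonal (Frobenius) projection onto banded, constant-diagonal matrices of width k."""
    m, n = len(W), len(W[0])
    assert n - k + 1 == m, "valid stride-1 convolution assumed"
    # The band diagonals have disjoint supports, so the least-squares shared
    # weight for diagonal j is the mean of the entries it would be copied into.
    w = [sum(W[i][i + j] for i in range(m)) / m for j in range(k)]
    P = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(k):
            P[i][i + j] = w[j]
    return P


def relative_deviation(W, k):
    """||W - proj(W)||_F / ||W||_F: 0 for an exact convolution, larger as W leaves the subspace."""
    P = project_to_cnn(W, k)
    num = sum((a - b) ** 2 for ra, rb in zip(W, P) for a, b in zip(ra, rb))
    den = sum(a * a for row in W for a in row)
    return math.sqrt(num / den)


# Usage: an exact convolution (kernel [1, 2]) versus one with a single non-local entry.
W_cnn = [[1.0, 2.0, 0.0, 0.0], [0.0, 1.0, 2.0, 0.0], [0.0, 0.0, 1.0, 2.0]]
W_pert = [row[:] for row in W_cnn]
W_pert[0][3] = 0.5
dev_cnn = relative_deviation(W_cnn, 2)
dev_pert = relative_deviation(W_pert, 2)
print(dev_cnn, dev_pert)
```

Under this metric, tracking the deviation of each layer during the eFCN phase would show how far training drifts from the CNN subspace; the threshold behavior reported above refers to the paper's own measurements, not to this sketch.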

This review was generated by AI and reviewed by human editors.