QUICK REVIEW

[Paper Review] A Selective Overview of Deep Learning

Jianqing Fan, Cong Ma|arXiv (Cornell University)|Apr 10, 2019
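Neural Networks and Applications|References: 125|Cited by: 42
One-Sentence Summary

The paper reviews deep learning from a statistical perspective, covering feed-forward networks, CNNs, RNNs, and training regimes in detail, along with theory on representation power and generalization, with an emphasis on depth and over-parametrization.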

ABSTRACT

Deep learning has arguably achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex dependency between input features and labels. While neural networks have a long history, recent advances have greatly improved their performance in computer vision, natural language processing, etc. From the statistical and scientific perspective, it is natural to ask: What is deep learning? What are the new characteristics of deep learning, compared with classical methods? What are the theoretical foundations of deep learning? To answer these questions, we introduce common neural network models (e.g., convolutional neural nets, recurrent neural nets, generative adversarial nets) and training techniques (e.g., stochastic gradient descent, dropout, batch normalization) from a statistical point of view. Along the way, we highlight new characteristics of deep learning (including depth and over-parametrization) and explain their practical and theoretical benefits. We also sample recent results on theories of deep learning, many of which are only suggestive. While a complete understanding of deep learning remains elusive, we hope that our perspectives and discussions serve as a stimulus for new statistical research.

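Motivation and Objectives

  • Clarify why deep learning differs from classical methods and what new characteristics it brings (depth, over-parametrization, implicit priors).
  • Introduce common neural network models (CNNs, RNNs) and their training techniques from a statistical point of view.
  • Discuss representation power through approximation theory, along with perspectives on generalization.
  • Highlight training dynamics, implicit regularization, and algorithmic viewpoints in deep learning.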

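Proposed Methods

  • Describes feed-forward neural networks and how they are trained by back-propagation on a computational graph.
  • Explains convolutional and recurrent neural networks, their core building blocks (CONV, POOL, LSTM variants), and weight sharing.
  • Analyzes the representation power of deep networks through approximation theory.
  • Discusses stochastic gradient descent, regularization, and generalization control in deep learning.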
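The first and last items above (a feed-forward network trained by back-propagation and stochastic gradient descent) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`init_params`, `loss_and_grads`, `sgd_step`, `train`), the tanh activation, the squared-error loss, and the XOR toy data are all choices made here for the example.

```python
import math
import random


def init_params(n_in, n_hidden, seed=0):
    """Randomly initialise a one-hidden-layer network (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0, 1 / math.sqrt(n_hidden)) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Forward pass: h = tanh(W1 x + b1), y_hat = w2 . h + b2."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b
         for row, b in zip(params["W1"], params["b1"])]
    h = [math.tanh(v) for v in z]
    y_hat = sum(w * hv for w, hv in zip(params["w2"], h)) + params["b2"]
    return h, y_hat


def loss_and_grads(params, x, y):
    """Squared-error loss and its gradients via the chain rule (backprop)."""
    h, y_hat = forward(params, x)
    err = y_hat - y
    loss = 0.5 * err * err
    # Output layer: d loss / d w2_j = err * h_j, d loss / d b2 = err.
    g_w2 = [err * hv for hv in h]
    # Hidden layer: propagate err through w2 and tanh'(z) = 1 - h^2.
    g_z = [err * w * (1 - hv * hv) for w, hv in zip(params["w2"], h)]
    g_W1 = [[gz * xi for xi in x] for gz in g_z]
    return loss, {"W1": g_W1, "b1": g_z, "w2": g_w2, "b2": err}


def sgd_step(params, grads, lr):
    """One in-place SGD update: theta <- theta - lr * gradient."""
    for j, row in enumerate(params["W1"]):
        for i in range(len(row)):
            row[i] -= lr * grads["W1"][j][i]
        params["b1"][j] -= lr * grads["b1"][j]
        params["w2"][j] -= lr * grads["w2"][j]
    params["b2"] -= lr * grads["b2"]


def mean_loss(params, data):
    return sum(loss_and_grads(params, x, y)[0] for x, y in data) / len(data)


def train(params, data, lr=0.1, epochs=500, seed=0):
    """Single-sample SGD: reshuffle each epoch, update after every example."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            _, grads = loss_and_grads(params, x, y)
            sgd_step(params, grads, lr)
    return params


if __name__ == "__main__":
    xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    params = init_params(n_in=2, n_hidden=8)
    print("mean loss before training:", mean_loss(params, xor))
    train(params, xor)
    print("mean loss after training: ", mean_loss(params, xor))
```

The gradients are derived by hand here to make the chain rule visible; in practice an automatic-differentiation library would compute them from the computational graph.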

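Experimental Results

Research Questions

  • RQ1: What exactly are the fundamental differences between deep learning and classical statistical methods?
  • RQ2: How do depth and over-parametrization affect representation, training, and generalization?
  • RQ3: What theoretical foundations exist for the approximation power and generalization of deep networks?
  • RQ4: How do popular models (CNNs, RNNs) and their training techniques affect practical performance?

Key Findings

  • Deep learning relies on composing many nonlinear functions to model complex dependencies in the data.
  • Depth allows efficient representation of interactions that shallow models struggle to express.
  • Over-parametrization lets SGD reach near-zero training error while still generalizing reasonably well.
  • Implicit priors learned through training yield useful representations without explicit feature engineering.
  • The paper surveys advances in approximation theory for deep networks and discusses generalization through uniform-convergence analysis.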
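The first finding, that a deep model is a composition of many nonlinear functions, can be written out explicitly. The notation below is a sketch in commonly used symbols and is an assumption of this example, not copied from the summary: $L$ is the number of hidden layers, $\sigma$ an elementwise nonlinearity such as ReLU, $d_\ell$ the width of layer $\ell$, and $\theta$ the collection of all weights $W^{(\ell)}$ and biases $b^{(\ell)}$.

```latex
\begin{aligned}
h^{(0)} &= x, \\
h^{(\ell)} &= \sigma\bigl(W^{(\ell)} h^{(\ell-1)} + b^{(\ell)}\bigr),
  \qquad \ell = 1, \dots, L, \\
f(x;\theta) &= W^{(L+1)} h^{(L)} + b^{(L+1)}, \\
p &= \sum_{\ell=1}^{L+1} d_\ell \,\bigl(d_{\ell-1} + 1\bigr)
  \quad \text{(total number of parameters)}.
\end{aligned}
```

In this notation, depth corresponds to a large $L$, and over-parametrization to a parameter count $p$ that far exceeds the number of training samples $n$.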

This review was generated by AI and reviewed by human editors.