QUICK REVIEW

[Paper Review] Provable limitations of deep learning

Emmanuel Abbé, Colin Sandon | arXiv (Cornell University) | Dec 16, 2018
Machine Learning and Algorithms | References: 36 | Cited by: 40
One-Sentence Summary

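The paper defines cross-predictability and proves that, when cross-predictability is low, certain deep learning algorithms cannot efficiently learn functions that are otherwise efficiently learnable, using parity functions as the key example, and it discusses the implications for learning in various settings.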

ABSTRACT

As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning. This paper gives a first set of results proving that certain deep learning algorithms fail at learning certain efficiently learnable functions. The results put forward a notion of cross-predictability that characterizes when such failures take place. Parity functions provide an extreme example with a cross-predictability that decays exponentially, while a mere super-polynomial decay of the cross-predictability is shown to be sufficient to obtain failures. Examples in community detection and arithmetic learning are also discussed. Recall that it is known that the class of neural networks (NNs) with polynomial network size can express any function that can be implemented in polynomial time, and that their sample complexity scales polynomially with the network size. The challenge is with the optimization error (the ERM is NP-hard), and the success behind deep learning is to train deep NNs with descent algorithms. The failures shown in this paper apply to training poly-size NNs on function distributions of low cross-predictability with a descent algorithm that is either run with limited memory per sample or that is initialized and run with enough randomness. We further claim that such types of constraints are necessary to obtain failures, in that exact SGD with careful non-random initialization can be shown to learn parities. The cross-predictability in our results plays a role similar to that of the statistical dimension in statistical query (SQ) algorithms, with distinctions explained in the paper. The proof techniques are based on exhibiting algorithmic constraints that imply a statistical indistinguishability between the algorithm's output on the test model vs. a null model, using information measures to bound the total variation distance.

Research Motivation and Objectives

  • Motivate and formalize potential limits of deep learning in learning efficiently learnable function distributions.
  • Introduce cross-predictability as a measure to characterize when deep learning fails.
  • Show negative results for gradient-based and memory-constrained training under low cross-predictability.
  • Discuss implications for learning problems in parity-like functions and other domains.

Proposed Method

  • Define neural nets in a formal graph-based framework with specified input, internal, and output nodes.
  • Introduce cross-predictability as an expectation over pairs of functions and inputs to quantify predictability.
  • Analyze training dynamics of descent-based algorithms (GD/SGD/CD) under memory constraints or perturbations.
  • Relate the cross-predictability measure to Fourier-Walsh expansions and to indistinguishability arguments.
  • Provide negative results showing failure under low cross-predictability and limited memory/noise, and contrast with regimes where learning succeeds.
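The cross-predictability measure described in the list above can be written out as a short formula. The following is a sketch based on the summary's description ("an expectation over pairs of functions and inputs"); the symbols P_X, P_F, F, F' and p_S are notation assumed here for illustration and should be checked against the paper's own definition.

```latex
% Assumed notation: P_X is the input distribution, P_F a distribution over
% target functions F : X -> {-1,+1}; F and F' are drawn independently from P_F.
\[
  \mathrm{Pred}(P_X, P_F)
  \;=\;
  \mathbb{E}_{F, F' \sim P_F}
  \Big[ \big( \mathbb{E}_{X \sim P_X}\, F(X)\, F'(X) \big)^{2} \Big].
\]
% Parity example: X uniform on {-1,+1}^n, p_S(x) = \prod_{i \in S} x_i,
% with S drawn uniformly from all subsets of {1, ..., n}.
% Distinct parities are orthogonal Fourier-Walsh basis functions, so
\[
  \mathbb{E}_{X}\, p_S(X)\, p_{S'}(X) = \mathbf{1}\{S = S'\}
  \quad\Longrightarrow\quad
  \mathrm{Pred}(P_X, P_F) = \Pr[S = S'] = 2^{-n}.
\]
```

Under these assumptions the measure is the expected squared correlation between two independently drawn target functions, which is why orthogonal families such as parities make it exponentially small.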

Results

Research Questions

  • RQ1: When do descent-based training algorithms fail to learn functions drawn from low cross-predictability distributions?
  • RQ2: How does cross-predictability determine the learnability of parity and other function classes under neural network training constraints?
  • RQ3: Can polynomial-size networks learn certain efficiently learnable function distributions under memory or noise constraints?
  • RQ4: What conditions enable learning beyond cross-predictability limitations?

Key Findings

  • A sufficiently fast (super-polynomial) decay of cross-predictability causes GD/SGD to fail when run with bounded memory per sample or with added noise.
  • Parity functions with growing subset size exhibit exponentially decaying cross-predictability and therefore cannot be learned under the described constraints.
  • When cross-predictability stays constant rather than decaying, neural nets can achieve some learning; otherwise, poly-size nets are shown to fail under the training assumptions above.
  • Random initialization and limited memory/noise can impede learning even when the target function is efficiently learnable in other settings.
  • The framework connects cross-predictability to information-theoretic indistinguishability arguments, yielding rigorous negative results.
  • There are regimes where learning can succeed beyond the cross-predictability limitations, under certain constructive conditions and by leveraging projections onto a subspace of simple functions.
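The exponential decay for parities noted in the findings can be checked by brute force for small input sizes. The sketch below assumes cross-predictability is the expected squared correlation between two independently drawn target functions, with inputs uniform on {-1,+1}^n and the parity subset drawn uniformly from all subsets; the function names are hypothetical and chosen only for this illustration.

```python
import itertools
from fractions import Fraction


def parity(subset, x):
    """Product of the coordinates of x indexed by subset (x in {-1,+1}^n)."""
    out = 1
    for i in subset:
        out *= x[i]
    return out


def cross_predictability_parity(n):
    """Exact expected squared correlation between two uniformly drawn parities.

    Assumption (not taken from the summary): inputs are uniform on {-1,+1}^n
    and the parity subset is uniform over all 2^n subsets.
    """
    inputs = list(itertools.product((-1, 1), repeat=n))
    subsets = [
        tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2 ** n)
    ]
    # One row of function values per parity, evaluated on every input.
    table = [[parity(s, x) for x in inputs] for s in subsets]

    total = Fraction(0)
    for f in table:
        for g in table:
            # Correlation E_X[f(X) g(X)] under the uniform input distribution.
            corr = Fraction(sum(a * b for a, b in zip(f, g)), len(inputs))
            total += corr ** 2
    # Average over all ordered pairs (f, g) of independently drawn parities.
    return total / (len(subsets) ** 2)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, cross_predictability_parity(n))
```

Exact rational arithmetic via `Fraction` avoids floating-point noise, so the value can be compared directly against 2^-n. The enumeration costs on the order of 8^n operations, so it is only practical for very small n.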


This review was generated by AI and reviewed by a human editor.