QUICK REVIEW

[Paper Review] Provable limitations of deep learning

Emmanuel Abbé, Colin Sandon|arXiv (Cornell University)|Dec 16, 2018

Machine Learning and Algorithms|References: 36|Citations: 40

One-sentence summary

The paper defines cross-predictability and proves that certain deep learning algorithms fail to learn efficiently learnable functions under low cross-predictability, with parity functions as a key example, and discusses implications for learning in various settings.

ABSTRACT

As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning. This paper gives a first set of results proving that certain deep learning algorithms fail at learning certain efficiently learnable functions. The results put forward a notion of cross-predictability that characterizes when such failures take place. Parity functions provide an extreme example with a cross-predictability that decays exponentially, while a mere super-polynomial decay of the cross-predictability is shown to be sufficient to obtain failures. Examples in community detection and arithmetic learning are also discussed. Recall that it is known that the class of neural networks (NNs) with polynomial network size can express any function that can be implemented in polynomial time, and that their sample complexity scales polynomially with the network size. The challenge is with the optimization error (the ERM is NP-hard), and the success behind deep learning is to train deep NNs with descent algorithms. The failures shown in this paper apply to training poly-size NNs on function distributions of low cross-predictability with a descent algorithm that is either run with limited memory per sample or that is initialized and run with enough randomness. We further claim that such types of constraints are necessary to obtain failures, in that exact SGD with careful non-random initialization can be shown to learn parities. The cross-predictability in our results plays a similar role to the statistical dimension in statistical query (SQ) algorithms, with distinctions explained in the paper. The proof techniques are based on exhibiting algorithmic constraints that imply a statistical indistinguishability between the algorithm's output on the test model vs. a null model, using information measures to bound the total variation distance.

Research motivation and objectives

Motivate and formalize potential limits of deep learning in learning efficiently learnable function distributions.
Introduce cross-predictability as a measure to characterize when deep learning fails.
Show negative results for gradient-based and memory-constrained training under low cross-predictability.
Discuss implications for learning problems in parity-like functions and other domains.

Proposed method

Define neural nets in a formal graph-based framework with specified input, internal, and output nodes.
Introduce cross-predictability as an expectation over pairs of functions and inputs to quantify predictability.
Analyze training dynamics of descent-based algorithms (GD/SGD/CD) under memory constraints or perturbations.
Relate the cross-predictability measure to Fourier-Walsh expansions and to indistinguishability arguments.
Provide negative results showing failure under low cross-predictability and limited memory/noise, and contrast with regimes where learning succeeds.
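
For reference, the cross-predictability described in the list above can be written compactly. This is a sketch based on how the paper defines the quantity as we recall it; the symbols Pred, P_X, and P_F do not appear in this review and are assumptions of this note, as is the convention that functions take values in {-1, +1}.

```latex
\mathrm{Pred}(P_X, P_F)
  \;=\; \mathbb{E}_{F, F'}\!\left[ \Big( \mathbb{E}_{X}\, F(X)\, F'(X) \Big)^{2} \right],
\qquad F, F' \overset{\text{i.i.d.}}{\sim} P_F, \quad X \sim P_X .
```

Read this as the expected squared correlation between two independently drawn target functions: when it is small, a typical function from the class says almost nothing about another one.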

Experimental results

Research questions

RQ1: When do descent-based training algorithms fail to learn functions drawn from low cross-predictability distributions?
RQ2: How does cross-predictability determine the learnability of parity and other function classes under neural network training constraints?
RQ3: Can polynomial-size networks learn certain efficiently learnable function distributions under memory or noise constraints?
RQ4: What conditions enable learning beyond cross-predictability limitations?

Key findings

A super-polynomial decay of the cross-predictability is enough to make SGD/GD fail when it is run with limited memory per sample or with enough randomness.
Parity functions with growing subset size exhibit exponentially decaying cross-predictability and thus learning failure under described constraints.
When the cross-predictability stays at a constant level, some learning with neural nets remains possible; when it decays super-polynomially, poly-size nets fail under the training constraints described above.
Random initialization and limited memory/noise can impede learning even when the target function is efficiently learnable in other settings.
The framework connects cross-predictability to information-theoretic indistinguishability arguments, yielding rigorous negative results.
There are regimes where learning can succeed beyond the cross-predictability limit under certain constructive conditions, for example by leveraging projections onto a subspace of simple functions.
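
The exponential decay for parities noted in the findings can be checked by brute force on small inputs. The sketch below assumes that cross-predictability is the expected squared correlation between two independently drawn target functions, that the parity is over a uniformly random subset of n bits, and that inputs are uniform; the function names are hypothetical and chosen for this illustration only.

```python
from fractions import Fraction


def parity(subset_mask, x):
    """Return the +/-1 parity of the bits of x selected by subset_mask."""
    return -1 if bin(subset_mask & x).count("1") % 2 else 1


def correlation(s, t, n):
    """Exact E_X[chi_s(X) * chi_t(X)] for X uniform over all n-bit inputs."""
    total = sum(parity(s, x) * parity(t, x) for x in range(2 ** n))
    return Fraction(total, 2 ** n)


def cross_predictability(n):
    """Exact E_{S,T}[(E_X chi_S(X) chi_T(X))^2] for independent uniform subsets S, T.

    Assumption: this matches the cross-predictability notion summarized above,
    specialized to uniformly random parities on n bits.
    """
    masks = range(2 ** n)
    total = sum(correlation(s, t, n) ** 2 for s in masks for t in masks)
    return total / (2 ** n) ** 2


# Enumerate small n only: the cost grows as 8^n.
for n in range(1, 6):
    print(n, cross_predictability(n))
```

Two distinct parities are exactly uncorrelated under uniform inputs, so only the pairs with S equal to T contribute, and the value is the probability that two random subsets coincide, which halves with every added bit.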


This review was generated by AI and checked by a human editor.