QUICK REVIEW

[Paper Review] Provable limitations of deep learning

Emmanuel Abbé, Colin Sandon | arXiv (Cornell University) | December 16, 2018

Machine Learning and Algorithms | References: 36 | Citations: 40

ONE-LINE SUMMARY

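This paper defines cross-predictability and shows that certain deep learning algorithms fail to efficiently learn otherwise efficiently learnable functions when the cross-predictability is low. Parity functions serve as the key example, and the paper discusses the implications for learning in a range of settings.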

ABSTRACT

As the success of deep learning reaches more ground, one would like to also envision the potential limits of deep learning. This paper gives a first set of results proving that certain deep learning algorithms fail at learning certain efficiently learnable functions. The results put forward a notion of cross-predictability that characterizes when such failures take place. Parity functions provide an extreme example with a cross-predictability that decays exponentially, while a mere super-polynomial decay of the cross-predictability is shown to be sufficient to obtain failures. Examples in community detection and arithmetic learning are also discussed. Recall that it is known that the class of neural networks (NNs) with polynomial network size can express any function that can be implemented in polynomial time, and that their sample complexity scales polynomially with the network size. The challenge is with the optimization error (the ERM is NP-hard), and the success behind deep learning is to train deep NNs with descent algorithms. The failures shown in this paper apply to training poly-size NNs on function distributions of low cross-predictability with a descent algorithm that is either run with limited memory per sample or that is initialized and run with enough randomness. We further claim that such types of constraints are necessary to obtain failures, in that exact SGD with careful non-random initialization can be shown to learn parities. The cross-predictability in our results plays a similar role to the statistical dimension in statistical query (SQ) algorithms, with distinctions explained in the paper. The proof techniques are based on exhibiting algorithmic constraints that imply a statistical indistinguishability between the algorithm's output on the test model vs. a null model, using information measures to bound the total variation distance.
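A minimal sketch of the exponential decay claimed for parities, computed by exhaustive enumeration. It assumes ±1-valued functions, X uniform on {-1,+1}^n, F and F' drawn independently and uniformly from the function class, and the definition Pred = E_{F,F'}[(E_X[F(X)F'(X)])^2]; the helper names are hypothetical. Because distinct parities are orthogonal under the uniform distribution, only pairs with F = F' contribute, so over all 2^n parities Pred equals 2^{-n}.

```python
from itertools import combinations, product


def parity(subset, x):
    """chi_S(x): product of the coordinates of x in `subset`, for x in {-1,+1}^n."""
    out = 1
    for i in subset:
        out *= x[i]
    return out


def all_subsets(n):
    """Every subset S of {0, ..., n-1}, as tuples (2^n of them)."""
    return [s for k in range(n + 1) for s in combinations(range(n), k)]


def cross_predictability(functions, inputs):
    """Exact E_{F,F'}[(E_X[F(X) F'(X)])^2] with F, F' i.i.d. uniform over
    `functions` and X uniform over `inputs`. Brute force: tiny n only."""
    total = 0.0
    for f in functions:
        for g in functions:
            corr = sum(f(x) * g(x) for x in inputs) / len(inputs)
            total += corr ** 2
    return total / len(functions) ** 2


n = 4
inputs = list(product([-1, 1], repeat=n))

# All 2^n parities: distinct parities are orthogonal, so only F = F' contributes.
parities = [lambda x, s=s: parity(s, x) for s in all_subsets(n)]
print(cross_predictability(parities, inputs), 2 ** -n)  # 0.0625 0.0625

# For contrast, the n "dictator" functions x_i (parities of size 1) give 1/n.
dictators = [lambda x, i=i: x[i] for i in range(n)]
print(cross_predictability(dictators, inputs))  # 0.25
```

The enumeration costs roughly 8^n operations, so it only serves as a sanity check for small n; the contrast between 2^{-n} for all parities and 1/n for dictators illustrates exponential versus polynomial decay.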

RESEARCH MOTIVATION AND GOALS

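Motivate and formalize the potential limits of deep learning on function distributions that are efficiently learnable.
Introduce cross-predictability as a measure that characterizes when deep learning fails.
Prove negative results for gradient-based learning and for memory-constrained learning under low cross-predictability.
Discuss the implications for learning problems involving functions such as parities, as well as in other domains.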

PROPOSED METHOD

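Define neural networks within a formal graph-based framework with designated input, internal, and output nodes.
Introduce cross-predictability, an expectation over pairs of functions and pairs of inputs, to quantify how well one target function predicts another.
Analyze the learning dynamics of descent-based algorithms (GD/SGD/CD) under memory constraints or noise perturbations.
Relate the cross-predictability measure to the Fourier-Walsh expansion and to indistinguishability arguments.
Present negative results showing failure under low cross-predictability combined with limited memory or noise, and contrast them with regimes in which learning succeeds.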
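One standard way to make the Fourier-Walsh connection concrete is the Plancherel identity for X uniform on {-1,+1}^n: E_X[F(X)F'(X)] = Σ_S F̂(S)F̂'(S), where F̂(S) = E_X[F(X)χ_S(X)] and χ_S(x) = ∏_{i∈S} x_i. Under this identity, cross-predictability measures how much the Fourier spectra of two random targets overlap, and a parity has a single nonzero coefficient, so distinct parities never overlap. Whether the paper uses exactly this form is an assumption here. The sketch below computes Walsh coefficients with the fast Walsh-Hadamard transform and checks the identity for majority and parity on three bits; the helper names and the bitmask encoding are hypothetical choices.

```python
def walsh_coefficients(values):
    """Fast Walsh-Hadamard transform of a truth table of length 2^n.

    values[i] = f(x), where bit j of i set means x_j = -1 (else +1).
    Returns hat_f with hat_f[S] = E_x[f(x) * chi_S(x)], S given as a bitmask.
    """
    coeffs = list(values)
    size = len(coeffs)
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for i in range(start, start + h):
                a, b = coeffs[i], coeffs[i + h]
                coeffs[i], coeffs[i + h] = a + b, a - b  # butterfly step
        h *= 2
    return [c / size for c in coeffs]


def truth_table(f, n):
    """Evaluate f on every x in {-1,+1}^n, in the bitmask order used above."""
    table = []
    for i in range(2 ** n):
        x = tuple(-1 if (i >> j) & 1 else 1 for j in range(n))
        table.append(f(x))
    return table


def correlation(fv, gv):
    """E_X[f(X) g(X)] for X uniform, computed directly from truth tables."""
    return sum(a * b for a, b in zip(fv, gv)) / len(fv)


n = 3
maj = truth_table(lambda x: 1 if sum(x) > 0 else -1, n)
par = truth_table(lambda x: x[0] * x[1] * x[2], n)
fm, fp = walsh_coefficients(maj), walsh_coefficients(par)

# Plancherel: the direct correlation equals the inner product of Walsh spectra.
print(correlation(maj, par), sum(a * b for a, b in zip(fm, fp)))  # -0.5 -0.5
```

The parity's spectrum is the indicator of the full set (mask 7), while majority on three bits puts weight 1/2 on each singleton and -1/2 on the full set, which is where the -0.5 correlation comes from.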

RESULTS

RESEARCH QUESTIONS

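RQ1: When do descent-based learning algorithms fail to learn functions drawn from distributions with low cross-predictability?
RQ2: How does cross-predictability determine the learnability of parities and other function classes under neural-network training constraints?
RQ3: Under memory or noise constraints, can polynomial-size networks learn certain efficiently learnable function distributions?
RQ4: What conditions enable learning beyond the limits set by cross-predictability?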

KEY FINDINGS

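Decaying cross-predictability can cause SGD/GD to fail when run with bounded memory or with noise.
Parity functions on subsets of growing size have exponentially decaying cross-predictability and therefore cannot be learned under the stated constraints.
Cross-predictability of constant order permits some learning with neural networks; otherwise, poly-size nets trained under the standard assumptions considered fail (a super-polynomial decay suffices for failure).
Random initialization and limited memory or noise can prevent learning even of target functions that are efficiently learnable in other settings.
The framework links cross-predictability to information-theoretic indistinguishability arguments to derive rigorous negative results.
There are regimes in which learning succeeds beyond the cross-predictability limits, using certain constructive conditions and projections onto subspaces of simple functions.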
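The failures above concern descent-based training; parities themselves are efficiently learnable by other means. A textbook illustration (not the paper's method) is Gaussian elimination over GF(2): with 0/1 inputs and labels y = Σ_{i∈S} x_i mod 2, recovering S is a linear system. The sketch below assumes noise-free labels; `learn_parity_gf2`, `sample_parity`, the subset size, and the sample count 3n are hypothetical choices for illustration.

```python
import random


def learn_parity_gf2(samples, n):
    """Recover the secret subset S (as a bitmask) from (x, y) pairs with
    y = sum(x[i] for i in S) mod 2, via Gauss-Jordan elimination over GF(2).

    Each row packs x into bits 0..n-1 and the label into bit n.
    Free variables, if any, are set to 0.
    """
    rows = [sum(b << i for i, b in enumerate(x)) | (y << n) for x, y in samples]
    pivots = {}  # column -> index of its pivot row
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if (rows[r] >> col) & 1), None)
        if pivot is None:
            continue  # no row has this variable: leave it free
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and (rows[r] >> col) & 1:
                rows[r] ^= rows[rank]  # XOR is row addition over GF(2)
        pivots[col] = rank
        rank += 1
    secret = 0
    for col, r in pivots.items():
        if (rows[r] >> n) & 1:  # reduced label bit gives the variable's value
            secret |= 1 << col
    return secret


def sample_parity(secret_bits, n, m, rng):
    """Draw m uniform x in {0,1}^n labeled by the parity of x on secret_bits."""
    samples = []
    for _ in range(m):
        x = [rng.randrange(2) for _ in range(n)]
        samples.append((x, sum(x[i] for i in secret_bits) % 2))
    return samples


rng = random.Random(0)
n = 20
secret = sorted(rng.sample(range(n), 7))  # hypothetical hidden subset S
samples = sample_parity(secret, n, 3 * n, rng)
mask = learn_parity_gf2(samples, n)
recovered = [i for i in range(n) if (mask >> i) & 1]
print(recovered == secret)
```

Recovery is exact whenever the sampled inputs span GF(2)^n, which with 3n uniform samples fails only with negligible probability. The method depends on exact labels: with label noise the task becomes learning parity with noise (LPN), which is believed to be computationally hard.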

This review was generated by AI and reviewed by a human editor.