QUICK REVIEW

[Paper Review] The Expressive Power of Neural Networks: A View from the Width

Lu Zhou, Hongming Pu|arXiv (Cornell University)|2017. 09. 08.

Advanced Memory and Neural Computing|References: 7|Citations: 125

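One-line summary

Width-bounded ReLU networks are universal approximators of Lebesgue-integrable functions at width n+4 but not at width n, which reveals a width-driven phase transition; the paper also gives a polynomial lower bound on width efficiency and supporting experiments.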

ABSTRACT

The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-$2$) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal approximators. Moreover, except for a measure zero set, all functions cannot be approximated by width-$n$ ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth-efficiency of neural networks. That is, there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an exponential bound. Here we pose the dual question on the width-efficiency of ReLU networks: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceed the polynomial bound by a constant factor can approximate wide and shallow network with high accuracy. Our results provide more comprehensive evidence that depth is more effective than width for the expressiveness of ReLU networks.

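Motivation and goals

Investigate how the width of ReLU networks affects expressive power, complementing the well-studied depth perspective.
Prove a universal approximation theorem for width-bounded networks and identify the width threshold (n+4) for L1 approximation on R^n.
Examine width efficiency by giving a polynomial lower bound on the depth a narrow network needs in order to approximate a wide network.
Provide experimental evidence on the practical width-depth trade-off and its implications for network design.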

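Proposed method

Construct a width-(n+4) fully connected ReLU network that approximates any Lebesgue-integrable function to within an arbitrary L1 error ε.
Approximate the target function by a finite weighted sum of indicator functions on axis-aligned cubes, and approximate those indicators with ReLU-based blocks.
Introduce a block-wise network architecture that stores and accumulates the per-cube approximations to assemble the global approximation.
Prove the width-bounded universal approximation theorem (Theorem 1) through this constructive design, and compare it with classical depth-bounded universal approximation.
Derive a polynomial lower bound (Theorem 4) for approximating a wide network by a narrower one in order to analyze width efficiency, and discuss experimental validation.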
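The cube-indicator idea in the steps above can be sketched numerically. This is a minimal illustration, not the paper's width-(n+4) construction: it uses a standard ReLU trapezoid to soften each indicator, and it does not attempt to respect any width limit. The helper names (`clip01`, `soft_interval`, `soft_cube`, `l1_error_1d`), the test function `bump`, and the ramp width `delta` are all assumptions made for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def clip01(z):
    # Clamp to [0, 1] using two ReLU units.
    return relu(z) - relu(z - 1.0)

def soft_interval(x, a, b, delta):
    # Trapezoid: 0 outside [a, b], 1 on [a + delta, b - delta].
    # Assumes delta <= (b - a) / 2.
    return clip01((x - a) / delta) - clip01((x - (b - delta)) / delta)

def soft_cube(x, lo, hi, delta):
    # x: (m, n) points; lo, hi: (n,) corners of an axis-aligned cube.
    # The result is 1 on the shrunken cube and 0 outside the cube.
    n = x.shape[1]
    per_axis = soft_interval(x, lo, hi, delta)  # broadcasts over axes
    return clip01(per_axis.sum(axis=1) - (n - 1))

def l1_error_1d(f, num_cells, lo=-3.0, hi=3.0, ramp_frac=0.01, grid=200_001):
    # Approximate f on [lo, hi] by a sum of height-weighted soft indicators,
    # one per cell, and estimate the L1 error on a fine grid.
    x = np.linspace(lo, hi, grid)
    edges = np.linspace(lo, hi, num_cells + 1)
    delta = ramp_frac * (hi - lo) / num_cells
    approx = np.zeros_like(x)
    for a, b in zip(edges[:-1], edges[1:]):
        approx += f(0.5 * (a + b)) * soft_interval(x, a, b, delta)
    return float(np.mean(np.abs(f(x) - approx)) * (hi - lo))

def bump(x):
    # Hypothetical target function for the demonstration.
    return np.exp(-x ** 2)

coarse = l1_error_1d(bump, num_cells=20)
fine = l1_error_1d(bump, num_cells=200)
print(coarse, fine)
```

Refining the cells while shrinking the ramps lowers the estimated L1 error on the window [-3, 3], which is the mechanism the constructive proof relies on. The hard part of the paper, fitting this accumulation into only n+4 units per layer, is not modeled here.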

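Experimental results

Research questions

RQ1: Do width-bounded ReLU networks of width n+4 universally approximate Lebesgue-integrable functions on R^n in L1 distance?
RQ2: Is there a phase transition in expressive power between width n and width n+4?
RQ3: Are there wide networks that no narrow network can approximate unless the narrow network is polynomially larger?
RQ4: Do the experiments support a polynomial (rather than exponential) trade-off between width and the network size needed for approximation?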
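RQ1 and RQ2 can be written as a pair of L1 statements. The following is a sketch in notation assumed here ($F_A$ for the function computed by a network $A$, $w_A$ for its width), not wording quoted from the paper.

```latex
% Notation assumed for this sketch: F_A is the function computed by a
% fully connected ReLU network A, and w_A is the width of A.

% RQ1 (width n+4 suffices): for every Lebesgue-integrable f and every eps > 0,
\exists A \text{ with } w_A \le n+4 : \quad
\int_{\mathbb{R}^n} \lvert f(x) - F_A(x) \rvert \, dx < \varepsilon .

% RQ2 (width n does not): for every f outside a measure-zero exceptional set,
\inf_{A \,:\, w_A \le n} \;
\int_{\mathbb{R}^n} \lvert f(x) - F_A(x) \rvert \, dx > 0 .
```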

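Key results

Width-(n+4) ReLU networks can approximate any Lebesgue-integrable function on R^n to arbitrary L1 accuracy.
Except for a measure-zero set, functions on R^n cannot be approximated in L1 by width-n ReLU networks, so a phase transition appears.
There exist width-O(k^2), depth-3 networks that cannot be approximated by any width-O(k^1.5), depth-k network, which gives a polynomial lower bound on width efficiency (Theorem 4).
Experiments show that narrow networks whose size exceeds the polynomial lower bound by a constant factor can approximate wide, shallow networks with high accuracy.
Overall, the results suggest that depth may be more effective than width for the expressive power of ReLU networks.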
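The width-n obstacle is easy to see in the simplest case n = 1. A width-1 ReLU network is a composition of monotone maps, so its output is monotone in the input; a monotone function on the real line is integrable only if it is zero almost everywhere, so such a network cannot get close in L1 to an integrable bump. The sketch below checks the monotonicity claim on random networks. It illustrates the one-dimensional case only and is not the paper's general proof; the function names and the sampling scheme are assumptions for this example.

```python
import numpy as np

def width_one_relu_net(x, weights, biases, out_w, out_b):
    # Each hidden layer has a single ReLU unit: h <- relu(w * h + b).
    h = x
    for w, b in zip(weights, biases):
        h = np.maximum(w * h + b, 0.0)
    # Affine output layer.
    return out_w * h + out_b

def is_monotone(y, tol=1e-12):
    # True if y is non-decreasing or non-increasing along the grid.
    d = np.diff(y)
    return bool(np.all(d >= -tol) or np.all(d <= tol))

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 2001)

all_monotone = True
for _ in range(1000):
    depth = rng.integers(1, 8)  # 1 to 7 hidden layers
    weights = rng.normal(size=depth)
    biases = rng.normal(size=depth)
    y = width_one_relu_net(x, weights, biases, rng.normal(), rng.normal())
    all_monotone = all_monotone and is_monotone(y)

print(all_monotone)
```

Adding depth never breaks monotonicity here, which is why no amount of depth rescues a network whose width is at most the input dimension in this one-dimensional setting.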

This review was generated by AI and checked by a human editor.