QUICK REVIEW

[Paper Review] Measuring the Intrinsic Dimension of Objective Landscapes

Chunyuan Li, Heerad Farkhoo | arXiv (Cornell University) | 2018-04-24

Machine Learning and Data Classification | References: 19 | Citations: 63

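One-Line Summary

This paper introduces random subspace training to measure the intrinsic dimension of neural network objective landscapes, shows that many problems need far fewer active degrees of freedom than their total parameter count, and connects the result to an MDL-based view of model compression.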

ABSTRACT

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

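Motivation and Objectives

Define intrinsic dimension as the codimension of the solution set within the parameter space.
Develop a practical method for estimating intrinsic dimension using random subspace optimization.
Compare intrinsic dimension across architectures, datasets, and learning paradigms to map the objective landscape.
Explore the implications for model compression and MDL-based model selection.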
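
The codimension definition can be written compactly. In the sketch below, D is the native parameter count and s is the dimension of the solution set (the symbol s is also used later for redundancy); d_int is shorthand for the intrinsic dimension, of which d_int90 is the thresholded estimate.

```latex
D = d_{\mathrm{int}} + s
\quad\Longleftrightarrow\quad
d_{\mathrm{int}} = D - s
```

Read this way, adding parameters to a model that can already solve the problem raises D and s together while d_int stays put.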

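Proposed Method

Introduce a random projection P that defines a d-dimensional subspace of the full D-dimensional parameter space.
Train only the subspace coordinates theta^(d), keeping theta^(D)_0 and P fixed.
Increase d step by step to find the smallest subspace that contains a solution, meaning a point whose performance exceeds the threshold; that dimension is reported as d_int90.
Use a performance threshold (e.g., 90% of the baseline) to classify solutions, and verify reproducibility with bootstrapping.
Compare intrinsic dimension across FC, LeNet, CNN, and RL tasks, and analyze the projection methods (dense, sparse, Fastfood).
Relate d_int90 to minimum description length (MDL) and discuss the implications for compression.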
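
The procedure above can be sketched on a toy problem. The code below is a minimal illustration, not the authors' implementation: it trains through theta^(D) = theta^(D)_0 + P theta^(d) with a dense random P and sweeps d until a threshold is met. Everything specific here is an assumption made for illustration: the toy objective (1000 parameters, 10 block-sum constraints), the unit-length column normalization of P, the score `1 - final_loss / init_loss` standing in for "fraction of baseline performance", and the function names `train_in_subspace` and `measure_d_int`. Because the toy loss is quadratic, the subspace coordinates are fitted by least squares instead of gradient descent.

```python
import numpy as np

D = 1000        # native parameter count of the toy problem (assumption)
N_BLOCKS = 10   # number of linear constraints; the solution set has dimension D - 10

# Toy objective: block i of 100 consecutive parameters must sum to i + 1.
A = np.kron(np.eye(N_BLOCKS), np.ones((1, D // N_BLOCKS)))  # shape (10, 1000)
b = np.arange(1, N_BLOCKS + 1, dtype=float)


def loss(theta):
    """Squared error of the block sums against their targets."""
    r = A @ theta - b
    return float(r @ r)


def train_in_subspace(d, seed=0):
    """Return (loss after fitting d subspace coordinates, loss at initialization)."""
    rng = np.random.default_rng(seed)
    theta0 = 0.01 * rng.standard_normal(D)          # fixed random initialization
    init_loss = loss(theta0)
    if d == 0:
        return init_loss, init_loss
    P = rng.standard_normal((D, d))                 # fixed dense random projection
    P /= np.linalg.norm(P, axis=0, keepdims=True)   # unit-length columns (assumption)
    # Only theta_d is fitted; theta0 and P stay fixed.
    theta_d, *_ = np.linalg.lstsq(A @ P, b - A @ theta0, rcond=None)
    return loss(theta0 + P @ theta_d), init_loss


def measure_d_int(threshold, d_max=20, seed=0):
    """Smallest d whose score reaches the threshold, or None if none does."""
    for d in range(1, d_max + 1):
        final_loss, init_loss = train_in_subspace(d, seed)
        score = 1.0 - final_loss / init_loss        # hypothetical performance measure
        if score >= threshold:
            return d
    return None


if __name__ == "__main__":
    print("d at 90% threshold:", measure_d_int(0.9))
    print("d at near-exact threshold:", measure_d_int(1 - 1e-9))
```

In this toy setting an essentially exact solution needs d = 10, matching D minus the 990-dimensional solution set, while the 90% threshold can be met at a slightly smaller d. For real networks the least-squares step would be replaced by ordinary gradient-based training on theta^(d).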

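Experimental Results

Research Questions

RQ1: What is the intrinsic dimension of various neural network problems when they are optimized in a randomly oriented subspace?
RQ2: How does d_int90 scale across architectures, datasets, and reinforcement learning tasks?
RQ3: Do larger models exhibit more redundancy, and how does this affect MDL-based model selection?
RQ4: Can random subspace training deliver practical network compression without a large loss in performance?
RQ5: How does intrinsic dimension differ between supervised learning tasks and RL environments?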

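Key Results

The intrinsic dimension d_int90 is often far smaller than the total parameter count D (e.g., MNIST FC: D=199k, d_int90≈750; LeNet: D=44k, d_int90≈290).
As model size grows, the redundancy s increases while d_int90 stays nearly constant over a wide range of D, suggesting that extra parameters enlarge the solution space rather than make the problem more solvable.
Convolutional networks can be more parameter-efficient than FC networks on MNIST and CIFAR-10, and random subspace training yields substantial compression (e.g., MNIST FC: ~260x; LeNet: ~150x).
The intrinsic dimension of RL tasks varies by task (e.g., Inverted Pendulum: d_int90≈4; Humanoid: d_int90≈700; Pong: d_int90≈6000), spanning a range of difficulty comparable to that of supervised learning tasks.
Intrinsic dimension gives an upper bound on the MDL of a solution and suggests a practical end-to-end compression strategy that leaves the training procedure unchanged.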
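
The compression figures follow from simple arithmetic on D and d_int90. The sketch below recomputes the ratios from the rounded values quoted above and illustrates one way a compressed model could be restored. It assumes that theta^(D)_0 and P are regenerated from a stored random seed, so only the seed and the d trained coordinates need to be kept; the helper names `compression_ratio` and `reconstruct`, the initialization scale, and the column normalization are hypothetical choices for illustration.

```python
import numpy as np

# Values quoted in this review (D rounded to the nearest thousand).
REPORTED = {
    "MNIST FC": {"D": 199_000, "d_int90": 750},
    "MNIST LeNet": {"D": 44_000, "d_int90": 290},
}


def compression_ratio(D, d):
    """Native parameter count divided by the number of stored coordinates."""
    return D / d


def reconstruct(seed, theta_d, D):
    """Rebuild the full parameter vector from a seed and the subspace coordinates."""
    rng = np.random.default_rng(seed)
    theta0 = 0.01 * rng.standard_normal(D)          # regenerated, never stored
    P = rng.standard_normal((D, theta_d.size))      # regenerated, never stored
    P /= np.linalg.norm(P, axis=0, keepdims=True)   # unit-length columns (assumption)
    return theta0 + P @ theta_d


ratios = {name: compression_ratio(v["D"], v["d_int90"]) for name, v in REPORTED.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: about {ratio:.0f}x")
```

With the rounded inputs the ratios come out near 265 and 152, in line with the ~260x and ~150x reported above. The same seed always regenerates the same theta^(D)_0 and P, which is what lets the stored description shrink to d numbers plus a seed.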


This review was generated by AI and reviewed by a human editor.