QUICK REVIEW

[Paper Review] Where is the Information in a Deep Neural Network?

Alessandro Achille, Giovanni Paolini | arXiv (Cornell University) | 2019-05-29

Stochastic Gradient Optimization Techniques | References: 49 | Citations: 47

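One-Line Summary

This paper defines and analyzes the Information in the Weights (IW) of a DNN as a trade-off between the training loss and the coding length of the weights, links IW to generalization through a PAC-Bayes bound, and uses Fisher information to connect the invariance of the activations to the information in the weights.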

ABSTRACT

Whatever information a deep neural network has gleaned from training data is encoded in its weights. How this information affects the response of the network to future data remains largely an open question. Indeed, even defining and measuring information entails some subtleties, since a trained network is a deterministic map, so standard information measures can be degenerate. We measure information in a neural network via the optimal trade-off between accuracy of the response and complexity of the weights, measured by their coding length. Depending on the choice of code, the definition can reduce to standard measures such as Shannon Mutual Information and Fisher Information. However, the more general definition allows us to relate information to generalization and invariance, through a novel notion of effective information in the activations of a deep network. We establish a novel relation between the information in the weights and the effective information in the activations, and use this result to show that models with low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs. These relations hinge not only on the architecture of the model, but also on how it is trained, highlighting the complex inter-dependency between the class of functions implemented by deep neural networks, the loss function used for training them from finite data, and the inductive bias implicit in the optimization.

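Research Motivation and Goals

Define the Information in the Weights as a trade-off between the change in training loss induced by perturbing the weights and the coding length of the weights.
Connect the information in the weights to generalization through a PAC-Bayes bound.
Introduce and formalize the notion of effective information in the activations and relate it to the information in the weights.
Derive the relation between Fisher information and Shannon information, and show how the training dynamics affect these quantities.
Emphasize that information measures depend on the architecture, the loss, and the optimization, and discuss practical choices of encoding.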

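Proposed Method

Define the Information in the Weights (IW) through a prior (pre-distribution) P and a posterior (post-distribution) Q over the weights, and minimize the objective L_D(Q) + β KL(Q||P), where β controls the trade-off.
Show that for β=1 the IW objective reduces to the ELBO used in Bayesian neural networks, without requiring Q to be a Bayesian posterior.
Connect IW to generalization using a PAC-Bayes bound, deriving a bound on the test error in terms of the training loss and KL(Q||P).
Specialize IW to Shannon information: show that choosing the prior and posterior to minimize the bound in expectation yields I(w;D).
Under a Gaussian choice of encoding, reduce IW to Fisher information, and relate the KL term to the log-determinant of the Fisher (Hessian) in the small-β approximation.
Show that Fisher information governs invariance while Shannon information governs generalization, and discuss the first-order connection between the two measures under stochastic optimization.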
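The objective L_D(Q) + β KL(Q||P) and its Gaussian special case can be sketched on a toy problem. The example below assumes diagonal Gaussian P and Q and a hypothetical quadratic loss L(w) = 0.5 * sum(h * w**2) with per-weight curvature h standing in for the Hessian. The function names, the toy loss, and the closed-form variance are illustrative assumptions derived for this toy case, not the paper's implementation or its exact constants.

```python
import numpy as np


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) between diagonal Gaussians, summed over weight dimensions."""
    return 0.5 * np.sum(
        var_q / var_p + (mu_q - mu_p) ** 2 / var_p - 1.0 + np.log(var_p / var_q)
    )


def iw_objective(h, mu_q, var_q, mu_p, var_p, beta):
    """Expected toy loss under Q plus beta * KL(Q || P).

    Assumes the hypothetical loss L(w) = 0.5 * sum(h * w**2), whose
    expectation under a diagonal Gaussian Q is available in closed form.
    """
    expected_loss = 0.5 * np.sum(h * (mu_q ** 2 + var_q))
    return expected_loss + beta * gaussian_kl(mu_q, var_q, mu_p, var_p)


def optimal_variance(h, var_p, beta):
    """Variance of Q that minimizes the toy objective with the mean held fixed.

    Obtained by setting the derivative of iw_objective with respect to
    var_q to zero: 0.5*h + 0.5*beta/var_p - 0.5*beta/var_q = 0.
    """
    return beta / (h + beta / var_p)


if __name__ == "__main__":
    h = np.array([10.0, 1.0, 0.01])  # hypothetical curvatures (sharp to flat)
    mu = np.zeros(3)                 # Q and P share the same mean here
    var_p = np.ones(3)               # unit-variance prior
    beta = 0.1

    var_star = optimal_variance(h, var_p, beta)
    print("optimal variances:", var_star)
    print("information in the weights (nats):", gaussian_kl(mu, var_star, mu, var_p))
```

In this toy setting, high-curvature directions receive a small posterior variance and contribute most of the KL term, while flat directions stay close to the prior and contribute little. For small β the KL term behaves like half the sum of log(h * var_p / β), a log-determinant of the curvature up to constants, which is the kind of Fisher (Hessian) connection the method describes.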

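Experimental Results

Research Questions

RQ1: How can the information about the training data retained in the network weights be quantified in a computable way for large-scale DNNs?
RQ2: How are the information in the weights and the information in the activations related with respect to generalization and invariance?
RQ3: How do different information measures (Shannon vs. Fisher) relate to each other within the weights-activations framework under stochastic optimization?
RQ4: How do architecture choices, the loss function, and optimization dynamics jointly affect the information content, generalization, and invariance of the learned representation?
RQ5: How can a bound on the test loss through the Information in the Weights be derived in relation to the invariance of the activations?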

Key Results

The Information in the Weights (IW) is defined as the KL divergence KL(Q||P) between a post-distribution Q over the weights and a pre-distribution P, evaluated at the Q that minimizes the expected training loss plus β KL(Q||P).
IW bounds generalization through a PAC-Bayes bound on test loss, linking training behavior to performance on unseen data.
Under Gaussian encoding choices, IW reduces to Fisher information, connecting to curvature and stability of the learned solution.
Shannon information about the dataset can be recovered as the expected IW under an adapted prior, connecting IW to I(w;D).
Fisher information controls invariance to nuisances, while Shannon information controls generalization; SGD dynamics tend to couple these measures through flat minima and stability, tying optimization geometry to information content.
The framework shows a tight interdependence between network architecture, training loss, and optimization in determining what information is retained and how representations generalize.
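To make the link between IW and test loss concrete, the sketch below evaluates a PAC-Bayes bound numerically. It assumes a McAllester-style form, L_test(Q) <= (L_train(Q) + (β L_max / N)(KL(Q||P) + log(1/δ))) / (1 - 1/(2β)), valid for β > 1/2 and a per-sample loss bounded by L_max. This form, the function name pac_bayes_bound, and all numeric values are assumptions for illustration; the paper's exact statement and constants may differ.

```python
import math


def pac_bayes_bound(train_loss, kl, n, beta=1.0, delta=0.05, loss_max=1.0):
    """Upper bound on the expected test loss of Q (assumed McAllester-style form).

    train_loss: average training loss under Q, in [0, loss_max]
    kl:         KL(Q || P) in nats, i.e. the information in the weights
    n:          number of i.i.d. training samples
    beta:       trade-off parameter, must exceed 1/2
    delta:      the bound is assumed to hold with probability at least 1 - delta
    """
    if beta <= 0.5:
        raise ValueError("beta must be greater than 1/2")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    slack = (beta * loss_max / n) * (kl + math.log(1.0 / delta))
    return (train_loss + slack) / (1.0 - 1.0 / (2.0 * beta))


if __name__ == "__main__":
    # Hypothetical numbers: same training loss, increasing information in the weights.
    for kl in (10.0, 100.0, 1000.0):
        print(kl, pac_bayes_bound(train_loss=0.05, kl=kl, n=50_000, beta=1.0))
```

Under these assumptions the bound grows linearly in KL(Q||P) at fixed training loss, so a solution that stores fewer nats in its weights receives a tighter guarantee on unseen data.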


This review was generated by AI and reviewed by a human editor.