QUICK REVIEW

[Paper Review] Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Michael W. Dusenberry, Ghassen Jerfel | arXiv (Cornell University) | 2020-05-14

Adversarial Robustness in Machine Learning | References: 46 | Citations: 33

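One-Line Summary

Using rank-1 Bayesian neural networks with mixture posteriors, the paper reports state-of-the-art uncertainty quantification and scalability, outperforming baselines on ImageNet, CIFAR, and MIMIC-III while using far fewer parameters than ensembles.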

ABSTRACT

Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from efficiency issues. It remains unclear how to combine the strengths of these two approaches and remediate their common issues. To tackle this challenge, we propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace. We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase (e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants.

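Motivation and Objectives

Address underfitting and inefficiency in large-scale Bayesian neural networks.
Achieve strong uncertainty quantification with a parameter-efficient approach.
Use a rank-1 subspace parameterization to make Bayesian inference scalable.
Investigate mixture posteriors for capturing multiple modes with minimal memory overhead.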

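Proposed Method

Parameterize every weight matrix W as W' = W ∘ (r s^T), where ∘ is the elementwise product and r s^T is a rank-1 matrix built from the vectors r and s.
Treat W as deterministic and perform variational inference over r and s only (rank-1 Bayesian perturbations).
Place hierarchical priors on r and s (e.g., Gaussian, Cauchy, inverse-Gamma) to induce a structured prior over the weights and to encourage sparsity and robustness.
Use mixture posteriors over the rank-1 factors to capture multiple modes with little memory overhead (e.g., mixture size K=4 in the experiments).
Compare training with the log-mixture likelihood against the average per-component log-likelihood, and analyze training dynamics and generalization under distribution shift.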
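
The rank-1 parameterization W' = W ∘ (r s^T) can be applied without ever forming W', because (W ∘ r s^T) x = r ∘ (W (s ∘ x)). The following is a minimal sketch in plain Python, not the authors' implementation. The function names, the layer shape, and the Gaussian factors with mean 1.0 and standard deviation 0.1 are all assumptions chosen for illustration.

```python
import random

def rank1_dense(x, W, r, s):
    """Compute y = W' x with W' = W ∘ (r s^T) without materializing W'.

    W is a list of m rows of length n, r has length m, s has length n.
    Uses the identity (W ∘ r s^T) x = r ∘ (W (s ∘ x)).
    """
    xs = [s_j * x_j for s_j, x_j in zip(s, x)]                      # scale inputs by s
    Wxs = [sum(w * v for w, v in zip(row, xs)) for row in W]        # ordinary matvec
    return [r_i * y_i for r_i, y_i in zip(r, Wxs)]                  # scale outputs by r

def explicit_dense(x, W, r, s):
    """Reference path: build W' = W ∘ (r s^T) explicitly, then multiply."""
    Wp = [[W[i][j] * r[i] * s[j] for j in range(len(s))] for i in range(len(r))]
    return [sum(w * v for w, v in zip(row, x)) for row in Wp]

def sample_factor(mean, std, rng):
    """Draw one Gaussian sample per entry as mean + std * eps (assumed posterior form)."""
    return [mu + sd * rng.gauss(0.0, 1.0) for mu, sd in zip(mean, std)]

rng = random.Random(0)
m, n = 3, 4                                                          # hypothetical layer shape
W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]       # deterministic weights
x = [rng.uniform(-1, 1) for _ in range(n)]
r = sample_factor([1.0] * m, [0.1] * m, rng)                         # stochastic output factor
s = sample_factor([1.0] * n, [0.1] * n, rng)                         # stochastic input factor

y_fast = rank1_dense(x, W, r, s)
y_ref = explicit_dense(x, W, r, s)
```

Both paths give the same output up to floating-point error. Only the m + n entries of r and s are random, which is what keeps the stochastic part of the layer small compared with the m × n entries of W.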

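Experimental Results

Research Questions

RQ1: Can rank-1 weight perturbations combined with variational inference deliver competitive accuracy and calibration at scale?
RQ2: Do hierarchical priors on the rank-1 factors improve robustness and out-of-distribution performance?
RQ3: How do the number of mixture components and the form of the posterior affect performance and parameter efficiency?
RQ4: How does rank-1 Bayesian inference compare with BatchEnsemble and deep ensembles in diversity, NLL, and calibration?
RQ5: Is the log-mixture likelihood bound advantageous for training or evaluation in rank-1 Bayesian nets?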
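
The two objectives contrasted in RQ5 are related by Jensen's inequality: the log of the averaged component likelihoods is never smaller than the average of the component log-likelihoods. A small sketch of that relationship follows. The function names and the per-component probabilities are hypothetical values for one example, not results from the paper.

```python
import math

def log_mixture_likelihood(probs):
    """log((1/K) * sum_k p_k): log of the averaged component likelihoods."""
    return math.log(sum(probs) / len(probs))

def average_log_likelihood(probs):
    """(1/K) * sum_k log p_k: average of the per-component log-likelihoods."""
    return sum(math.log(p) for p in probs) / len(probs)

# Hypothetical likelihoods that K=3 mixture components assign to the true label.
component_probs = [0.9, 0.6, 0.1]

log_mix = log_mixture_likelihood(component_probs)
avg_log = average_log_likelihood(component_probs)
gap = log_mix - avg_log   # non-negative by Jensen's inequality
```

The gap is zero only when all components agree. The average-log objective therefore penalizes a single poorly fitting component more heavily than the log-mixture objective does.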

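Key Findings

Using multimodal posteriors, rank-1 BNNs achieve state-of-the-art NLL, accuracy, and calibration on the ImageNet, CIFAR, and MIMIC-III benchmarks.
Mixture posteriors over the rank-1 factors provide substantial gains with little memory overhead (e.g., a 0.4% parameter increase for ResNet-50 at K=10).
Cauchy priors on the rank-1 factors improve generalization and uncertainty quantification under distribution shift compared with Gaussian priors.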
Rank-1 BNNs use fewer parameters than BatchEnsemble while outperforming competitive deep ensembles, and they show higher ensemble diversity at comparable accuracy.
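Theoretical results show that rank-1 perturbations can match the local variance structure of full-rank perturbations in fully connected networks, supporting the expressiveness of the approach.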
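
The memory argument behind the mixture result can be illustrated with a parameter count for a single dense layer. This is a rough sketch under stated assumptions: each mixture component is assumed to store a mean and a scale for every entry of r and s (`per_factor=2`), and the 1024 × 1024 layer is hypothetical. It shows how the cost scales with K and is not meant to reproduce the 0.4% figure reported for ResNet-50.

```python
def dense_params(m, n):
    """Parameters in one deterministic m x n weight matrix."""
    return m * n

def rank1_mixture_params(m, n, k, per_factor=2):
    """Shared W plus k components, each storing per_factor values per entry of r and s.

    per_factor=2 assumes a mean and a scale per entry (an assumption of this sketch).
    """
    return m * n + k * per_factor * (m + n)

def deep_ensemble_params(m, n, k):
    """k independent copies of the full weight matrix."""
    return k * m * n

m, n, k = 1024, 1024, 10                      # hypothetical layer and mixture size
base = dense_params(m, n)
rank1 = rank1_mixture_params(m, n, k)
ensemble = deep_ensemble_params(m, n, k)
rank1_overhead = (rank1 - base) / base        # relative increase over one network
ensemble_overhead = (ensemble - base) / base  # relative increase for a deep ensemble
```

The rank-1 mixture grows as K(m + n) while a deep ensemble grows as Kmn, so the relative overhead of the mixture shrinks as layers get wider.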

This review was generated by AI and reviewed by a human editor.