QUICK REVIEW

[Paper Review] Deep Ensembles: A Loss Landscape Perspective

Stanislav Fort, Huiyi Hu | arXiv (Cornell University) | December 5, 2019

Generative Adversarial Networks and Image Synthesis | References: 35 | Citations: 347

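One-Line Summary

The paper shows that different random initializations explore different modes in function space, whereas sampling from a subspace around a single training trajectory yields similar functions. As a result, ensembles of randomly initialized networks outperform subspace methods on the diversity-accuracy trade-off.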

ABSTRACT

Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the network, are theoretically well-motivated by Bayesian principles, but do not perform as well as deep ensembles in practice, particularly under dataset shift. One possible explanation for this gap between theory and practice is that popular scalable variational Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space. We investigate this hypothesis by building on recent work on understanding the loss landscape of neural networks and adding our own exploration to measure the similarity of functions in the space of predictions. Our results show that random initializations explore entirely different modes, while functions along an optimization trajectory or sampled from the subspace thereof cluster within a single mode predictions-wise, while often deviating significantly in the weight space. Developing the concept of the diversity-accuracy plane, we show that the decorrelation power of random initializations is unmatched by popular subspace sampling methods. Finally, we evaluate the relative effects of ensembling, subspace based methods and ensembles of subspace based methods, and the experimental results validate our hypothesis.
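The diversity-accuracy plane mentioned in the abstract places each sampled function by its accuracy and by how often its predictions differ from a reference solution. The sketch below is a minimal plain-Python illustration, assuming hard label predictions on a shared test set. The function names are hypothetical, and normalizing the disagreement by the reference model's error rate is our assumption about how the diversity axis is scaled, not a confirmed detail of the paper.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of examples on which two models predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def diversity_accuracy_point(
    sample_preds: Sequence[int],
    reference_preds: Sequence[int],
    labels: Sequence[int],
) -> tuple[float, float]:
    """Place one sampled function on a diversity-accuracy plane.

    Returns (accuracy of the sample, disagreement with the reference
    divided by the reference's error rate). The normalization is an
    assumption of this sketch.
    """
    ref_error = 1.0 - accuracy(reference_preds, labels)
    if ref_error == 0.0:
        raise ValueError("reference model is perfect; normalization is undefined")
    return (
        accuracy(sample_preds, labels),
        disagreement(sample_preds, reference_preds) / ref_error,
    )
```

For example, with ten labels where the reference is 80% accurate and the sample disagrees with it on a single input that the reference got right, the sample lands at accuracy 0.7 and a normalized disagreement of about 0.5.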

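Motivation and Goals

Investigate why deep ensembles built from random initializations perform well in both accuracy and uncertainty estimation.
Analyze the loss landscape to understand how diverse the functions reached by different training trajectories are.
Compare random-initialization ensembles with subspace-based Bayesian approximations in terms of diversity and accuracy.
Examine each method's robustness to dataset shift and its position on the diversity-accuracy trade-off.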

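Proposed Method

Train multiple neural networks from different random initializations and combine them into an ensemble.
Analyze weight-space and function-space similarity between checkpoints, both within and across trajectories.
Construct subspaces around each trajectory (random subspace, dropout, diagonal Gaussian, low-rank Gaussian) and compare them.
Visualize function-space diversity by applying t-SNE to prediction vectors.
Evaluate the diversity-accuracy trade-off and ensemble performance on CIFAR-10/100 and ImageNet, including noisy/corrupted and out-of-distribution (OOD) data.
Evaluate ensembles versus subspace methods under dataset shift using CIFAR-10-C and ImageNet-C.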
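The method steps above can be made concrete with a small toolkit over flattened weight vectors. This plain-Python sketch shows (1) cosine similarity between two solutions in weight space, (2) an ensemble prediction formed by averaging member class probabilities, (3) a diagonal Gaussian fitted to checkpoints from one trajectory, in the spirit of SWAG's diagonal approximation, and (4) sampling in a random low-dimensional subspace around a solution. All names are hypothetical, and the paper's exact subspace constructions (which checkpoints, which directions, what radius) are not reproduced here; treat this as an illustration under stated assumptions.

```python
import math
import random
from typing import Sequence

Vector = list[float]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ensemble_probs(member_probs: Sequence[Sequence[float]]) -> Vector:
    """Average the class-probability vectors of all members for one input."""
    n = len(member_probs)
    return [sum(col) / n for col in zip(*member_probs)]


def fit_diag_gaussian(checkpoints: Sequence[Sequence[float]]) -> tuple[Vector, Vector]:
    """Per-coordinate mean and variance of checkpoints taken from ONE trajectory."""
    n = len(checkpoints)
    columns = list(zip(*checkpoints))
    mean = [sum(col) / n for col in columns]
    var = [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, mean)]
    return mean, var


def sample_diag_gaussian(
    mean: Sequence[float], var: Sequence[float], rng: random.Random
) -> Vector:
    """Draw one weight vector, sampling each coordinate independently."""
    return [rng.gauss(m, math.sqrt(s)) for m, s in zip(mean, var)]


def sample_random_subspace(
    center: Sequence[float], dim: int, scale: float, rng: random.Random
) -> Vector:
    """Move `center` along `dim` random Gaussian directions with N(0, scale^2) coefficients."""
    directions = [[rng.gauss(0.0, 1.0) for _ in center] for _ in range(dim)]
    coeffs = [rng.gauss(0.0, scale) for _ in range(dim)]
    return [
        c + sum(coef * d[i] for coef, d in zip(coeffs, directions))
        for i, c in enumerate(center)
    ]
```

In a real experiment the vectors would be the flattened parameters of trained networks, and each sampled weight vector would be loaded back into the model to obtain the predictions used for function-space comparisons.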

Experimental Results

Research Questions

RQ1: Do different random initializations land in different modes of function space, and how does that compare with their similarity in weight space?
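RQ2: How do subspace sampling methods differ from independently trained ensembles in diversity and accuracy?
RQ3: Can subspace-based approaches provide complementary benefits to ensembles, particularly under dataset shift?
RQ4: What is the relationship between function-space diversity and robustness to corrupted/OOD inputs?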

Key Findings

Checkpoints from a single trajectory are similar in both weight space and function space.
Functions obtained from different random initializations are diverse in function space, and their weight vectors are also nearly uncorrelated.
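Subspace sampling methods produce functions that stay close to the original trajectory in function space and do not reach the diversity of independently trained optima.
Independently trained ensembles achieve a better diversity-accuracy trade-off than subspace methods, and the benefit grows with ensemble size.
Ensembles and subspace methods are complementary: combining them improves accuracy and uncertainty estimates, especially under dataset shift (CIFAR-10-C, ImageNet-C).
The Jensen-Shannon divergence between predictions is largest across independent random initializations and smaller among subspace samples within a single trajectory, with the gap especially pronounced under corruption.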
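The Jensen-Shannon divergence finding can be illustrated with the equal-weight generalized JS divergence, which for a set of member predictive distributions equals the entropy of their average minus the average of their entropies. The sketch below is plain Python; the function names are hypothetical, and averaging the per-input value over a dataset is our assumption about how a single summary number would be reported.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jensen_shannon(member_probs: Sequence[Sequence[float]]) -> float:
    """Equal-weight generalized JS divergence for one input.

    JSD = H(mean of member distributions) - mean of H(member distribution).
    """
    n = len(member_probs)
    mixture = [sum(col) / n for col in zip(*member_probs)]
    return entropy(mixture) - sum(entropy(p) for p in member_probs) / n


def mean_jsd(per_input_member_probs: Sequence[Sequence[Sequence[float]]]) -> float:
    """Average the per-input JS divergence over a dataset."""
    values = [jensen_shannon(m) for m in per_input_member_probs]
    return sum(values) / len(values)
```

Identical members give 0, and two members that put all their mass on different classes give ln 2 (about 0.693 nats), the maximum for two equally weighted members. Under the findings above, members from independent initializations would be expected to sit toward the high end, and samples from one trajectory's subspace toward the low end.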


This review was generated by AI and reviewed by a human editor.