QUICK REVIEW

[Paper Review] Deep Ensembles: A Loss Landscape Perspective

Stanislav Fort, Huiyi Hu|arXiv (Cornell University)|Dec 5, 2019

Generative Adversarial Networks and Image Synthesis | References: 35 | Citations: 347

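One-Line Summary

The paper shows that random initializations explore different modes in function space, whereas subspace sampling within a single trajectory yields similar functions. Randomly initialized ensembles outperform subspace methods on the diversity-accuracy trade-off.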

ABSTRACT

Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the network, are theoretically well-motivated by Bayesian principles, but do not perform as well as deep ensembles in practice, particularly under dataset shift. One possible explanation for this gap between theory and practice is that popular scalable variational Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space. We investigate this hypothesis by building on recent work on understanding the loss landscape of neural networks and adding our own exploration to measure the similarity of functions in the space of predictions. Our results show that random initializations explore entirely different modes, while functions along an optimization trajectory or sampled from the subspace thereof cluster within a single mode predictions-wise, while often deviating significantly in the weight space. Developing the concept of the diversity-accuracy plane, we show that the decorrelation power of random initializations is unmatched by popular subspace sampling methods. Finally, we evaluate the relative effects of ensembling, subspace based methods and ensembles of subspace based methods, and the experimental results validate our hypothesis.
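
As a rough illustration of what "ensembling" means in the abstract, the sketch below averages the softmax outputs of several independently trained networks for a single input. It is a minimal, dependency-free example and not the authors' code: the function names `softmax` and `ensemble_predict` and the logit values are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits):
    """Average the softmax outputs of several ensemble members for one input.

    member_logits: list of per-member logit lists, all of equal length.
    """
    probs = [softmax(logits) for logits in member_logits]
    n_members = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_members for c in range(n_classes)]


# Hypothetical logits from three networks trained from different random seeds.
member_logits = [
    [2.0, 0.5, 0.1],
    [0.3, 1.8, 0.2],
    [1.9, 0.4, 0.3],
]
avg_probs = ensemble_predict(member_logits)
predicted_class = max(range(len(avg_probs)), key=avg_probs.__getitem__)
```

Averaging probabilities rather than logits is one common choice; the members only help if their predictions differ, which is the diversity question the paper studies.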

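Motivation and Objectives

Investigate why deep ensembles formed from random initializations perform well in terms of accuracy and uncertainty.
Analyze the loss landscape to understand the diversity of functions across different training trajectories.
Compare randomly initialized ensembles with subspace-based Bayesian approximations in terms of diversity and accuracy.
Examine robustness to dataset shift and the diversity-accuracy trade-off across methods.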

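Proposed Method

Train multiple neural networks from different random initializations to form an ensemble.
Analyze weight-space and function-space similarity between checkpoints and between trajectories.
Construct and compare subspaces around each trajectory (random subspace, dropout, diagonal Gaussian, low-rank Gaussian).
Visualize function-space diversity by applying t-SNE to prediction vectors.
Evaluate the diversity-accuracy trade-off and ensemble performance on CIFAR-10/100 and ImageNet, including corrupted and out-of-distribution (OOD) data.
Evaluate ensembles and subspace methods under dataset shift using CIFAR-10-C and ImageNet-C.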
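
The weight-space and function-space comparisons listed above can be sketched with two simple measures: cosine similarity between flattened weight vectors, and the fraction of inputs on which two models predict different labels. This is a minimal illustration under assumptions, not the paper's implementation: the helper names and the toy weight and label values are hypothetical.

```python
import math


def cosine_similarity(w1, w2):
    """Cosine similarity between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2)


def disagreement(labels_a, labels_b):
    """Fraction of inputs on which two models predict different labels."""
    differing = sum(a != b for a, b in zip(labels_a, labels_b))
    return differing / len(labels_a)


# Hypothetical flattened weights: two checkpoints from one training run,
# and one solution from a different random initialization.
checkpoint_early = [0.5, -1.0, 2.0]
checkpoint_late = [0.6, -0.9, 2.1]
other_init = [-1.2, 0.8, 0.1]

within_run = cosine_similarity(checkpoint_early, checkpoint_late)
across_runs = cosine_similarity(checkpoint_early, other_init)

# Hypothetical predicted labels of two models on the same four test inputs.
frac_disagree = disagreement([0, 1, 2, 1], [0, 1, 1, 1])
```

Computing both measures for every pair of checkpoints or models gives similarity matrices that can be compared side by side for weight space and function space.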

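Experimental Results

Research Questions

RQ1: Do random initializations sample different function-space modes even when their weight-space trajectories are similar?
RQ2: How do subspace sampling methods compare with independent ensembles in diversity and accuracy?
RQ3: Under dataset shift, can subspace-based approaches in particular provide benefits complementary to ensembles?
RQ4: What is the relationship between function-space diversity and robustness to corruptions and OOD inputs?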

Key Findings

Checkpoints along a single trajectory are similar in both weight space and function space.
Functions from different random initializations are diverse in function space, and their solutions are also dissimilar in weight space.
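Subspace sampling methods produce functions that stay close to the originating trajectory in function space and do not reach the diversity of independently trained optima.
Independently trained ensembles achieve a better diversity-accuracy trade-off than subspace methods, and the gains from ensembling grow with ensemble size.
Ensembles and subspace methods are complementary: combining them improves performance and uncertainty estimates, particularly under dataset shift (CIFAR-10-C, ImageNet-C).
The Jensen-Shannon divergence between predictions is largest for independent random initializations and far lower for within-trajectory subspace samples, especially under corruption.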
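
For readers unfamiliar with the Jensen-Shannon divergence mentioned in the last finding, the sketch below computes it for two predictive distributions over the same classes. It uses the standard definition (the average KL divergence of each distribution to their midpoint, with natural logarithms); the function names and the example probabilities are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)


# Hypothetical softmax outputs for one input.
base_model = [0.70, 0.20, 0.10]
subspace_sample = [0.65, 0.25, 0.10]   # sampled near the same trajectory
independent_model = [0.20, 0.10, 0.70]  # trained from another random seed

js_within = js_divergence(base_model, subspace_sample)
js_across = js_divergence(base_model, independent_model)
```

The divergence is symmetric, equals zero for identical distributions, and is bounded above by log 2 in nats, so values from different model pairs can be compared directly.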

This review was generated by AI and checked by a human editor.