QUICK REVIEW

[Paper Review] How Good is the Bayes Posterior in Deep Neural Networks Really?

Florian Wenzel, Kevin A. Roth|arXiv (Cornell University)|Feb 6, 2020

Gaussian Processes and Bayesian Inference | References: 78 | Cited by: 33

One-Line Summary

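This paper shows that the Bayes posterior predictive can perform worse than SGD in deep networks, and that cold posteriors (T < 1) often improve predictive performance substantially. It examines possible explanations for this effect and provides diagnostics for the accuracy of SG-MCMC.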

ABSTRACT

During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are, as of early 2020, no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
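For reference, the tempered posterior discussed here can be written in the following standard form, where U(θ) is the posterior energy over the n training pairs (x_i, y_i). T = 1 recovers the Bayes posterior and T < 1 gives a cold posterior; the posterior predictive is approximated by averaging over S samples θ^(1), ..., θ^(S) drawn by an MCMC sampler.

```latex
\begin{aligned}
U(\theta) &= -\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) - \log p(\theta),\\
p(\theta \mid \mathcal{D}) &\propto \exp\!\big(-U(\theta)/T\big),\\
p(y \mid x, \mathcal{D}) &\approx \frac{1}{S} \sum_{s=1}^{S} p\big(y \mid x, \theta^{(s)}\big),
\qquad \theta^{(s)} \sim p(\theta \mid \mathcal{D}).
\end{aligned}
```

Dividing U by T < 1 multiplies both the log-likelihood and the log-prior by 1/T > 1, which is the sense in which a cold posterior "overcounts evidence".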

Motivation and Objectives

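Assess whether the Bayes posterior predictive in deep neural networks matches or outperforms SGD-based predictions.
Show that tempered ("cold") posteriors with T < 1 can improve predictive performance beyond that of the Bayes posterior.
Systematically evaluate hypotheses that could explain cold posteriors, and develop diagnostics for the accuracy of SG-MCMC.
Provide practical guidelines and diagnostics for understanding when the Bayes posterior is beneficial in deep learning.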

Proposed Method

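Formulate posterior sampling with Langevin dynamics and SG-MCMC to approximate p(θ|D).
Compare Bayes posterior predictions against SGD-trained baselines on ResNet-20/CIFAR-10 and CNN-LSTM/IMDB.
Temper the posterior with T < 1 to obtain cold posteriors, and identify the best-performing temperature range (e.g., T ≪ 1).
Introduce and apply diagnostics for SG-MCMC accuracy, including the kinetic temperature and the configurational temperature.
Improve the simulation fidelity of SG-MCMC with preconditioning and cyclical time stepping, and relate the discretization step h to SGD hyperparameters such as the learning rate.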
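The sampling and diagnostic steps above can be sketched on a toy problem. This is a minimal sketch, not the paper's implementation, under several assumptions: a quadratic energy U(θ) = ½ θᵀAθ stands in for a network loss (so exp(-U/T) is the Gaussian N(0, T·A⁻¹)), exact gradients replace minibatch gradients, a fixed diagonal mass matrix stands in for the paper's preconditioner, and a cosine cyclical step-size schedule is assumed. All function names are hypothetical. If the sampler is accurate, both the kinetic temperature (mᵀM⁻¹m / d) and the configurational temperature (θᵀ∇U(θ) / d) should average to T.

```python
import numpy as np


def cyclical_step_size(step, cycle_len, h0):
    """Cosine cyclical step size: starts at h0 and decays towards 0 within each cycle."""
    pos = step % cycle_len
    return 0.5 * h0 * (np.cos(np.pi * pos / cycle_len) + 1.0)


def tempered_sgmcmc(grad_U, theta0, mass, temperature, gamma=1.0, h0=0.05,
                    cycle_len=1000, n_cycles=4, seed=0):
    """Underdamped Langevin sampler targeting p(theta) proportional to exp(-U(theta)/T).

    Each row of theta0 is an independent chain. Returns (theta, momentum)
    snapshots taken at the end of every cycle after the first (burn-in) cycle.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    sqrt_mass = np.sqrt(mass)
    thetas, moms = [], []
    for step in range(cycle_len * n_cycles):
        h = cyclical_step_size(step, cycle_len, h0)
        noise = rng.standard_normal(theta.shape)
        # Momentum: friction, force from the (here exact) gradient, thermal noise scaled by T.
        m = ((1.0 - h * gamma) * m - h * grad_U(theta)
             + np.sqrt(2.0 * gamma * h * temperature) * sqrt_mass * noise)
        # Position: uses the new momentum, preconditioned by the diagonal mass matrix.
        theta = theta + h * m / mass
        end_of_cycle = (step + 1) % cycle_len == 0
        if end_of_cycle and step + 1 > cycle_len:  # skip the first cycle as burn-in
            thetas.append(theta.copy())
            moms.append(m.copy())
    return np.concatenate(thetas), np.concatenate(moms)


def kinetic_temperature(m, mass):
    """Per-sample kinetic temperature m^T M^{-1} m / d."""
    return np.mean(m**2 / mass, axis=1)


def configurational_temperature(theta, grad_U):
    """Per-sample configurational temperature theta^T grad U(theta) / d."""
    return np.mean(theta * grad_U(theta), axis=1)


# Toy target (assumption): U(theta) = 0.5 * theta^T A theta, so exp(-U/T) = N(0, T * A^{-1}).
A = np.array([1.0, 4.0])


def grad_U(theta):
    return theta * A


T = 0.1            # a "cold" temperature, T < 1
mass = A.copy()    # hypothetical diagonal preconditioner
theta0 = np.ones((1000, 2))  # 1000 parallel chains in 2 dimensions

thetas, moms = tempered_sgmcmc(grad_U, theta0, mass, T)
print("sample variance:", thetas.var(axis=0), "target:", T / A)
print("kinetic T:", kinetic_temperature(moms, mass).mean())
print("configurational T:", configurational_temperature(thetas, grad_U).mean())
```

Changing T rescales the target covariance T·A⁻¹ proportionally, which is a simple way to see what a "colder" posterior means for the sampler. Sampling only at the end of each cycle, where the step size is smallest, is one way to keep discretization error low; how the paper itself schedules and collects samples may differ.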

Experimental Results

Research Questions

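RQ1: On standard deep learning benchmarks, is the Bayes posterior predictive at T = 1 on par with, or worse than, SGD-based models?
RQ2: Do cold posteriors with T < 1 improve predictive performance, and what is the best temperature range?
RQ3: Which hypotheses can explain the cold posterior effect, and which diagnostics can distinguish inference problems from the effects of the prior or the likelihood?
RQ4: Do SG-MCMC methods accurately approximate the intended posterior, and how do factors such as minibatch noise and discretization affect the results?
RQ5: How do priors and data practices (e.g., data augmentation, dropout) affect the Bayes posterior in deep networks?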

Key Findings

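At T = 1, the Bayes posterior predictive underperforms SGD on both ResNet-20/CIFAR-10 and CNN-LSTM/IMDB.
Temperatures T < 1 substantially improve predictive performance, and the best range often lies far below 1 (e.g., 0.01–0.2 on IMDB; in some CIFAR-10 experiments as low as 10^-4).
SG-MCMC with preconditioning and cyclical time stepping can simulate the posterior accurately, which supports the view that inaccurate inference is not the sole explanation for cold posteriors.
Bias from an inappropriate prior or from likelihood violations alone does not fully explain the cold posterior effect across models and datasets.
A prior predictive analysis shows that the standard Gaussian prior (e.g., N(0, I)) may be overly informative for large networks, suggesting that the choice of prior affects posterior behavior.
Alternative notions of the posterior (e.g., Masegosa posteriors) may provide a more robust target than the Bayes posterior under model misspecification.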
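A prior predictive analysis like the one mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's setup: a small fully connected ReLU network and Gaussian random inputs stand in for ResNet-20 and CIFAR-10, and all names, sizes, and the seed are hypothetical. For each draw of the weights from an isotropic N(0, σ²I) prior, it records the fraction of inputs assigned to that draw's most frequent class; values close to 1 mean a single prior draw labels almost every input the same way. The numbers it prints depend on the architecture and seed and are not results from the paper.

```python
import numpy as np


def sample_mlp_weights(rng, layer_sizes, prior_std=1.0):
    """Draw MLP weights and biases from an isotropic N(0, prior_std^2 I) prior."""
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = prior_std * rng.standard_normal((fan_in, fan_out))
        b = prior_std * rng.standard_normal(fan_out)
        params.append((W, b))
    return params


def mlp_predict_classes(params, x):
    """Forward pass with ReLU hidden layers; returns the argmax class for each input."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    logits = h @ W + b
    return logits.argmax(axis=1)


def prior_predictive_concentration(layer_sizes, n_inputs=500, n_draws=50,
                                   prior_std=1.0, seed=0):
    """For each prior draw, the fraction of inputs assigned to its most frequent class."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_inputs, layer_sizes[0]))
    n_classes = layer_sizes[-1]
    fractions = []
    for _ in range(n_draws):
        params = sample_mlp_weights(rng, layer_sizes, prior_std)
        counts = np.bincount(mlp_predict_classes(params, x), minlength=n_classes)
        fractions.append(counts.max() / n_inputs)
    return np.array(fractions)


# Hypothetical setup: 10 classes, 32-dimensional inputs, shallow vs. deeper MLP.
shallow = prior_predictive_concentration([32, 64, 10])
deep = prior_predictive_concentration([32] + [64] * 8 + [10])
print("shallow net, mean majority-class fraction:", shallow.mean())
print("deep net, mean majority-class fraction:", deep.mean())
```

Re-running with different prior_std values is one way to probe how sensitive the prior predictive is to the prior scale; no particular outcome is claimed here.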


This review was generated by AI and checked by a human editor.