QUICK REVIEW

[Paper Review] Understanding Softmax Confidence and Uncertainty

Tim Pearce, Alexandra Brintrup | arXiv (Cornell University) | Jun 9, 2021
Adversarial Robustness in Machine Learning | References: 33 | Cited by: 47
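One-Line Summary

This paper analyzes the conditions under which softmax confidence correlates with epistemic uncertainty, identifies two implicit biases that push softmax confidence toward tracking uncertainty, and reports diagnostic experiments showing that softmax fails mainly because training and OOD data overlap in final-layer representations, rather than because of extrapolation.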

ABSTRACT

It is often remarked that neural networks fail to increase their uncertainty when predicting on data far from the training distribution. Yet naively using softmax confidence as a proxy for uncertainty achieves modest success in tasks exclusively testing for this, e.g., out-of-distribution (OOD) detection. This paper investigates this contradiction, identifying two implicit biases that do encourage softmax confidence to correlate with epistemic uncertainty: 1) Approximately optimal decision boundary structure, and 2) Filtering effects of deep networks. It describes why low-dimensional intuitions about softmax confidence are misleading. Diagnostic experiments quantify reasons softmax confidence can fail, finding that extrapolations are less to blame than overlap between training and OOD data in final-layer representations. Pre-trained/fine-tuned networks reduce this overlap.
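
The "naive" baseline the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' code: the uncertainty score is assumed here to be 1 minus the maximum softmax probability, the logits are made-up values, and `u_max` and `auroc` are hypothetical helper names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def u_max(logits):
    """Assumed uncertainty score: 1 - max softmax probability."""
    return 1.0 - max(softmax(logits))

def auroc(in_scores, ood_scores):
    """Probability that a random OOD score exceeds a random
    in-distribution score (ties count as one half)."""
    wins = 0.0
    for o in ood_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))

# Made-up logits: in-distribution inputs have one dominant class,
# OOD inputs have nearly flat logits.
in_logits = [[6.0, 0.5, -1.0], [0.2, 5.5, 0.1], [-0.5, 0.3, 7.0]]
ood_logits = [[1.0, 0.8, 0.9], [0.3, 0.1, 0.2], [2.0, 1.5, 1.8]]

in_u = [u_max(l) for l in in_logits]
ood_u = [u_max(l) for l in ood_logits]
score = auroc(in_u, ood_u)
print(f"AUROC of softmax-based OOD detection on toy data: {score:.2f}")
```

On real data the two score distributions overlap, which is the failure case the paper's diagnostic experiments examine.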

Motivation and Objectives

  • Explain why softmax confidence sometimes serves as a proxy for epistemic uncertainty in OOD detection.
  • Characterize uncertain regions of the softmax layer and the decision boundary structures.
  • Explain the implicit biases that let softmax confidence correlate with uncertainty: an approximately optimal decision boundary structure and the filtering effect of deep networks.
  • Empirically diagnose failure modes of softmax-based uncertainty and assess mitigations via pre-training or fine-tuning.

Proposed Method

  • Analytical characterization of the softmax final layer and definition of a valid OOD region (Theorem 1, Def. 1).
  • Derivation of an approximately optimal decision boundary structure (Definition 2) and empirical evidence that trained networks approximate this structure (Figure 4).
  • Modeling final-layer activations as task-specific feature clusters, described by their magnitude ||z|| and their alignment cos θ with the weight vectors.
  • Use of a Gaussian mixture density over final-layer activations to estimate in-distribution density and uncertainty (U_density).
  • Diagnostic experiments freezing softmax weights to test the impact of boundary structure on OOD detection (Figure 5).
  • Analysis of deep networks as filters that emphasize task-relevant features, reducing activation magnitudes for OOD inputs (Figure 6, Eq. 6).
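
The density-based uncertainty in the list above can be illustrated with a small sketch. It rests on assumptions that do not come from the paper: one diagonal-covariance Gaussian per class, equal mixture weights, toy 2-D activations, and U_density taken as the negative log mixture density. The function names are hypothetical.

```python
import math

def fit_diag_gaussian(points):
    """Per-dimension mean and variance of a list of activation vectors."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    # Small floor keeps the variance strictly positive.
    var = [sum((p[k] - mean[k]) ** 2 for p in points) / n + 1e-6
           for k in range(d)]
    return mean, var

def log_gaussian(z, mean, var):
    """Log density of a diagonal Gaussian at z."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(z, mean, var))

def u_density(z, components):
    """Assumed U_density: negative log of an equal-weight mixture density."""
    logs = [log_gaussian(z, m, v) for m, v in components]
    top = max(logs)  # log-sum-exp for numerical stability
    log_mix = top + math.log(sum(math.exp(l - top) for l in logs))
    return -(log_mix - math.log(len(logs)))

# Toy final-layer activations for two classes (made-up values).
class_0 = [[5.0, 0.2], [4.6, -0.1], [5.3, 0.1], [4.9, -0.3]]
class_1 = [[0.1, 5.1], [-0.2, 4.8], [0.3, 5.2], [0.0, 4.7]]
components = [fit_diag_gaussian(class_0), fit_diag_gaussian(class_1)]

u_familiar = u_density([5.0, 0.0], components)    # inside a training cluster
u_unfamiliar = u_density([0.5, 0.5], components)  # far from both clusters
print(u_familiar < u_unfamiliar)
```

A library mixture model with full covariances would be the more usual choice; the hand-rolled version is used here only to keep the example dependency-free.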

Experimental Results

Research Questions

  • RQ1: Under what conditions does softmax confidence reliably indicate epistemic uncertainty for OOD detection?
  • RQ2: How does the structure of the softmax decision boundary influence OOD detection performance?
  • RQ3: What implicit biases in deep networks cause softmax confidence to correlate with epistemic uncertainty?
  • RQ4: To what extent do final-layer feature representations filter or overlap OOD information, and how does pre-training affect this?
  • RQ5: What are the primary causes of softmax failure in uncertainty estimation, and can pre-training mitigate them?

Key Findings

  • Softmax confidence can correlate with epistemic uncertainty under two implicit biases: approximately optimal decision boundary structure and deep networks acting as filters for task-specific features.
  • The optimal boundary structure consists of equal-magnitude, zero-bias weight vectors spread evenly, so that the angle θ between any two weight vectors w_i and w_j satisfies cos θ = -1/(K-1) for all i ≠ j, where K is the number of classes (empirically observed in trained networks, Figure 4).
  • The volume of the valid OOD region is larger under the optimal structure, improving OOD detection (theoretical corollaries and Figure 3).
  • Final-layer activations in OOD data tend to have smaller magnitudes and less familiar alignment with weight vectors, leading to reduced softmax confidence (Eq. 6; Figure 6).
  • Depth and pre-training help mitigate failure causes; pre-trained networks achieve near-perfect AUROC on OOD detection and largely avoid feature overlap (Table 1, described textually).
  • A simple mental model of the softmax uncertainty U_max (Eq. 7) captures that uncertainty rises as feature magnitude shrinks and as the angle to the weight vectors becomes less familiar.
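
The boundary structure and the magnitude/angle intuition from the findings above can be checked numerically. The sketch below assumes one convenient construction of evenly spread weight vectors, w_i = e_i - (1/K)·1, which is an illustrative choice and not taken from the paper. It does not reproduce Eq. 7; it only shows the qualitative effect of feature magnitude and alignment on the maximum softmax probability.

```python
import math

def simplex_weights(k):
    """Rows w_i = e_i - (1/k) * ones: equal norms, evenly spread directions."""
    return [[(1.0 if i == j else 0.0) - 1.0 / k for j in range(k)]
            for i in range(k)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def max_softmax(z, weights):
    """Max softmax probability of a zero-bias linear layer applied to z."""
    logits = [dot(z, w) for w in weights]
    top = max(logits)
    return 1.0 / sum(math.exp(v - top) for v in logits)

K = 4
W = simplex_weights(K)

# Every pair of distinct weight vectors has cosine -1/(K-1).
pairwise = [cosine(W[i], W[j]) for i in range(K) for j in range(K) if i != j]

# Feature aligned with class 0 at a "familiar" magnitude.
z_familiar = [4.0 * v for v in W[0]]
# Same direction, much smaller magnitude.
z_small = [0.2 * v for v in z_familiar]
# Same magnitude, but pointing between class 0 and class 1.
between = [a + b for a, b in zip(W[0], W[1])]
scale = norm(z_familiar) / norm(between)
z_misaligned = [scale * v for v in between]

conf_familiar = max_softmax(z_familiar, W)
conf_small = max_softmax(z_small, W)
conf_misaligned = max_softmax(z_misaligned, W)
print(conf_familiar > conf_small, conf_familiar > conf_misaligned)
```

Each logit equals ||z||·||w_i||·cos θ_i, so shrinking ||z|| or rotating z away from its class weight vector both flatten the logits and lower the confidence.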


This review was generated by AI and checked by a human editor.