QUICK REVIEW

[Paper Review] Calibration in Deep Learning: A Survey of the State-of-the-Art

Cheng Wang|arXiv (Cornell University)|Aug 2, 2023

Adversarial Robustness in Machine Learning | Cited by 12

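One-line summary

This survey reviews state-of-the-art calibration methods for deep learning, groups them into four categories, and covers the causes of miscalibration, calibration metrics, open challenges, future directions, and the calibration of large models, including LLMs.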

ABSTRACT

Calibrating deep neural models plays an important role in building reliable, robust AI systems in safety-critical applications. Recent work has shown that modern neural networks that possess high predictive capability are poorly calibrated and produce unreliable model predictions. Though deep learning models achieve remarkable performance on various benchmarks, the study of model calibration and reliability is relatively under-explored. Ideal deep models should have not only high predictive performance but also be well calibrated. There have been some recent advances in calibrating deep models. In this survey, we review the state-of-the-art calibration methods and their principles for performing model calibration. First, we start with the definition of model calibration and explain the root causes of model miscalibration. Then we introduce the key metrics that can measure this aspect. It is followed by a summary of calibration methods that we roughly classify into four categories: post-hoc calibration, regularization methods, uncertainty estimation, and composition methods. We also cover recent advancements in calibrating large models, particularly large language models (LLMs). Finally, we discuss some open issues, challenges, and potential directions.

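Motivation and objectives

Define model calibration and identify the root causes of miscalibration in deep learning.
Survey and categorize recent calibration methods and their underlying principles.
Explain calibration metrics and their biases, and discuss how calibration applies to large models and LLMs.
Highlight open issues, challenges, and potential directions for future research.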
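
For reference, the notion of calibration that the first objective refers to is usually stated as below. The notation is an assumption taken from common usage and may differ from the survey's own symbols: \hat{Y} is the predicted label, Y the true label, and \hat{P} the confidence the model attaches to its prediction.

```latex
% Perfect calibration: among all inputs assigned confidence p,
% a fraction p of the predictions is correct.
\mathbb{P}\bigl(\hat{Y} = Y \mid \hat{P} = p\bigr) = p, \qquad \forall\, p \in [0, 1]

% Miscalibration is the expected gap between the two sides.
\mathbb{E}_{\hat{P}}\Bigl[\,\bigl|\mathbb{P}(\hat{Y} = Y \mid \hat{P}) - \hat{P}\bigr|\,\Bigr]
```

For example, if a model reports 80% confidence on 100 inputs, a perfectly calibrated model gets about 80 of them right.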

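Proposed method

Classifies calibration methods into four categories: post-hoc calibration, regularization, uncertainty estimation, and composition methods.
Explains the underlying principles by relating over-parameterization, overfitting, and overconfidence to calibration.
Summarizes metrics such as expected calibration error (ECE), maximum calibration error (MCE), classwise ECE (CECE), and adaptive ECE (AECE), and discusses reliability diagrams.
Discusses calibration of large pre-trained models and LLMs, including zero-shot and few-shot settings.
Discusses open issues and directions for future research.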
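
To make the binned metrics concrete, here is a minimal sketch of ECE and MCE. It assumes the common equal-width binning over the top-class confidence; the function name `calibration_errors` and the toy data are hypothetical, and CECE and AECE are not covered.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins.

    confidences: top-class probability per sample, each in [0, 1]
    correct:     1 if that prediction was right, else 0
    """
    n = len(confidences)
    counts = [0] * n_bins        # samples per bin
    conf_sums = [0.0] * n_bins   # summed confidence per bin
    hit_sums = [0.0] * n_bins    # summed correctness per bin

    for conf, hit in zip(confidences, correct):
        # Bin i covers [i/n_bins, (i+1)/n_bins); a confidence of
        # exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += hit

    ece, mce = 0.0, 0.0
    for count, conf_sum, hit_sum in zip(counts, conf_sums, hit_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        gap = abs(hit_sum / count - conf_sum / count)  # |accuracy - confidence|
        ece += (count / n) * gap   # ECE: gap weighted by bin size
        mce = max(mce, gap)        # MCE: worst bin
    return ece, mce


# Toy data (hypothetical): four predictions at 0.95, two at 0.55.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]
ece, mce = calibration_errors(confidences, correct)
print(round(ece, 3), round(mce, 3))
```

On this toy data the 0.95 bin is 75% accurate (gap 0.20) and the 0.55 bin is 50% accurate (gap 0.05), so ECE is about 0.15 and MCE about 0.20. The per-bin accuracy and confidence pairs computed here are the values a reliability diagram plots.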

Figure 2: The methods of uncertainty estimation. (a) Bayesian neural network; (b) MC dropout; (c) Ensembles; (d) Gumbel-Softmax trick.
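
Panels (b) and (c) both end in the same step: class probabilities from several forward passes are averaged. The sketch below shows only that averaging step. It assumes the per-member logits are already available; `ensemble_predict` and the sample logits are hypothetical, and predictive entropy is used here as one possible uncertainty score.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits):
    """Average class probabilities over members.

    member_logits: one logit vector per ensemble member (or per
    stochastic MC dropout pass) for a single input.
    Returns (mean_probs, predictive_entropy).
    """
    member_probs = [softmax(logits) for logits in member_logits]
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(probs[c] for probs in member_probs) / n_members
        for c in range(n_classes)
    ]
    # Entropy of the averaged distribution: higher means less certain.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    return mean_probs, entropy


# Members that agree versus members that disagree (hypothetical logits).
agree = [[3.0, 0.0, 0.0], [3.2, 0.0, 0.0], [2.8, 0.0, 0.0]]
disagree = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
probs_a, entropy_a = ensemble_predict(agree)
probs_d, entropy_d = ensemble_predict(disagree)
print(entropy_a < entropy_d)
```

When the members disagree, the averaged distribution flattens and its entropy rises, which is how these methods surface uncertainty at the cost of several forward passes per input.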

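Experimental results

Research questions

RQ1: What causes miscalibration in modern deep networks, and how can it be mitigated?
RQ2: Which calibration methods are effective in each setting (post-hoc, regularization, uncertainty estimation, composition), and what are their trade-offs?
RQ3: How do calibration metrics capture reliability, and what biases do these metrics have?
RQ4: How can large models and LLMs be calibrated effectively, including in zero-shot and few-shot scenarios?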

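Key findings

Calibration error often increases as model size and capacity grow, owing to over-parameterization and overfitting to the training data.
Post-hoc methods such as temperature scaling and its extensions provide data-efficient calibration without retraining.
Regularization-based methods and differentiable calibration surrogates can improve calibration during training, though their computational cost varies.
Data augmentation and mixup can improve calibration and generalization, particularly under distribution shift and on out-of-distribution data.
Uncertainty estimation methods (Bayesian neural networks, MC dropout, ensembles) provide calibration along with uncertainty quantification, but require multiple inference passes and substantial compute.
Calibrating large models, including LLMs and vision-language models, brings its own challenges and opportunities; methods such as prototypical calibration and contextual calibration show promise.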
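
As an illustration of why temperature scaling needs no retraining, here is a minimal sketch: a single scalar T divides the logits and is fitted on held-out data by minimizing the negative log-likelihood. The ternary search over log T is a simplification chosen for this example (implementations commonly use a gradient-based optimizer instead), and the function names, search bounds, and validation data are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    losses = [
        -math.log(softmax(row, temperature)[y])
        for row, y in zip(logit_rows, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, low=0.05, high=20.0, steps=100):
    """Ternary search over log T for the temperature with the lowest NLL.

    The NLL is convex in 1/T, so it has a single minimum in T and a
    bracketing search is sufficient. The bounds are assumptions.
    """
    a, b = math.log(low), math.log(high)
    for _ in range(steps):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        nll1 = mean_nll(logit_rows, labels, math.exp(m1))
        nll2 = mean_nll(logit_rows, labels, math.exp(m2))
        if nll1 < nll2:
            b = m2  # minimum lies left of m2
        else:
            a = m1  # minimum lies right of m1
    return math.exp((a + b) / 2.0)


# Hypothetical validation logits from an overconfident binary classifier:
# three of four predictions are right, each made with a logit margin of 4.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 0, 1, 1]
T = fit_temperature(val_logits, val_labels)
print(round(T, 3), round(max(softmax([4.0, 0.0], T)), 3))
```

On this toy set the unscaled model reports about 98% confidence while being right 75% of the time; the fitted temperature comes out above 1 and pulls the top confidence down to about 0.75. Because every logit is divided by the same positive scalar, the predicted class, and therefore accuracy, is unchanged.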


This review was generated by AI and checked by a human editor.