QUICK REVIEW

[Paper Review] Evaluating model calibration in classification

Juozas Vaicenavičius, David Widmann|arXiv (Cornell University)|Feb 19, 2019

Software Reliability and Analysis Research | Cited by 90

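One-Sentence Summary

This paper establishes a general theoretical framework for evaluating calibration in probabilistic classifiers and introduces refined methods for quantifying and visualizing miscalibration, including multidimensional reliability diagrams.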

ABSTRACT

Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications, it is pivotal for a model to possess an adequate sense of uncertainty, which for probabilistic classifiers translates into outputting probability distributions that are consistent with the empirical frequencies observed from realized outcomes. A classifier with such a property is called calibrated. In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. Lastly, we propose new ways to quantify and visualize miscalibration in probabilistic classification, including novel multidimensional reliability diagrams.
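The abstract describes calibration in words. One common way to write the property down is sketched below. The notation is an assumption made for illustration and is not quoted from the paper: g is the classifier, X the input, Y the class label, g_k(X) the probability assigned to class k, and d a distance between probability vectors chosen by the evaluator.

```latex
% Calibration: the predicted distribution matches the conditional
% class frequencies given the prediction (for every class k, almost surely).
\mathbb{P}\left[\, Y = k \mid g(X) \,\right] = g_k(X), \qquad k = 1, \dots, m .

% A corresponding miscalibration measure: the expected distance between
% the prediction and the true conditional distribution r(g(X)),
% where r_k(\xi) = \mathbb{P}[\, Y = k \mid g(X) = \xi \,].
\mathrm{ECE}_d = \mathbb{E}\left[\, d\bigl( r(g(X)),\, g(X) \bigr) \,\right].
```

Under this reading, a model is calibrated exactly when r(g(X)) = g(X) almost surely, in which case the measure is zero for any distance d.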

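Research Motivation and Goals

Highlight the importance of verifying that probability estimates are calibrated in safety-critical classification tasks.
Establish a general calibration evaluation framework grounded in probability theory.
Identify subtleties in existing calibration evaluation techniques that affect how their results are interpreted.
Propose new measures and visualization tools to quantify and visualize miscalibration.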

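Proposed Method

Build a calibration evaluation framework for probabilistic classifiers on the basis of probability theory.
Analyze subtleties in existing calibration measures and evaluation procedures.
Introduce novel visualization techniques for miscalibration, including multidimensional reliability diagrams.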
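For readers unfamiliar with the existing techniques the paper examines, the following is a minimal sketch of the widely used binned, top-label estimate of expected calibration error, which also yields the per-bin numbers behind a classic one-dimensional reliability diagram. It is not the paper's own method or its multidimensional diagrams. The function name `binned_ece`, the equal-width binning, and the default of 10 bins are assumptions chosen for illustration.

```python
import numpy as np


def binned_ece(probs, labels, n_bins=10):
    """Binned top-label ECE estimate (illustrative sketch, not the paper's estimator).

    probs:  array of shape (n_samples, n_classes), each row a predicted distribution
    labels: array of shape (n_samples,), integer class labels
    Returns the ECE estimate and one row per non-empty bin:
    (bin_lower, bin_upper, mean_confidence, accuracy, count).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    confidences = probs.max(axis=1)        # probability of the predicted class
    predictions = probs.argmax(axis=1)     # predicted class
    correct = (predictions == labels).astype(float)

    # Equal-width bins on [0, 1]; digitize against the interior edges
    # gives bin indices in 0 .. n_bins - 1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1])

    ece = 0.0
    rows = []
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        accuracy = correct[mask].mean()
        mean_conf = confidences[mask].mean()
        weight = mask.mean()               # fraction of samples in this bin
        ece += weight * abs(accuracy - mean_conf)
        rows.append((edges[b], edges[b + 1], mean_conf, accuracy, int(mask.sum())))
    return ece, rows


if __name__ == "__main__":
    # Tiny made-up example, for illustration only.
    probs = [[0.95, 0.05], [0.95, 0.05], [0.15, 0.85], [0.65, 0.35]]
    labels = [0, 1, 1, 0]
    ece, rows = binned_ece(probs, labels)
    print(f"ECE estimate: {ece:.3f}")
    for lower, upper, conf, acc, count in rows:
        print(f"[{lower:.1f}, {upper:.1f}): confidence={conf:.2f} accuracy={acc:.2f} n={count}")
```

Plotting `accuracy` against `mean_confidence` per bin gives the usual reliability diagram. Estimates of this kind depend on the binning and on sample size, which is one reason a binned value should be interpreted with care.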

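Experimental Results

Research Questions

RQ1: How can the calibration of probabilistic classifiers be rigorously defined and evaluated?
RQ2: What subtleties affect common calibration evaluation methods, and how can these methods be improved?
RQ3: Which new measures and visualization tools can effectively quantify and display miscalibration in multi-class settings?

Key Findings

A theoretical framework for calibration evaluation, grounded in probability theory, is proposed.
Subtleties in existing calibration evaluation methods are identified, leading to more refined interpretations of their results.
New methods for quantifying and visualizing miscalibration are introduced, including multidimensional reliability diagrams.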

This review was generated by AI and reviewed by a human editor.