QUICK REVIEW

[Paper Review] Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning

Jize Zhang, Bhavya Kailkhura | arXiv (Cornell University) | Mar 16, 2020

Anomaly Detection Techniques and Applications | Cited by 47

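One-sentence summary

This paper introduces Mix-n-Match calibration strategies, which combine ensemble and compositional techniques so that post-hoc calibration of deep classifiers is accuracy-preserving, data-efficient, and expressive; it also provides a data-efficient KDE-based estimator for evaluating calibration.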

ABSTRACT

This paper studies the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how Mix-n-Match calibration strategies (i.e., ensemble and composition) can help achieve remarkably better data-efficiency and expressive power while provably maintaining the classification accuracy of the original classifier. Mix-n-Match strategies are generic in the sense that they can be used to improve the performance of any off-the-shelf calibrator. We also reveal potential issues in standard evaluation practices. Popular approaches (e.g., histogram-based expected calibration error (ECE)) may provide misleading results especially in small-data regime. Therefore, we propose an alternative data-efficient kernel density-based estimator for a reliable evaluation of the calibration performance and prove its asymptotically unbiasedness and consistency. Our approaches outperform state-of-the-art solutions on both the calibration as well as the evaluation tasks in most of the experimental settings. Our codes are available at https://github.com/zhang64-llnl/Mix-n-Match-Calibration.
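
The histogram-based ECE mentioned in the abstract can be sketched as follows. This is a minimal illustration of the common top-label, equal-width-bin variant, not the authors' evaluation code; the function name `histogram_ece` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def histogram_ece(confidences, correct, n_bins=15):
    """Top-label ECE with equal-width bins over [0, 1] (hypothetical helper)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges yields bin ids in 0..n_bins-1,
    # so a confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between accuracy and mean confidence, weighted by bin mass.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Synthetic predictions that are perfectly calibrated by construction:
# each prediction is correct with probability equal to its confidence.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=200)
hit = rng.random(200) < conf
for n_bins in (5, 15, 50):
    print(n_bins, round(histogram_ece(conf, hit, n_bins), 4))
```

With only 200 samples the printed estimate depends on the bin count even though the true calibration error is zero, which is the kind of small-data sensitivity the abstract warns about.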

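Research motivation and objectives

Define desiderata for uncertainty calibration: accuracy-preserving, data-efficient, and highly expressive.
Propose Mix-n-Match strategies (ensemble and composition) that improve calibration performance while preserving accuracy.
Develop a data-efficient kernel density estimation (KDE) based estimator for reliable calibration evaluation.
Show empirically that Mix-n-Match outperforms state-of-the-art calibration methods in most experimental settings across datasets and models.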

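Proposed methods

Introduce accuracy-preserving calibration maps, obtained by applying a strictly increasing (order-preserving) function to the predictions.
Propose a parametric ensemble calibrator, Ensemble Temperature Scaling (ETS), which increases expressive power while remaining accuracy-preserving and data-efficient.
Develop a nonparametric multi-class isotonic regression (IRM) with a data ensemble, which improves data efficiency while preserving accuracy.
Combine parametric and nonparametric calibrators by composition (IROvA-TS) to exploit the strengths of both.
Provide a KDE-based ECE estimator for reliable, data-efficient calibration evaluation, with asymptotic unbiasedness and consistency.
Provide a dimension-independent calibration gain metric for robust ranking of methods.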
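
The ensemble idea behind ETS can be sketched as follows. This sketch assumes the calibrated output is a convex combination of temperature-scaled probabilities, the uncalibrated probabilities, and the uniform distribution; that form, the names `ets_probs` and `fit_ets_random_search`, the synthetic data, and the crude random search (a stand-in for a proper optimizer) are all assumptions for illustration, not the authors' released code.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ets_probs(logits, t, w):
    """Assumed ETS form: t > 0, w >= 0, w.sum() == 1 (hypothetical signature)."""
    n_classes = logits.shape[-1]
    return w[0] * softmax(logits / t) + w[1] * softmax(logits) + w[2] / n_classes

def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

def fit_ets_random_search(logits, labels, n_trials=500, seed=0):
    """Pick (t, w) with the lowest NLL among random candidates."""
    rng = np.random.default_rng(seed)
    best_t, best_w = 1.0, np.array([0.0, 1.0, 0.0])  # start from the identity map
    best_loss = nll(ets_probs(logits, best_t, best_w), labels)
    for _ in range(n_trials):
        t = np.exp(rng.uniform(np.log(0.25), np.log(8.0)))  # log-uniform temperature
        w = rng.dirichlet(np.ones(3))                       # weights on the simplex
        loss = nll(ets_probs(logits, t, w), labels)
        if loss < best_loss:
            best_loss, best_t, best_w = loss, t, w
    return best_loss, best_t, best_w

# Synthetic overconfident classifier: labels follow softmax(logits / 3).
rng = np.random.default_rng(1)
logits = 5.0 * rng.normal(size=(500, 10))
labels = np.array([rng.choice(10, p=p) for p in softmax(logits / 3.0)])

loss, t, w = fit_ets_random_search(logits, labels)
calibrated = ets_probs(logits, t, w)
same_argmax = np.array_equal(calibrated.argmax(axis=1), logits.argmax(axis=1))
print("NLL before:", nll(softmax(logits), labels), "after:", loss)
print("argmax unchanged:", same_argmax)
```

Each term in the mixture is a non-decreasing function of the original class scores, and the first two are strictly increasing whenever their weights are positive, so the predicted class does not change. Because the three weights sum to one, this form adds two free parameters on top of the single temperature.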

Experimental results

Research questions

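RQ1: Can a calibration method improve calibration quality and data efficiency while preserving the classifier's accuracy?
RQ2: How can ensemble and compositional strategies be designed to increase expressive power without sacrificing accuracy?
RQ3: Is the data-efficient KDE-based estimator reliable for evaluating calibration, especially in the small-data regime?
RQ4: Do hybrid parametric-nonparametric methods outperform existing methods on common benchmarks?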

Key findings

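Mix-n-Match strategies provably preserve accuracy, and they improve data efficiency and expressive power across multiple datasets and model architectures.
Ensemble Temperature Scaling (ETS) gains expressive power over standard Temperature Scaling (TS) with only two additional parameters, while retaining the accuracy-preserving property.
Multi-class isotonic regression (IRM) with a data ensemble is more data-efficient than one-vs-all isotonic regression and preserves accuracy.
The compositional method (IROvA-TS) combines nonparametric calibration with a TS baseline to achieve accuracy preservation and improved calibration.
The KDE-based ECE estimator performs better than histogram-based estimators when samples are scarce, and it is asymptotically unbiased and consistent.
Experiments on CIFAR-10/100 and ImageNet show that Mix-n-Match methods outperform baselines in calibration gain, with comparable or higher accuracy.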
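
The kernel-based alternative to binning can be sketched in one dimension as follows. This is a simplified top-label version that assumes a Gaussian kernel with reflection at 0 and 1 and a Silverman-style bandwidth; the kernel, the bandwidth rule, the name `kde_ece`, and the synthetic data are assumptions for illustration and may differ from the estimator analyzed in the paper.

```python
import numpy as np

def kde_ece(confidences, correct, n_grid=512, bandwidth=None):
    """Estimate the integral of |z - P(correct | z)| * p(z) over [0, 1]."""
    z = np.asarray(confidences, dtype=float)
    y = np.asarray(correct, dtype=float)
    if bandwidth is None:
        # Silverman-style rule of thumb, floored to avoid a degenerate kernel.
        bandwidth = max(1.06 * z.std() * z.size ** (-0.2), 1e-3)
    grid = np.linspace(0.0, 1.0, n_grid)

    def kernel(u):
        return np.exp(-0.5 * (u / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))

    # weights[g, i]: kernel mass of sample i at grid point g, with the mass
    # that would leak outside [0, 1] reflected back at both boundaries.
    g = grid[:, None]
    weights = kernel(g - z) + kernel(g + z) + kernel(g - (2.0 - z))
    density = weights.mean(axis=1)          # estimate of p(z)
    hit_mass = (weights * y).mean(axis=1)   # estimate of p(z) * P(correct | z)
    integrand = np.abs(grid * density - hit_mass)
    # Trapezoidal rule on the uniform grid.
    dx = grid[1] - grid[0]
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dx)

# Synthetic check: one calibrated and one overconfident set of predictions.
rng = np.random.default_rng(0)
z = rng.uniform(0.5, 1.0, size=2000)
y_cal = rng.random(2000) < z          # accuracy matches confidence
y_bad = rng.random(2000) < z - 0.3    # accuracy is 0.3 below confidence
print("calibrated:", kde_ece(z, y_cal))
print("overconfident:", kde_ece(z, y_bad))
```

Unlike the histogram estimator, this version has no bin count to choose; the bandwidth plays a similar smoothing role, so it remains a tuning choice in this sketch.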

This review was generated by AI and reviewed by a human editor.