QUICK REVIEW

[Paper Review] Revisiting the Calibration of Modern Neural Networks

Matthias Minderer, Josip Djolonga | arXiv (Cornell University) | Jun 15, 2021

Adversarial Robustness in Machine Learning | References: 51 | Cited by: 69

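One-Sentence Summary

This paper re-evaluates the calibration of predictive uncertainty in recent image classifiers and finds that modern non-convolutional architectures (such as ViT and MLP-Mixer) are well calibrated, and that calibration trends, especially under distribution shift, depend more on architecture than on model size or pretraining alone.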

ABSTRACT

Accurate estimation of predictive uncertainty (model calibration) is essential for the safe application of neural networks. Many instances of miscalibration in modern neural networks have been reported, suggesting a trend that newer, more accurate models produce poorly calibrated predictions. Here, we revisit this question for recent state-of-the-art image classification models. We systematically relate model calibration and accuracy, and find that the most recent models, notably those not using convolutions, are among the best calibrated. Trends observed in prior model generations, such as decay of calibration with distribution shift or model size, are less pronounced in recent architectures. We also show that model size and amount of pretraining do not fully explain these differences, suggesting that architecture is a major determinant of calibration properties.

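Motivation and Goals

Re-examine whether state-of-the-art image classifiers remain well calibrated given the rapid pace of architectural progress.
Systematically relate calibration to accuracy across model families and distribution conditions.
Identify architectural factors, beyond scale and pretraining data, that influence calibration properties.
Provide a large-scale dataset and code to enable broad calibration evaluation across models and datasets.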

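Proposed Method

Compare a broad range of modern image classification model families, both convolutional and non-convolutional, on ImageNet-scale tasks.
Evaluate calibration with the expected calibration error (ECE) computed over 100 equal-mass bins, complemented by reliability diagrams and alternative metrics (NLL, Brier score).
Apply post-hoc temperature scaling to separate a model's intrinsic calibration from a simple over- or under-confidence bias, and evaluate its effect across model families.
Analyze the effect of model size and of pretraining amount and dataset on calibration while controlling for accuracy.
Evaluate calibration under distribution shift using ImageNet-C and other out-of-distribution benchmarks, and check consistency across datasets.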
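
The ECE metric mentioned above can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal standard-library version that assumes top-label confidences, an L1 gap between accuracy and confidence, and equal-mass bins formed by sorting predictions by confidence. The function name `ece_equal_mass` and its parameters are hypothetical.

```python
def ece_equal_mass(confidences, correct, num_bins=100):
    """Top-label expected calibration error with equal-mass bins.

    confidences: list of max-softmax probabilities, one per prediction.
    correct:     list of booleans, True where the prediction was right.
    num_bins:    number of bins (100 in the summary above; this sketch
                 caps it at the number of predictions).
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    if len(correct) != n:
        raise ValueError("confidences and correct must have equal length")

    # Sorting by confidence lets consecutive chunks act as equal-mass bins.
    order = sorted(range(n), key=lambda i: confidences[i])
    num_bins = min(num_bins, n)

    ece = 0.0
    for b in range(num_bins):
        lo = b * n // num_bins
        hi = (b + 1) * n // num_bins
        idx = order[lo:hi]
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(1.0 for i in idx if correct[i]) / len(idx)
        # Weight each bin's |accuracy - confidence| gap by its share of data.
        ece += (len(idx) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Ten predictions at 90% confidence, nine of them correct.
    conf = [0.9] * 10
    hits = [True] * 9 + [False]
    print(ece_equal_mass(conf, hits, num_bins=1))
```

With equal-mass bins every bin holds roughly the same number of predictions, so sparsely populated confidence ranges do not produce noisy per-bin estimates the way they can with equal-width bins.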

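Experimental Results

Research Questions

RQ1: Do modern state-of-the-art image classifiers remain well calibrated, or does calibration degrade as accuracy improves, as earlier work reported?
RQ2: After temperature scaling, how does calibration differ between model families (convolutional versus non-convolutional)?
RQ3: To what extent do model size and pretraining data explain the calibration differences between architectures, particularly under distribution shift?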
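Key Findings

Compared with earlier models, the current best models, including the non-convolutional MLP-Mixer and Vision Transformers, are well calibrated and robust to distribution shift.
In-distribution calibration degrades slightly as model size increases, but this is outweighed by the accompanying gain in accuracy.
Under distribution shift, calibration improves with model size, reversing the in-distribution trend.
Under distribution shift, accuracy and calibration are correlated, suggesting that optimizing for accuracy may also benefit calibration.
Model size and amount of pretraining do not by themselves fully explain the calibration differences between model families; architecture is a major determinant.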


This review was generated by AI and checked by a human editor.